ImageKit engineering team, culture, and tech stack

We started ImageKit.io back in October 2016 with a simple goal: to help companies and developers quickly implement image optimization and real-time resizing in their web applications without worrying about scale or investing time in R&D.

Fast forward to today: over 700 companies and 60,000+ developers in 40+ countries use our product. With customers spread across different time zones, ImageKit.io processes millions of image requests every hour, day and night, across six geographically distributed processing regions. Many small, medium, and large e-commerce, travel, and news & media sites rely on our availability for their business.

Current engineering team & culture

We have a small team of six engineers working out of our office at WeWork in Gurgaon. One is a dedicated DevOps engineer, and everyone else is a full-stack engineer. This team is responsible for rolling out new features, fixing bugs, refactoring existing code, and ensuring 99.9% uptime for ImageKit.

A few things about the team:

Every member brings a different experience and skillset to the table. Educational background doesn't matter; what matters more is practical knowledge, problem-solving skills, and the technologies and side projects they have worked on.
Everyone writes test cases for their code, and every release is run against thousands of independent tests to ensure that everything works. The reason is that with images, it is easy to miss a small error, such as a width rounded off by 1px. Such errors cannot realistically be caught in manual testing, given the many variations an image can go through with the transformation features available in our product. Automated testing has allowed us to push changes to production without any headaches.
There are no fixed check-in and check-out times, but we do have an overlap of a few hours when everyone is working. Everyone works more or less independently, so it didn't cause us much trouble when a 21-day nationwide lockdown was enforced in India on 24 March due to the COVID-19 situation. We had started working from home even before that announcement, to be on the safe side. The only things we miss are the water cooler conversations.
Mistakes are bound to happen. They are not frowned upon, as long as we learn from them and take the necessary steps not to repeat them in the future.
Everyone has a common goal: to make the product better for our customers. We share responsibility for the highs and lows, and we go out of our way to help each other ship anything that needs urgent attention.
Ideas, feedback, and memes are exchanged freely on the team's Slack channel.
All the founders have a technical background, which puts the engineering team in a great position: the leadership understands and speaks the same language. After all, running a SaaS business is a dream for many developers; we are lucky to be in this spot, working hard alongside great minds and enjoying it.
FIFA used to be a daily thing before this lockdown. We are still figuring out the next game we can all play together while working from home.
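The 1px rounding error mentioned above is the kind of bug a table-driven test catches and a human reviewer does not. Below is a minimal sketch in Python, assuming a hypothetical `resized_height` helper that preserves the aspect ratio and rounds half up to a whole pixel; it is illustrative only, not ImageKit's actual resizing code or test suite.

```python
from fractions import Fraction


def resized_height(orig_width: int, orig_height: int, target_width: int) -> int:
    """Hypothetical helper: aspect-preserving height, rounded half up.

    Exact rational arithmetic is used so floating-point error cannot
    shift the result by a pixel.
    """
    if orig_width <= 0 or orig_height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    exact = Fraction(orig_height * target_width, orig_width)
    # Round half up (floor of exact + 1/2), never below 1px.
    return max(1, int(exact + Fraction(1, 2)))


# Each case pins the exact expected pixel value.
CASES = [
    # (orig_w, orig_h, target_w, expected_h)
    (1000, 667, 300, 200),   # 200.1  -> 200
    (1920, 1080, 500, 281),  # 281.25 -> 281
    (4, 5, 2, 3),            # 2.5    -> 3 (Python's round() would give 2)
    (1000, 1, 10, 1),        # 0.01   -> clamped to the 1px minimum
]


def run_cases():
    """Return the list of failing cases (empty when all pass)."""
    failures = []
    for w, h, tw, expected in CASES:
        got = resized_height(w, h, tw)
        if got != expected:
            failures.append((w, h, tw, expected, got))
    return failures


if __name__ == "__main__":
    failed = run_cases()
    print(f"{len(CASES) - len(failed)}/{len(CASES)} passed")
    for case in failed:
        print("FAIL", case)
```

The third case shows why the rounding rule must be pinned explicitly: Python's built-in `round()` uses banker's rounding, so a naive implementation would silently produce an image 1px shorter.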

Tech stack

We follow a greedy approach to product development: keeping the bigger picture in mind, we ask what minimum viable product or feature can start adding value for our customers. Time and again, starting with a few objective facts and developing an MVP quickly has saved us a tremendous amount of time.

This means that we don't worry about "imaginary" problems we might face at an "imaginary" scale after rolling out a feature that doesn't exist yet.

Of course, some systems have to be designed for a baseline level of scale and functionality from day one, for example, our image resizing engine and our log processing and monitoring systems. But more often than not, we start with common sense: we build quickly, get feedback from customers, and iterate.

The way our product is used creates a unique set of challenges:

The response times for old and new images should be in milliseconds most of the time.
Unlike, say, a project management system, we cannot afford extended downtime.
We need to account for sudden unplanned spikes from a single customer running a sale on their website. Such an incident can raise the traffic to almost 10x the normal levels in a matter of seconds.
We need fast mitigation and failover plans across multiple geographical regions.

Our choice of programming language and tools for the task at hand mainly depends on the nature of the task, the team's experience, and the community around the tool. Here is an overview of what we use across our systems:

The majority of our backend code is written in Node.js. Yes, we love JavaScript!
The underlying image processing libraries are standard ones written in C and C++. Where our use cases needed it, we have improved these libraries, and we do not hesitate to contribute back to open source when a change seems a good fit and, of course, when the community accepts it.
For our frontend dashboards, we use Backbone.js (to be phased out soon) and React.
Most of our infrastructure is on AWS with small parts on other cloud providers. The infrastructure can integrate with all leading CDN providers, even outside of AWS.
We use Python to process CDN logs and derive analytics from them. Every few hours, millions of rows are processed and aggregated, and the resulting analytics are stored. Thanks to the speed of pandas, the cost of these analytics is minimal.
HAProxy for reverse proxy & load balancing. Nginx and Varnish for the hot cache.
Redis for API rate limiting.
Artillery for load testing.
Self-hosted Ghost for the blog.
We use RabbitMQ for a system-wide event bus and queueing.
MongoDB for storing application data.
We use the TICK stack for real-time request monitoring, alerting, and performance metrics visualization across different customers and processing regions.
Custom monitoring scripts to alert our tech team on different channels (phone calls or Telegram) depending on the criticality of the issue. This reduces our turnaround time when fixing issues.
We use GitBook for hosting developer documentation and internal wikis. We started with Slate but moved to GitBook for its richer UI and better collaboration experience.
For hosting repositories, code reviews, and deployments, we use self-hosted GitLab.
All our SDKs are hosted on GitHub. We use GitHub for issue tracking and for reviewing SDK-related changes.
Papertrail is used for searching and setting up alerts on cron job logs.
We push product adoption and billing metrics into Zoho Analytics, which allows the sales and customer success teams to make informed decisions.
Stripe is used for billing and recurring payments.
Slack for all internal communications.
Asana for project management.
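The log-processing item in the list above (Python aggregating CDN logs every few hours) boils down to a group-by over log rows. The post credits pandas for this; to keep the sketch dependency-free, it uses the standard library's `csv` and `collections` instead. The CSV columns (`timestamp`, `customer_id`, `status`, `bytes`), the hourly bucketing, and the sample customers are all assumptions for illustration, not ImageKit's real log schema.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def aggregate_cdn_log(lines):
    """Aggregate per (customer, hour): request count, bytes served, 5xx errors.

    `lines` is any iterable of CSV text lines with a header row
    (hypothetical schema: timestamp,customer_id,status,bytes).
    """
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0, "errors": 0})
    for row in csv.DictReader(lines):
        ts = datetime.fromisoformat(row["timestamp"])
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = stats[(row["customer_id"], hour.isoformat())]
        bucket["requests"] += 1
        bucket["bytes"] += int(row["bytes"])
        if int(row["status"]) >= 500:
            bucket["errors"] += 1
    return dict(stats)


# Made-up sample rows for two hypothetical customers.
SAMPLE = """timestamp,customer_id,status,bytes
2020-03-24T10:05:00,acme,200,5120
2020-03-24T10:40:12,acme,200,2048
2020-03-24T10:59:59,acme,503,0
2020-03-24T11:01:00,acme,200,1024
2020-03-24T10:15:30,globex,200,4096
"""

if __name__ == "__main__":
    result = aggregate_cdn_log(io.StringIO(SAMPLE))
    for (customer, hour), s in sorted(result.items()):
        print(customer, hour, s)
```

Because only the per-bucket totals are kept, the stored output stays small no matter how many raw rows go in; a pandas version would typically express the same thing as a `groupby` followed by an aggregation.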
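The list mentions Redis for API rate limiting without describing the scheme. One common Redis pattern is a fixed-window counter: `INCR` a per-client key for the current window and set `EXPIRE` on it so old windows disappear. The sketch below mimics that pattern with an in-memory dict so it runs without a Redis server; the window length, limit, and key layout are assumptions, not ImageKit's configuration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Fixed-window rate limiter mirroring the Redis INCR + EXPIRE pattern.

    A plain dict stands in for Redis: keys are (api_key, window_start)
    and values are request counts for that window.
    """

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        window_start = int(now // self.window) * self.window
        # Forget counters from earlier windows; Redis would do this via EXPIRE.
        for stale in [k for k in self._counts if k[1] < window_start]:
            del self._counts[stale]
        key = (api_key, window_start)
        self._counts[key] = self._counts.get(key, 0) + 1  # like INCR
        return self._counts[key] <= self.limit


if __name__ == "__main__":
    fake_now = [1000.0]  # controllable clock for the demo
    limiter = FixedWindowLimiter(limit=3, window_seconds=60,
                                 clock=lambda: fake_now[0])
    print([limiter.allow("customer-a") for _ in range(5)])  # [True, True, True, False, False]
    print(limiter.allow("customer-b"))  # True: separate counter per key
    fake_now[0] += 60
    print(limiter.allow("customer-a"))  # True: a new window has started
```

A fixed window is the simplest design and costs one counter per client, but it can let up to twice the limit through around a window boundary; sliding-window or token-bucket variants smooth that out at the cost of more state.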

We are hiring

We are working on a complete frontend optimization product and are looking for talented engineers to join our team. If you are looking for your next challenge and are passionate about working on large-scale engineering problems, we are hiring.