Does your application fit your users' needs?
The problem
A common problem companies face today, especially the big ones, is the growing number of concurrent users accessing their applications. If their systems aren't prepared to handle all the requests they receive, the consequences can hurt the company as a whole.
Imagine a company whose business is an e-commerce platform. Black Friday is coming, and the engineering team hasn't taken any action to ensure the platform's availability and resilience on that day. When the day arrives, the number of simultaneous accesses grows beyond what even the most optimistic executive would have imagined. What great news! Not so great, though… the platform becomes unavailable, and users can't even access it. Buying things? Not a chance!
How harmful can this situation be for the company? Plenty of e-commerce platforms wait for Black Friday every year. Some have already forecast the GMV for the day and are just waiting for Monday morning to collect the weekend's results and celebrate.
Now, on the other hand, how could this situation have played out if the business had an agreement with the engineering team? How could the engineering team have helped? How could the whole platform have been prepared to face such a steep traffic peak?
About scaling
During peak loads, as the number of simultaneous accesses grows, the application needs to keep working as it did before, so that users aren't harmed by the surge.
Scaling is the process by which your software adapts to your users' needs, growing and shrinking whenever necessary. It is responsible for responding to different types of load, ensuring the availability and resilience your users expect. It must be as transparent as possible, so that not a single user is impacted by it.
Several problems can arise when the number of users accessing your application is high, and I'll explain them below.
Your system can operate under two different loads, normal and high, and it must be prepared to face both of them. Figure 1 shows the differences between these loads.
In simple terms:
Normal load: the number of concurrent users is within the usual range; the system receives a known number of requests and, as a consequence, consumes only the expected amount of server resources.
High load: the number of concurrent users is above normal; the system receives many simultaneous requests and, as a consequence, consumes a lot of server resources.
Given that context, in which case can your application do more harm to the company if it can't handle the situation as expected? If you guessed the latter, you're right! However, the former can also be quite harmful if you aren't prepared.
We've already explained why not being prepared for high load can be very stressful. But why must the system be prepared for normal load as well? Imagine a scenario where your system has a few concurrent users and receives nothing but a few requests, yet your servers are sized to handle a lot of them, with a great amount of CPU and RAM provisioned. That CPU and RAM will cost you, especially if you run your infrastructure in the cloud.
Provisioning a lot of resources (CPU and RAM) while your application is under low to normal usage ends up being too expensive for the company.
With that said, how can you be prepared for both scenarios, staying available without underusing your resources?
There are two types of scaling to help you with that, and you'll decide which to use depending on your application's needs: vertical scaling and horizontal scaling.
Vertical Scaling
Imagine a scenario where your application runs on a single server machine with 4 GB of RAM and 2 CPU cores, under normal usage. Suddenly the number of users starts increasing; you're under high load now, and your application starts becoming slow and unavailable.
In that case, you decide to upgrade your server's resources. You add 4 more GB of RAM and 2 more CPU cores, so the machine now has a total of 8 GB of RAM and 4 CPU cores. With this upgrade the application becomes more stable: the slowness is gone and availability is back. The users are pretty happy.
This process of upgrading your machine (or choosing a more robust one) is called Vertical Scaling, or Scale-up. Figure 2 illustrates this process.
This approach can help in some cases, but it is definitely not suited for all situations.
Vertical Scaling is most commonly used when your load is predictable: when it sits on a plateau almost all the time and doesn't change (increase or decrease) for long periods. That's because the process of scaling up a server isn't quick, and your users will face a period of downtime while you upgrade the resources.
In an on-premise environment, there's no way to upgrade a server's resources without shutting it down first. And even in a cloud environment, if you swap your VM for a bigger one, users won't be able to access the application until the new machine is up and running.
Pros
Implementation: as explained above, scaling up a server in an on-premise environment means upgrading the machine's resources; adding more CPU and RAM should do the trick. In a cloud environment, you can simply choose a more robust VM.
Management: whether you run in the cloud or on-premise, you only have to manage a single machine, which is easier than managing multiple ones (more on that below).
Costs: running a single machine is cheaper than running 5 or 10, for example.
Cons
Single point of failure: since a single server runs your application, what happens if that server goes down? Besides planning the scaling, you must be prepared to face downtime as well.
Hardware limitations: there's a limit to the hardware each machine can hold; you can't add resources to your server indefinitely. This limit can become a bottleneck if the application's usage keeps growing.
Downtime: while you upgrade your server, the application will be off until the scaling process finishes. During this time, your users won't be able to use your software. You must ensure that scaling your application won't affect your current users.
Costs: in some cases you'll need just 10% more resources than your server currently has; however, depending on which cloud provider runs your app, adding that extra 10% of hardware can end up costing 50% more at the end of the month.
Horizontal Scaling
In contrast to the Vertical Scaling approach, where more resources are added to a single machine, in the Horizontal Scaling (or Scale-out) approach more machines are added to handle the application's traffic.
Going back to our Black Friday example: the moment the number of concurrent users starts growing, copies of the same server are added, and the traffic that was previously routed to a single instance is now distributed across the new server instances. These copies have the same specs as the original one, and your application is now prepared to handle the increased load.
When using this approach, each server will usually have modest hardware resources; instead of 8 GB of RAM, an instance will probably use less than 512 MB, for example. This happens because, rather than running your application on a VM, you'll run containers alongside a container management system like Kubernetes or AWS ECS. Containers are far smaller than VM instances, and they are built to scale horizontally, i.e. to run as many instances as needed.
Figure 3 illustrates the process of scaling out your system as the number of concurrent users grows.
This whole process is abstracted away by the management system you choose. In Kubernetes, for example, the Horizontal Pod Autoscaler (HPA) takes care of scaling your instances.
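To give a feel for what the HPA does, here's a minimal sketch of its core scaling rule, which derives the desired replica count from the ratio between the observed and target metric values; the metric numbers below are made up for the example:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    # The HPA's core rule: scale proportionally to how far the
    # observed metric is from the target, rounding up.
    return math.ceil(current_replicas * (current_metric / target_metric))

# Hypothetical numbers: 4 pods averaging 180m CPU against a 100m target.
print(desired_replicas(4, 180, 100))  # 8 -> scale out
print(desired_replicas(8, 40, 100))   # 4 -> scale back in as load drops
```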
But adding more instances isn't enough to fully scale your system horizontally; the system also needs to know how to properly balance its load between all those instances.
When you have a single instance running, things are easy: all the traffic is routed to it. But how do you route the traffic when you have multiple instances running at the same time? The answer: a Load Balancer.
Load balancing is the process of routing requests to different application instances. It's done by a Load Balancer that sits in front of your instances and receives all the requests sent to your application. On receiving a request, the Load Balancer knows which instance to redirect it to.
There are many different load balancing algorithms your Load Balancer can use. These algorithms don't just route the requests; they also take care not to overload a single container with too many of them, balancing the load as evenly as possible.
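To make that concrete, here's a minimal sketch of the simplest of those algorithms, round-robin, which hands each incoming request to the next instance in a circular order; the instance addresses are hypothetical:

```python
import itertools

class RoundRobinBalancer:
    # Cycles through the registered instances, one request at a time,
    # so every instance receives roughly the same share of traffic.
    def __init__(self, instances: list[str]):
        self._pool = itertools.cycle(instances)

    def next_instance(self) -> str:
        return next(self._pool)

balancer = RoundRobinBalancer([
    "10.0.0.1:8080",  # hypothetical container addresses
    "10.0.0.2:8080",
    "10.0.0.3:8080",
])

for request_id in range(6):
    print(f"request {request_id} -> {balancer.next_instance()}")
```

Production load balancers layer health checks and smarter strategies (least connections, weighted distribution) on top, but the core idea is the same.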
Figure 4 illustrates a Load Balancer sitting in front of the containers, receiving and redirecting their load.
The Horizontal Scaling approach is better suited to cases where your load isn't predictable, that is, where it can increase and decrease many times during a single day.
This type of scaling allows more flexibility because the system knows how to grow and shrink whenever necessary. You can configure different scaling triggers, like RAM and CPU usage, HTTP metrics, or even asynchronous events such as the number of AWS SQS messages waiting in a queue.
For example, say you've set up scaling based on HTTP metrics: once your app starts receiving a lot of concurrent HTTP requests, your management system knows it needs to provision more instances to handle all the traffic. When the load decreases, the extra instances are killed to avoid unnecessary additional costs.
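As a toy illustration of that trigger logic (not any real orchestrator's implementation; the thresholds and traffic numbers are invented for the example):

```python
SCALE_OUT_ABOVE = 100  # invented threshold: requests/s per instance
SCALE_IN_BELOW = 20
MIN_INSTANCES, MAX_INSTANCES = 1, 10

def autoscale(instances: int, requests_per_second: float) -> int:
    # Decide the new instance count from the observed HTTP load.
    per_instance = requests_per_second / instances
    if per_instance > SCALE_OUT_ABOVE and instances < MAX_INSTANCES:
        return instances + 1  # traffic spike: provision one more instance
    if per_instance < SCALE_IN_BELOW and instances > MIN_INSTANCES:
        return instances - 1  # load dropped: kill an instance to save costs
    return instances

# Simulated metric samples as traffic spikes and then falls off:
instances = 2
for rps in [150, 450, 450, 90, 30]:
    instances = autoscale(instances, rps)
    print(f"{rps:>4} req/s -> {instances} instance(s)")
```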
Pros
Redundancy: since you have multiple instances of your server running, if one of them fails or gets killed for whatever reason, there will always be another instance to handle the traffic.
No downtime: even if you have just a single instance running and it fails, your management system will notice and provision another instance for you, which starts receiving the requests that were going to the one that just died (see the sketch after this list).
Fewer hardware limitations: the hardware limits are looser, but that doesn't mean they don't exist. When scaling out your containers, you can scale as much as you need, as long as the required resources are available within your cluster. The limits live in the cluster: the bigger your cluster, the fewer limitations you'll face.
Performance: you can increase your system's parallel processing power by adding more instances to it.
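That "notice and replace" behavior can be pictured as a simple reconciliation loop; a toy sketch, not how any particular orchestrator is implemented:

```python
from itertools import count

_ids = count()

def spawn_instance() -> str:
    # Hypothetical stand-in for provisioning a fresh container.
    return f"instance-{next(_ids)}"

def reconcile(desired_count: int, healthy: list[str]) -> list[str]:
    # Keep provisioning replacements until the desired count is met.
    while len(healthy) < desired_count:
        healthy.append(spawn_instance())
    return healthy

# One of three instances just crashed; the loop restores the count.
running = ["instance-a", "instance-c"]
print(reconcile(3, running))  # ['instance-a', 'instance-c', 'instance-0']
```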
Cons
Management complexity: instead of managing a single instance, you now have N instances to manage. Usually your container management system will do it for you, but that doesn't mean you won't need to know how to use it. Depending on your scenario, company size, and other factors, you'll probably need someone specialized in those systems.
Cost forecasting: it's harder to forecast costs when you don't know how much hardware you'll use within a month. Not impossible, but trickier than when managing a single server.
Conclusion
As software applications evolve, users become more and more demanding about the products they use. If your application doesn't fit your users' needs, they're likely to choose a different vendor.
When designing your software, the non-functional requirements are as important as the functional ones. Users start using your app for what it provides and stop using it for what it doesn't.
Choose a Vertical Scaling approach when your app's usage is likely to remain roughly the same almost all the time, and Horizontal Scaling when it's unpredictable and likely to vary often.
Now that you know how to approach scaling, does your application fit your users’ needs?
Resources