A collection of various approaches, techniques, and tools for scaling different parts of your backend

HilliamT/awesome-backend-scaling

Awesome Backend Scaling

Motivation

As an application gains users, it can run into problems that limit its capacity to keep up with demand. Bottlenecks start to appear in the system and degrade performance for users:

  • Slow requests to load your social media timeline due to high load on the database
  • Verification emails not being sent because the email-send worker is overloaded
  • Timeouts being hit for API requests leading to failed checkouts

These and many more can drive users away and affect how well you meet your business KPIs.

This guide aims to collate various practices and approaches, some of which have been used in practice, for solving different types of bottlenecks. Not all of them will apply to your system - take what you will.

Monitoring and Observability

Knowing that performance has degraded is the first step towards identifying a solution.

Systems often rely on different services to provide certain functionality. You may have an authentication server dedicated to managing authentication for users. You may have an API server for delivering data to your mobile app. Services like these are integral to making your application work.

If one of these services crashes because of an uncaught software bug or a hardware failure, the effect on the rest of the application can be severe.

Health Monitoring

To check whether a service has crashed, you can call it at regular intervals. If the service does not respond, it is likely to be down.

Approaches Description
Manual Heartbeats 🚧 TODO 🚧
Native Integration with AWS CloudWatch 🚧 TODO 🚧
Native Cloud Monitoring with Google Cloud Platform 🚧 TODO 🚧
API Monitoring with Postman 🚧 TODO 🚧
Website Monitoring with Better Uptime 🚧 TODO 🚧
Kubernetes Cluster Monitoring 🚧 TODO 🚧
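
As a rough illustration of the manual heartbeat idea described above, here is a minimal Python sketch. The `HeartbeatMonitor` class, the `http_probe` helper, and the three-failure threshold are assumptions made for this example, not part of any particular monitoring tool.

```python
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return 200 <= response.status < 300


class HeartbeatMonitor:
    """Treats a service as down after several consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3) -> None:
        self.probe = probe                # hypothetical callable: True if the service answered
        self.max_failures = max_failures  # assumed threshold, tune for your service
        self.failures = 0

    def check_once(self) -> bool:
        """Run one probe; return True while the service is still considered up."""
        try:
            ok = self.probe()
        except Exception:
            ok = False  # timeouts and refused connections count as failed probes
        self.failures = 0 if ok else self.failures + 1
        return self.failures < self.max_failures
```

A scheduler or a simple loop with `time.sleep` could call `check_once()` every few seconds, for example with `HeartbeatMonitor(lambda: http_probe("http://localhost:8080/health"))`, where the URL is a placeholder. Requiring several consecutive failures avoids raising an alert on a single dropped request.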

Resources Monitoring

A service may be up but still degraded because it does not have enough resources to handle its intended workload. Monitoring resource usage can help you identify when this happens.

Approaches Description
Cron Job 🚧 TODO 🚧
Flame Graphs 🚧 TODO 🚧
Infrastructure Monitoring with Datadog 🚧 TODO 🚧
Native Integration with AWS CloudWatch 🚧 TODO 🚧
Native Cloud Monitoring with Google Cloud Platform 🚧 TODO 🚧
Kubernetes Metrics Server 🚧 TODO 🚧
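
The resource check described above could be run from a cron job along these lines. This is a sketch using only the Python standard library; the metric names and the limits passed to `breached` are assumptions chosen for illustration.

```python
import os
import shutil
from typing import Dict, List


def sample_resources(path: str = "/") -> Dict[str, float]:
    """Take one sample of disk usage and, where available, CPU load."""
    usage = shutil.disk_usage(path)
    sample = {"disk_used_ratio": usage.used / usage.total}
    if hasattr(os, "getloadavg"):  # not available on every platform
        try:
            one_minute_load = os.getloadavg()[0]
            sample["load_per_core"] = one_minute_load / (os.cpu_count() or 1)
        except OSError:
            pass  # load average could not be obtained
    return sample


def breached(sample: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the names of the metrics that exceed their limit."""
    return sorted(
        name for name, value in sample.items()
        if name in limits and value > limits[name]
    )
```

For example, `breached(sample_resources(), {"disk_used_ratio": 0.9, "load_per_core": 1.0})` returns the metrics that need attention; what happens next (an email, a page, a log line) is left out of the sketch.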

Observability

Whilst monitoring tells you if a service is suffering an issue, observability aims to provide you with details on why the issue is occurring.

Approaches Description
Logging to Console 🚧 TODO 🚧
Logging to an API endpoint 🚧 TODO 🚧
Logtail 🚧 TODO 🚧
Splunk 🚧 TODO 🚧
Datadog 🚧 TODO 🚧
New Relic 🚧 TODO 🚧
AppDynamics 🚧 TODO 🚧
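
To make the "why" easier to recover, logs are often emitted as structured records rather than free text. The sketch below uses Python's standard `logging` module to write JSON lines to the console; the `JsonFormatter` class and the `request_id` and `duration_ms` fields are hypothetical names for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Formats each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in ("request_id", "duration_ms"):  # hypothetical context fields
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(name: str = "checkout") -> logging.Logger:
    """Return a logger that writes JSON lines to the console (stderr)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger().error("payment failed", extra={"request_id": "abc123"})` then produces a line that a log aggregator can filter by `request_id`.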

Execution

Increasing the capacity of your application can be a key step in solving bottlenecks. This will generally improve performance, but you should compare it against more targeted solutions that may be more cost-effective.

Vertical Scaling

Vertical scaling focuses on allocating more resources to a single instance. This section also covers ways to optimise resource usage within an instance.

Approaches Description
Caching 🚧 TODO 🚧
Changing Programming Language 🚧 TODO 🚧
Code Optimisation 🚧 TODO 🚧
Increase Server RAM 🚧 TODO 🚧
Change Server Processor Type 🚧 TODO 🚧
Increase Number of Cores 🚧 TODO 🚧
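
Of the approaches listed, caching is the easiest to show in a few lines. The sketch below memoises a deliberately slow function with `functools.lru_cache`; `expensive_report` and the `CALLS` counter are made-up stand-ins for real work.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the real computation runs


@lru_cache(maxsize=1024)
def expensive_report(user_id: int) -> int:
    """Hypothetical slow computation; repeated calls are served from the cache."""
    CALLS["count"] += 1
    return sum(i * i for i in range(user_id * 1000))
```

Calling `expensive_report(3)` twice runs the computation once; `expensive_report.cache_info()` reports hits and misses. An in-process cache like this trades memory for CPU time on a single instance and is lost when the process restarts.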

Horizontal Scaling

Horizontal scaling recognises that there is a limit to the resources you can dedicate to a single instance, and therefore spreads the demand across additional instances.

Approaches Description
Moving to a Microservice Architecture 🚧 TODO 🚧
Ansible Provisioning 🚧 TODO 🚧
Cloud Provisioning 🚧 TODO 🚧
Infrastructure-as-Code with Terraform 🚧 TODO 🚧
Kubernetes Pod Scaling 🚧 TODO 🚧
Docker Swarm 🚧 TODO 🚧
Serverless Cloud Functions 🚧 TODO 🚧
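
Once there are several instances, something has to decide which one handles each request. The sketch below shows one common policy, round-robin with unhealthy instances skipped. `RoundRobinBalancer` is a hypothetical class written for this example; in practice this job is usually done by a load balancer or an orchestrator.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hands out instances in turn, skipping any that are marked down."""

    def __init__(self, instances: Iterable[str]) -> None:
        self.instances: List[str] = list(instances)
        self.healthy: Set[str] = set(self.instances)
        self._index = 0

    def mark_down(self, instance: str) -> None:
        self.healthy.discard(instance)

    def mark_up(self, instance: str) -> None:
        if instance in self.instances:
            self.healthy.add(instance)

    def next_instance(self) -> str:
        # Look at each instance at most once per call.
        for _ in range(len(self.instances)):
            candidate = self.instances[self._index % len(self.instances)]
            self._index += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

Adding capacity then amounts to adding an entry to the instance list, which is what makes this style of scaling attractive once a single machine is at its limit.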

Internal Messaging

When a system is made up of services that run as separate processes, they may need to communicate with each other. This can be achieved with a messaging system. Even if individual services have plenty of storage, the system as a whole can still be bottlenecked by the bandwidth of data transfer between them.

Message Format

Depending on the use case, the format of a message can have a significant effect on efficiency.

Approaches Description
HTTP REST 🚧 TODO 🚧
WebSockets 🚧 TODO 🚧
Data Streaming 🚧 TODO 🚧
gRPC 🚧 TODO 🚧
On-Trigger Cloud Functions 🚧 TODO 🚧
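
One way to see why format matters is to encode the same message as text and as a fixed binary layout. The sketch below compares compact JSON with a `struct`-packed record. It is a stand-in for the general text-versus-binary trade-off (for example JSON over REST versus Protocol Buffers over gRPC), not an implementation of either; the `reading` message and its fields are invented for the example.

```python
import json
import struct
from typing import Dict, Union

Message = Dict[str, Union[int, float]]

# Little-endian layout without padding: uint32 sensor_id, uint64 timestamp, float64 value.
RECORD = struct.Struct("<IQd")

reading: Message = {"sensor_id": 42, "timestamp": 1700000000, "value": 21.5}


def encode_json(message: Message) -> bytes:
    """Self-describing text encoding: field names travel with every message."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def encode_binary(message: Message) -> bytes:
    """Fixed binary encoding: both sides must agree on the layout in advance."""
    return RECORD.pack(message["sensor_id"], message["timestamp"], message["value"])


def decode_binary(data: bytes) -> Message:
    sensor_id, timestamp, value = RECORD.unpack(data)
    return {"sensor_id": sensor_id, "timestamp": timestamp, "value": value}
```

The binary record is always `RECORD.size` bytes, while the JSON form is larger because it repeats the field names. The cost is that the binary layout is harder to inspect and has to be versioned carefully.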

Communication Protocol

There are more specialised messaging systems that can be used to deliver messages based on need. These are generally aimed at setups with several services that may be dynamically scaled. Keeping the endpoints that each service calls up to date can also become a bottleneck on developer time.

Approaches Description
Inline API Calls 🚧 TODO 🚧
API Gateways 🚧 TODO 🚧
Bi-directional APIs with Pusher 🚧 TODO 🚧
RabbitMQ Messaging Queues 🚧 TODO 🚧
Serverless Job Scheduling with Quirrel 🚧 TODO 🚧
Message Brokers on Apache Kafka 🚧 TODO 🚧
Google Pub/Sub 🚧 TODO 🚧
Istio and Service Meshes 🚧 TODO 🚧
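
The core idea behind a message queue is that producers hand work to a queue instead of calling a consumer directly, so consumers can be added or removed without the producer knowing. The sketch below imitates that inside one process with `queue.Queue` and threads. It only illustrates the pattern; a broker such as RabbitMQ or Kafka adds persistence, delivery guarantees, and network transport. The function name `process_with_workers` is an assumption.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that tells a worker to exit


def process_with_workers(
    jobs: Iterable[Any],
    handler: Callable[[Any], Any],
    worker_count: int = 2,
) -> List[Any]:
    """Push jobs onto a queue and let a pool of workers drain it."""
    pending: "queue.Queue[Any]" = queue.Queue()
    results: List[Any] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = pending.get()
            if job is _STOP:
                return
            try:
                outcome = handler(job)
            except Exception as exc:
                outcome = exc  # keep the failure so one bad job does not kill the worker
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(_STOP)  # one stop signal per worker
    for thread in threads:
        thread.join()
    return results
```

Results arrive in completion order, not submission order. Raising `worker_count` is the in-process analogue of scaling out the consumers behind a real queue.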

Storage

Data is fundamental to an application. Being able to store data and retrieve it later, instead of recomputing it, saves processing time. State management also comes under this.

Online Transaction Processing (OLTP)

OLTP is a class of storage designed for transactional processing - the everyday workload of many reads and many writes generated by users. This needs to be handled consistently yet efficiently.

Approaches Description
In-Memory 🚧 TODO 🚧
Redis Caching 🚧 TODO 🚧
PostgreSQL 🚧 TODO 🚧
MongoDB 🚧 TODO 🚧
Cassandra 🚧 TODO 🚧
Search Engine Elasticsearch 🚧 TODO 🚧
PgBouncer for PostgreSQL 🚧 TODO 🚧
PgPool 🚧 TODO 🚧
Database Sharding 🚧 TODO 🚧
Cloud Databases 🚧 TODO 🚧
Google Cloud SQL 🚧 TODO 🚧
Amazon DynamoDB 🚧 TODO 🚧
Mission-critical Transactional Consistency with Google Spanner 🚧 TODO 🚧
Large-Scale Low-Latency with Google Cloud Bigtable 🚧 TODO 🚧
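
"Consistently" here usually means that a group of writes either all happen or none do. The sketch below shows that with an in-memory SQLite database from Python's standard library, used purely as a stand-in for whichever OLTP store you choose. The `accounts` table and the `transfer` function are invented for the example.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money between accounts as one atomic transaction."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

If the balance check fails, the debit that was already applied is rolled back, so no partial transfer is ever visible. Scaling an OLTP store is largely about keeping this guarantee while handling more concurrent transactions.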

Online Analytical Processing (OLAP)

OLAP is a class of storage designed for producing business analytics - read queries make up the majority of the workload, normally across large amounts of data.

Approaches Description
General Databases 🚧 TODO 🚧
Elasticsearch 🚧 TODO 🚧
Apache Hadoop 🚧 TODO 🚧
Data Warehouses 🚧 TODO 🚧
Data Lakes 🚧 TODO 🚧

Archival

Some data may be read very rarely, and is not needed for the day-to-day operations of an application. This is where archival comes in.

Approaches Description
General Databases 🚧 TODO 🚧
Cold Storage 🚧 TODO 🚧
Arweave: Archiving on the Blockchain 🚧 TODO 🚧
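
A simple archival job separates rarely-read records from the working set and stores them in a cheaper, compressed form. The sketch below does this in memory with `gzip` and JSON lines. The one-year cutoff, the `created_at` field, and the function names are assumptions, and where the compressed bytes end up (cold storage, object store) is left open.

```python
import gzip
import json
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_age(
    records: List[Record], now: datetime, max_age_days: int = 365
) -> Tuple[List[Record], List[Record]]:
    """Split records into (hot, cold) using an assumed age cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    hot = [r for r in records if r["created_at"] >= cutoff]
    cold = [r for r in records if r["created_at"] < cutoff]
    return hot, cold


def pack(records: List[Record]) -> bytes:
    """Serialise records as gzip-compressed JSON lines."""
    lines = "\n".join(
        json.dumps({**r, "created_at": r["created_at"].isoformat()}) for r in records
    )
    return gzip.compress(lines.encode("utf-8"))


def unpack(blob: bytes) -> List[Record]:
    """Restore records written by pack()."""
    text = gzip.decompress(blob).decode("utf-8")
    rows = [json.loads(line) for line in text.splitlines()]
    return [{**row, "created_at": datetime.fromisoformat(row["created_at"])} for row in rows]
```

The hot records stay in the primary database, while the packed cold records can be written somewhere slower and cheaper and restored with `unpack` on the rare occasions they are needed.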

Security

Security is a very important part of any application. It is important to have a secure architecture that is easy to maintain and easy to change. This includes being able to scale an authentication and authorisation solution for your application to meet user demands without compromising on security.

Authentication

Authentication is verifying that a user is who they claim to be.

Approaches Description
HTTP Basic Authentication 🚧 TODO 🚧
HTTP Digest Authentication 🚧 TODO 🚧
Session Cookies 🚧 TODO 🚧
Self-contained Tokens with JWTs 🚧 TODO 🚧
API Key Authentication 🚧 TODO 🚧
Certificate-bound Access Tokens 🚧 TODO 🚧
Kubernetes Key Management with HashiCorp Vault 🚧 TODO 🚧
One-Key Provisioning 🚧 TODO 🚧
Key Distribution Servers 🚧 TODO 🚧
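
Self-contained tokens scale well because any server holding the signing key can verify them without a shared session store. The sketch below shows the idea with an HMAC-signed payload. It is deliberately not a JWT implementation and should not be used in place of a vetted library; the token layout, `issue_token`, and `verify_token` are assumptions for this example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(user_id: str, secret: bytes, ttl_s: int = 3600,
                now: Optional[float] = None) -> str:
    """Return '<payload>.<signature>', both base64url-encoded."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user_id, "exp": now + ttl_s}).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_part, signature_part = token.split(".")
        payload = base64.urlsafe_b64decode(payload_part)
        signature = base64.urlsafe_b64decode(signature_part)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None
    return claims["sub"]
```

Because verification needs only the secret, adding more API servers does not add load to a central session database. The trade-off is that a token cannot easily be revoked before it expires.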

Authorisation

Authorisation is determining whether a user is allowed to perform a certain action.

Approaches Description
Resource Owner Password Credentials (ROPC) 🚧 TODO 🚧
OAuth2 🚧 TODO 🚧
OpenID Connect 🚧 TODO 🚧
Lightweight Directory Access Protocol (LDAP) 🚧 TODO 🚧
Capability URIs and Macaroons 🚧 TODO 🚧
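
Whatever protocol carries the user's identity and grants, the final step is a check of the form "may this user do this?". The sketch below shows that check as a simple role-to-permission lookup. This is role-based access control, which is not one of the approaches listed above, and is used here only to make the definition concrete; the roles and permission strings are invented.

```python
from typing import Dict, Iterable, Set

# Hypothetical role-to-permission mapping for a blogging application.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"post:read"},
    "editor": {"post:read", "post:write"},
    "admin": {"post:read", "post:write", "post:delete"},
}


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Unknown roles grant nothing, so the check fails closed: `is_allowed(["viewer"], "post:write")` is `False`.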

Rate-Limiting

Another part of security is rate-limiting, which limits the number of requests a user can make to a particular resource. This is useful for mitigating denial-of-service attacks.

Approaches Description
In-Memory Store 🚧 TODO 🚧
Redis 🚧 TODO 🚧
Proxy Rate Limiter 🚧 TODO 🚧
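
One widely used rate-limiting algorithm is the token bucket: each user has a bucket that refills at a steady rate, and every request spends one token. The sketch below keeps the bucket in memory; the class name and parameters are assumptions, and a multi-instance deployment would need a shared store such as Redis instead.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_per_s`."""

    def __init__(self, capacity: float, refill_per_s: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity  # start with a full bucket
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.updated_at
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a server you would keep one bucket per user or API key, for example in a dictionary, and reject the request (commonly with HTTP 429) when `allow()` returns `False`.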

Robustness

After implementing new changes, you may find that your application behaves differently, for better or for worse. Adding a scaffold for tests and running them will help you quickly identify and fix any issues that arise.

Reactionary Testing

Building an ever-growing list of tests is a good way to check that your application still behaves as expected after every change. Catching unexpected and potentially harmful errors early keeps them from being deployed to production.

Approaches Description
Unit Testing 🚧 TODO 🚧
Component Testing 🚧 TODO 🚧
Integration Testing 🚧 TODO 🚧
End-to-End Load Testing 🚧 TODO 🚧
Web Performance Testing 🚧 TODO 🚧
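
At the smallest scale, the growing list of tests starts with unit tests. The sketch below uses Python's built-in `unittest` module; `apply_discount` is a hypothetical function under test.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: discounted price in whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


class ApplyDiscountTests(unittest.TestCase):
    def test_no_discount_keeps_price(self) -> None:
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_quarter_off(self) -> None:
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_rejects_invalid_percent(self) -> None:
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)
```

Saved in a file, these can be run with `python -m unittest`. Each bug found in production is a candidate for a new test case, so the same bug cannot return unnoticed.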

Preemptive Testing

For mission-critical software, crashes and bugs may be incredibly detrimental. Preemptive testing is a way to find bugs or issues before they occur in production, working from the assumption that the worst will happen.

Approaches Description
Stress Testing 🚧 TODO 🚧
Fuzzing 🚧 TODO 🚧
Symbolic Execution 🚧 TODO 🚧
Static Analysis 🚧 TODO 🚧
Formal Verification 🚧 TODO 🚧
Chaos Engineering for Microservices 🚧 TODO 🚧
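
Fuzzing is the easiest of these to sketch: feed a function large numbers of random inputs and record anything other than a documented rejection. The example below is a naive random fuzzer written for illustration, far simpler than coverage-guided tools; `parse_quantity`, the input alphabet, and the choice to treat `ValueError` as the expected rejection are all assumptions.

```python
import random
from typing import Callable, List, Tuple


def parse_quantity(text: str) -> int:
    """Hypothetical function under test: parse a positive item quantity."""
    value = int(text.strip())
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


def fuzz(target: Callable[[str], object], rounds: int = 1000,
         seed: int = 0) -> List[Tuple[str, Exception]]:
    """Call target with random short strings; collect unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so any finding can be reproduced
    alphabet = "0123456789-+ .e\t\x00abc"
    crashes: List[Tuple[str, Exception]] = []
    for _ in range(rounds):
        candidate = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            target(candidate)
        except ValueError:
            pass  # the documented way of rejecting bad input
        except Exception as exc:
            crashes.append((candidate, exc))
    return crashes
```

An empty result from `fuzz(parse_quantity)` only shows that no crash was found in those rounds, not that none exists. Each recorded input is a ready-made regression test.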

Resources

These books and articles have been helpful in my development of this guide:
