Awesome Backend Scaling

Motivation
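Monitoring and Observability
- Health Monitoring
- Resources Monitoring
- Observability
Execution
- Vertical Scaling
- Horizontal Scaling
Internal Messaging
- Message Format
- Communication Protocol
Storage
- Online Transaction Processing (OLTP)
- Online Analytical Processing (OLAP)
- Archival
Security
- Authentication
- Authorisation
- Rate-Limiting
Robustness
- Reactionary Testing
- Preemptive Testing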
Resources

Motivation

As an application's user base grows, it can run into problems that limit its ability to keep up with demand. Bottlenecks may start to appear that degrade performance for users:

- Slow requests to load your social media timeline due to high load on the database
- Verification emails not being sent because the email-send worker is overloaded
- API requests timing out, leading to failed checkouts

These issues, and many more, can drive users away and affect how well you meet your business KPIs.

This guide collates various practices and approaches for solving different types of bottlenecks. Not all of them will apply to your application; take what is useful.

Monitoring and Observability

Knowing that performance has degraded is the first key step towards identifying a solution.

Systems often rely on different services to provide certain functionality. You may have an authentication server dedicated to managing user authentication, or an API server that delivers data to your mobile app. Services like these are integral to making your application work.

If a service crashes because of an uncaught software bug or a hardware failure, the features that depend on it can stop working.

Health Monitoring

To check whether a service has crashed, you can call it periodically. If the service stops responding, it is likely to be down.

Approaches	Description
Manual Heartbeats	`🚧 TODO 🚧`
Native Integration with AWS CloudWatch	`🚧 TODO 🚧`
Native Cloud Monitoring with Google Cloud Platform	`🚧 TODO 🚧`
API Monitoring with Postman	`🚧 TODO 🚧`
Website Monitoring with Better Uptime	`🚧 TODO 🚧`
Kubernetes Cluster Monitoring	`🚧 TODO 🚧`
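
A minimal sketch of the periodic-call approach (manual heartbeats) in Python, assuming each service exposes an HTTP health endpoint. The `HealthMonitor` class, the example URL and the default three-strike threshold are hypothetical choices. Requiring several consecutive failures before declaring a service down avoids raising an alarm over a single dropped request.

```python
import urllib.request
from typing import Callable, Dict

Probe = Callable[[], bool]

def http_probe(url: str, timeout: float = 2.0) -> Probe:
    """Build a probe that returns True if `url` answers with a 2xx status."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except Exception:  # refused connection, timeout, non-2xx HTTPError, ...
            return False
    return probe

class HealthMonitor:
    """Marks a service DOWN only after `threshold` consecutive failed probes."""

    def __init__(self, probes: Dict[str, Probe], threshold: int = 3):
        self.probes = probes
        self.threshold = threshold
        self.failures = {name: 0 for name in probes}

    def check_once(self) -> Dict[str, str]:
        status = {}
        for name, probe in self.probes.items():
            if probe():
                self.failures[name] = 0  # any success resets the strike count
            else:
                self.failures[name] += 1
            status[name] = "DOWN" if self.failures[name] >= self.threshold else "UP"
        return status
```

In practice `check_once` would run on a schedule (a cron job, or a loop with `time.sleep`), for example over `HealthMonitor({"auth": http_probe("http://auth.internal/healthz")})`, where the host and path are hypothetical.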

Resources Monitoring

A service may not be down but still suffer degraded performance because it does not have enough resources to handle its intended workload. Monitoring resource usage (CPU, memory, disk, network) can help you identify when this happens.

Approaches	Description
Cron Job	`🚧 TODO 🚧`
Flame Graphs	`🚧 TODO 🚧`
Infrastructure Monitoring with Datadog	`🚧 TODO 🚧`
Native Integration with AWS CloudWatch	`🚧 TODO 🚧`
Native Cloud Monitoring with Google Cloud Platform	`🚧 TODO 🚧`
Kubernetes Metrics Server	`🚧 TODO 🚧`
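
As a sketch of the cron-job approach, the script below samples two basic metrics with the Python standard library and reports any that exceed a limit. It assumes a Unix host (`os.getloadavg` is not available on Windows); the metric names and limits are hypothetical, and hosted tools collect far richer data.

```python
import os
import shutil
from typing import Dict

def sample_resources(path: str = "/") -> Dict[str, float]:
    """Take a snapshot of basic resource usage (Unix-only: os.getloadavg)."""
    load_1m, _, _ = os.getloadavg()
    disk = shutil.disk_usage(path)
    return {
        # 1-minute load average per core; sustained values above roughly 1.0
        # suggest tasks are waiting for CPU time
        "load_per_core": load_1m / (os.cpu_count() or 1),
        "disk_used_ratio": disk.used / disk.total,
    }

def breached(sample: Dict[str, float], limits: Dict[str, float]) -> Dict[str, float]:
    """Return only the metrics whose value exceeds its configured limit."""
    return {k: v for k, v in sample.items() if k in limits and v > limits[k]}
```

A scheduled job could call `breached(sample_resources(), {"load_per_core": 1.0, "disk_used_ratio": 0.9})` every few minutes and send an alert whenever the result is non-empty.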

Observability

Whilst monitoring tells you that a service is suffering an issue, observability aims to give you the details of why the issue is occurring.

Approaches	Description
Logging to Console	`🚧 TODO 🚧`
Logging to an API endpoint	`🚧 TODO 🚧`
Logtail	`🚧 TODO 🚧`
Splunk	`🚧 TODO 🚧`
Datadog	`🚧 TODO 🚧`
New Relic	`🚧 TODO 🚧`
AppDynamics	`🚧 TODO 🚧`
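
A minimal sketch of structured logging to the console with Python's standard `logging` module. Writing one JSON object per line, tagged with a request ID, makes logs much easier to search once they are shipped to a log management tool. The field names and the `request_id` convention are assumptions for illustration, not a standard.

```python
import json
import logging
import sys
import uuid

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # `request_id` is attached via `extra=`; it lets you follow one request across services
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)

def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger

if __name__ == "__main__":
    log = make_logger("checkout")
    log.info("payment authorised", extra={"request_id": str(uuid.uuid4())})
```

Passing the same request ID between services (for example in an HTTP header) is what turns isolated log lines into a trace of why a request failed.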

Execution

Increasing your application's capacity can be a key step in solving bottlenecks. This will generally improve its performance, but you should compare it against more targeted solutions that may be more cost-effective.

Vertical Scaling

Vertical scaling focuses on allocating more resources to a single instance. It also covers ways to make better use of the resources available on each individual instance.

Approaches	Description
Caching	`🚧 TODO 🚧`
Changing Programming Language	`🚧 TODO 🚧`
Code Optimisation	`🚧 TODO 🚧`
Increase Server RAM	`🚧 TODO 🚧`
Change Server Processor Type	`🚧 TODO 🚧`
Increase Number of Cores	`🚧 TODO 🚧`
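
Caching is often the cheapest way to get more out of a single instance. Below is a minimal sketch of an in-process cache with a time-to-live (TTL); the `TTLCache` class is hypothetical, and the injectable clock exists only to make expiry testable. It is unbounded and not thread-safe, so a real deployment would add eviction or use a shared cache such as Redis.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """In-process cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: skip the expensive call
        value = compute()
        self._data[key] = (now + self.ttl, value)
        return value
```

For example, `cache.get_or_compute(("timeline", user_id), lambda: load_timeline(user_id))` would hit the database at most once per TTL per user, where `load_timeline` is a hypothetical data-access function.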

Horizontal Scaling

Horizontal scaling recognises that there is a limit to how many resources you can dedicate to a single instance, and therefore spreads the workload across multiple instances to meet demand.

Approaches	Description
Moving to a Microservice Architecture	`🚧 TODO 🚧`
Ansible Provisioning	`🚧 TODO 🚧`
Cloud Provisioning	`🚧 TODO 🚧`
Infrastructure-as-Code with Terraform	`🚧 TODO 🚧`
Kubernetes Pod Scaling	`🚧 TODO 🚧`
Docker Swarm	`🚧 TODO 🚧`
Serverless Cloud Functions	`🚧 TODO 🚧`
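
Once there are several instances, something has to decide which one handles each request. A minimal sketch of round-robin load balancing that skips unhealthy instances is shown below; the class and the instance names are hypothetical, and in practice this role is usually played by a load balancer, a Kubernetes Service or a cloud provider's equivalent.

```python
import itertools
from typing import List

class RoundRobinBalancer:
    """Spread requests evenly over instances, skipping ones marked unhealthy."""

    def __init__(self, instances: List[str]):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark(self, instance: str, healthy: bool) -> None:
        if healthy:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def next_instance(self) -> str:
        # Make at most one full pass over the ring before giving up
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

Feeding `mark` from a health check means that traffic automatically drains away from an instance that has gone down and returns once it recovers.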

Internal Messaging

When a system contains different services that run as separate processes, they may need to communicate with each other. This can be achieved using a messaging system. However much storage the individual services have, the system as a whole can still be bottlenecked by the bandwidth available for transferring data between them.

Message Format

Depending on the use case, the format of a message can have a significant impact on efficiency.

Approaches	Description
HTTP REST	`🚧 TODO 🚧`
WebSockets	`🚧 TODO 🚧`
Data Streaming	`🚧 TODO 🚧`
gRPC	`🚧 TODO 🚧`
On-Trigger Cloud Functions	`🚧 TODO 🚧`
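
To illustrate how much the encoding alone can matter, the sketch below serialises the same hypothetical location-update message as compact JSON and as a fixed binary layout. Python's `struct` stands in here for schema-based binary formats such as Protocol Buffers (used by gRPC); the message fields and layout are assumptions for illustration.

```python
import json
import struct

# Hypothetical location-update message exchanged between two services.
message = {"user_id": 12345, "lat": 51.5074, "lng": -0.1278, "ts": 1700000000}

# Binary layout both sides agree on in advance:
# little-endian, unsigned 32-bit id, two 64-bit floats, signed 64-bit timestamp.
LAYOUT = struct.Struct("<Iddq")

def encode_json(msg: dict) -> bytes:
    return json.dumps(msg, separators=(",", ":")).encode("utf-8")

def encode_binary(msg: dict) -> bytes:
    return LAYOUT.pack(msg["user_id"], msg["lat"], msg["lng"], msg["ts"])

def decode_binary(data: bytes) -> dict:
    user_id, lat, lng, ts = LAYOUT.unpack(data)
    return {"user_id": user_id, "lat": lat, "lng": lng, "ts": ts}

if __name__ == "__main__":
    print("JSON bytes:  ", len(encode_json(message)))
    print("binary bytes:", len(encode_binary(message)))
```

Here the binary form is 28 bytes against 61 for compact JSON, because field names are not repeated in every message. The trade-off is that both services must share the schema and the payload is no longer human-readable.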

Communication Protocol

More specialised messaging systems can deliver messages according to your needs. These are generally aimed at systems of many services that may be scaled dynamically, where instances come and go. Keeping track of, and updating, which endpoint to call can also become a bottleneck on developer time.

Approaches	Description
Inline API Calls	`🚧 TODO 🚧`
API Gateways	`🚧 TODO 🚧`
Bi-directional APIs with Pusher	`🚧 TODO 🚧`
RabbitMQ Messaging Queues	`🚧 TODO 🚧`
Serverless Job Scheduling with Quirrel	`🚧 TODO 🚧`
Message Brokers on Apache Kafka	`🚧 TODO 🚧`
Google Pub/Sub	`🚧 TODO 🚧`
Istio and Service Meshes	`🚧 TODO 🚧`
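
The core idea behind brokers and pub/sub systems is that publishers send to a named topic instead of a specific endpoint. A toy, synchronous, in-memory sketch of that idea follows; the `InMemoryBroker` class and topic names are hypothetical, and real brokers add persistence, acknowledgements, retries and asynchronous delivery.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]

class InMemoryBroker:
    """Toy topic-based broker: publishers never need to know subscribers' endpoints."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver `message` to every handler on `topic`; return the delivery count."""
        handlers = self._subscribers.get(topic, [])  # .get avoids creating empty topics
        for handler in handlers:
            handler(message)
        return len(handlers)
```

A signup service could publish to `"user.signed_up"` while the email worker subscribes to it; scaling or moving the email worker then requires no change to the publisher.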

Storage

Data is fundamental to an application. Being able to store data and retrieve it later, instead of recomputing it, saves processors from repeating work. State management also falls under this section.

Online Transaction Processing (OLTP)

OLTP is a class of storage designed for transactional processing: the everyday workload of many reads and many writes generated by users. This workload needs to be handled consistently yet efficiently.

Approaches	Description
In-Memory	`🚧 TODO 🚧`
Redis Caching	`🚧 TODO 🚧`
PostgreSQL	`🚧 TODO 🚧`
MongoDB	`🚧 TODO 🚧`
Cassandra	`🚧 TODO 🚧`
Search Engine Elasticsearch	`🚧 TODO 🚧`
PgBouncer for PostgreSQL	`🚧 TODO 🚧`
PgPool	`🚧 TODO 🚧`
Database Sharding	`🚧 TODO 🚧`
Cloud Databases	`🚧 TODO 🚧`
Google Cloud SQL	`🚧 TODO 🚧`
Amazon DynamoDB	`🚧 TODO 🚧`
Mission-critical Transactional Consistency with Google Spanner	`🚧 TODO 🚧`
Large-Scale Low-Latency with Google Cloud Bigtable	`🚧 TODO 🚧`

Online Analytical Processing (OLAP)

OLAP is a class of storage designed for producing business analytics. Read queries make up the majority of the workload, and they normally scan and aggregate large amounts of data.

Approaches	Description
General Databases	`🚧 TODO 🚧`
Elasticsearch	`🚧 TODO 🚧`
Apache Hadoop	`🚧 TODO 🚧`
Data Warehouses	`🚧 TODO 🚧`
Data Lakes	`🚧 TODO 🚧`
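
Many analytics stores keep data column by column rather than row by row, because an aggregate query only needs to read the columns it uses. The sketch below mimics `SELECT region, SUM(revenue) FROM sales GROUP BY region` over a tiny column-oriented table; the data and the `sum_by` helper are hypothetical, and real OLAP systems also compress and vectorise these column scans.

```python
from collections import defaultdict
from typing import Dict, List

# Column-oriented layout: one list per column instead of one record per row.
sales: Dict[str, List] = {
    "region":  ["EU", "US", "EU", "APAC", "US"],
    "product": ["A", "B", "B", "A", "A"],
    "revenue": [120.0, 80.0, 45.5, 60.0, 99.5],
}

def sum_by(table: Dict[str, List], group_col: str, value_col: str) -> Dict[str, float]:
    """GROUP BY group_col, SUM(value_col), touching only the two columns involved."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in zip(table[group_col], table[value_col]):
        totals[key] += value
    return dict(totals)
```

Note that the `product` column is never read; with wide tables of dozens of columns, skipping unused columns is a large part of why columnar storage suits analytical reads.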

Archival

Some data is read very rarely and is not needed for the day-to-day operation of an application. This is where archival comes in: such data can be moved out of your primary storage into cheaper, slower storage.

Approaches	Description
General Databases	`🚧 TODO 🚧`
Cold Storage	`🚧 TODO 🚧`
Arweave: Archiving on the Blockchain	`🚧 TODO 🚧`
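
A minimal sketch of an archival job, assuming a hypothetical `events` table with a `created_at` timestamp column. It exports rows older than a cutoff to a gzip-compressed JSON Lines file and only then deletes them from the database. Here the file is written locally; in practice it would be uploaded to a cold storage tier.

```python
import gzip
import json
import sqlite3

def archive_old_events(conn: sqlite3.Connection, cutoff_ts: int, archive_path: str) -> int:
    """Move rows older than `cutoff_ts` from the hypothetical `events` table
    into a gzip-compressed JSON Lines file, then delete them from the database."""
    rows = conn.execute(
        "SELECT id, created_at, payload FROM events WHERE created_at < ? ORDER BY id",
        (cutoff_ts,),
    ).fetchall()
    with gzip.open(archive_path, "wt", encoding="utf-8") as archive:
        for row_id, created_at, payload in rows:
            record = {"id": row_id, "created_at": created_at, "payload": payload}
            archive.write(json.dumps(record) + "\n")
    # Delete only after the archive file has been fully written and closed.
    with conn:
        conn.executemany("DELETE FROM events WHERE id = ?", [(row[0],) for row in rows])
    return len(rows)
```

Writing the archive before deleting means a crash part-way through leaves duplicate data at worst, never lost data.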

Security

Security is a critical part of any application. Your architecture should be secure while remaining easy to maintain and change. This includes being able to scale your authentication and authorisation solutions to meet user demand without compromising on security.

Authentication

Authentication is verifying that a user is who they claim to be.

Approaches	Description
HTTP Basic Authentication	`🚧 TODO 🚧`
HTTP Digest Authentication	`🚧 TODO 🚧`
Session Cookies	`🚧 TODO 🚧`
Self-contained Tokens with JWTs	`🚧 TODO 🚧`
API Key Authentication	`🚧 TODO 🚧`
Certificate-bound Access Tokens	`🚧 TODO 🚧`
Kubernetes Key Management with HashiCorp Vault	`🚧 TODO 🚧`
One-Key Provisioning	`🚧 TODO 🚧`
Key Distribution Servers	`🚧 TODO 🚧`

Authorisation

Authorisation is determining whether a user is allowed to perform a certain action.

Approaches	Description
Resource Owner Password Credentials (ROPC)	`🚧 TODO 🚧`
OAuth2	`🚧 TODO 🚧`
OpenID Connect	`🚧 TODO 🚧`
Lightweight Directory Access Protocol (LDAP)	`🚧 TODO 🚧`
Capability URIs and Macaroons	`🚧 TODO 🚧`
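
Whichever protocol issues a user's credentials, each request eventually needs a permission check. A minimal sketch of role-based access control follows; the role names, the `resource:action` permission strings (similar in spirit to OAuth2 scopes) and the `requires` decorator are all hypothetical. Unknown roles grant nothing, so access is denied by default.

```python
from functools import wraps
from typing import Callable, FrozenSet, Iterable

# Hypothetical role -> permission mapping.
ROLE_PERMISSIONS = {
    "viewer": frozenset({"orders:read"}),
    "support": frozenset({"orders:read", "orders:refund"}),
    "admin": frozenset({"orders:read", "orders:refund", "orders:delete"}),
}

def permissions_for(roles: Iterable[str]) -> FrozenSet[str]:
    granted: FrozenSet[str] = frozenset()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, frozenset())  # unknown roles grant nothing
    return granted

def requires(permission: str) -> Callable:
    """Decorator: reject the call unless the caller's roles grant `permission`."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(user_roles: Iterable[str], *args, **kwargs):
            if permission not in permissions_for(user_roles):
                raise PermissionError(f"missing permission {permission!r}")
            return func(user_roles, *args, **kwargs)
        return wrapper
    return decorator

@requires("orders:refund")
def refund_order(user_roles: Iterable[str], order_id: str) -> str:
    return f"refunded {order_id}"
```

Keeping the mapping in one place means adding a role or tightening a permission is a data change, not a hunt through every endpoint handler.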

Rate-Limiting

Rate-limiting is another part of security. It limits the number of requests a user can make to a particular resource within a given time window, which helps to protect against denial-of-service attacks.

Approaches	Description
In-Memory Store	`🚧 TODO 🚧`
Redis	`🚧 TODO 🚧`
Proxy Rate Limiter	`🚧 TODO 🚧`
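
One common rate-limiting algorithm is the token bucket: each client gets a bucket of `capacity` tokens that refills at `rate` tokens per second, and every request spends one token. The sketch below keeps buckets in memory, so it only works within a single process; with several instances the bucket state would need to live in a shared store such as Redis. The class name and parameters are hypothetical, and the clock is injectable for testing.

```python
import time
from typing import Callable, Dict, Tuple

class TokenBucketLimiter:
    """Per-client token bucket: bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client_id -> (tokens, last_seen)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to the time elapsed, never above capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A request handler would call `allow(client_id)` first and respond with HTTP 429 (Too Many Requests) when it returns `False`.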

Robustness

After implementing new changes, you may find that your application behaves differently, for better or for worse. Setting up a test scaffold and running the tests regularly will help you quickly identify and fix any issues that arise.

Reactionary Testing

Building an ever-growing suite of tests is a good way to check that your application still behaves as expected after every change. Catching unexpected and potentially nefarious errors this way ensures that they aren't deployed to production.

Approaches	Description
Unit Testing	`🚧 TODO 🚧`
Component Testing	`🚧 TODO 🚧`
Integration Testing	`🚧 TODO 🚧`
End-to-End Load Testing	`🚧 TODO 🚧`
Web Performance Testing	`🚧 TODO 🚧`
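
A small illustration of growing a test suite reactively with Python's built-in `unittest`: alongside ordinary unit tests, each bug that gets fixed gains a regression test that would have caught it. The `paginate` function and the page-zero bug are hypothetical examples.

```python
import unittest
from typing import List, Sequence

def paginate(items: Sequence, page: int, per_page: int) -> List:
    """Return the 1-indexed `page` of `items`."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return list(items[start:start + per_page])

class PaginateTests(unittest.TestCase):
    def test_first_page(self):
        self.assertEqual(paginate(range(10), 1, 3), [0, 1, 2])

    def test_last_partial_page(self):
        self.assertEqual(paginate(range(10), 4, 3), [9])

    def test_regression_page_zero_rejected(self):
        # Hypothetical regression: page=0 used to slip through and produce a negative start index.
        with self.assertRaises(ValueError):
            paginate(range(10), 0, 3)

if __name__ == "__main__":
    unittest.main()
```

Running this suite in CI on every change is what lets it act as a safety net before code reaches production.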

Preemptive Testing

For mission-critical software, crashes and the emergency fixes they demand can be extremely costly. Preemptive testing aims to find bugs and issues before they occur in production, working from the assumption that the worst will happen.

Approaches	Description
Stress Testing	`🚧 TODO 🚧`
Fuzzing	`🚧 TODO 🚧`
Symbolic Execution	`🚧 TODO 🚧`
Static Analysis	`🚧 TODO 🚧`
Formal Verification	`🚧 TODO 🚧`
Chaos Engineering for Microservices	`🚧 TODO 🚧`
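
Fuzzing throws large numbers of unexpected inputs at code to find crashes before users do. The sketch below is a naive random fuzzer: it feeds random printable strings to a hypothetical `parse_duration` function and records any exception other than the documented `ValueError`, plus any result that is not a non-negative integer. Coverage-guided fuzzers (for example libFuzzer, AFL or, for Python, Atheris) explore inputs far more effectively; this only shows the idea.

```python
import random
import string
from typing import Callable, List, Tuple

def parse_duration(text: str) -> int:
    """Hypothetical fuzz target: parse strings such as '15s', '10m' or '2h' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    text = text.strip()
    if len(text) < 2 or text[-1] not in units or not text[:-1].isdigit():
        raise ValueError(f"invalid duration: {text!r}")
    return int(text[:-1]) * units[text[-1]]

def fuzz(target: Callable[[str], object], iterations: int = 10_000, seed: int = 0) -> List[Tuple[str, object]]:
    """Throw random short strings at `target`; collect crashes and invalid results."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    failures = []
    for _ in range(iterations):
        candidate = "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 8)))
        try:
            result = target(candidate)
        except ValueError:
            continue  # documented rejection of malformed input: acceptable
        except Exception as exc:  # any other exception is a bug to investigate
            failures.append((candidate, exc))
            continue
        if not isinstance(result, int) or result < 0:
            failures.append((candidate, result))
    return failures
```

An empty list from `fuzz(parse_duration)` does not prove the parser correct, but every non-empty result is a concrete, reproducible input to turn into a regression test.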

Resources

These books and articles have been helpful in my development of this guide:

API Security in Action
