This project is an implementation of a dynamic, load-aware rate limiter for FastAPI. It's designed to be used in a distributed environment and provides different rate limits based on user tiers and the overall system health.
- FastAPI Middleware: Easily integrable into any FastAPI application.
- Tier-Based Rate Limiting: Supports different rate limits for different user tiers (e.g., Free, Pro, Enterprise).
- Dynamic and Load-Aware: Adjusts rate limits based on the system's health (NORMAL or DEGRADED) and reloads configuration changes without restarting.
- Distributed Consistency: Uses Redis as a central store for rate limiting counters, ensuring consistency across multiple server instances.
- Configurable: Tiers and their rate limits are defined in a JSON configuration file and can be reloaded dynamically.
- High Performance: Utilizes Redis for fast read/write operations and an efficient hybrid Pub/Sub and polling mechanism for system health checks.
- Correct HTTP Responses: Returns `429 Too Many Requests` when the rate limit is exceeded and includes `X-RateLimit-*` headers in the responses.
- Tier-Based Fail-over: Implements intelligent fail-over logic when Redis is unavailable: Free tier requests fail closed (`503 Service Unavailable`), while Pro and Enterprise tier requests fail open (allowed).
- Admin UI: A simple, secure web interface to view and manage the system's health status.
- Structured Logging: Logs key events, including rate-limiting decisions and system health changes, to standard output for easy monitoring.
```
.
├── app/
│   ├── __init__.py
│   ├── config.py               # Pydantic models for configuration and auto-reloading logic
│   ├── health.py               # System health store with Redis backend and hybrid Pub/Sub logic
│   ├── main.py                 # FastAPI application entry point
│   ├── middleware.py           # Rate limiting middleware
│   ├── models.py               # API key to tier mapping
│   └── rate_limiter.py         # Core rate limiting logic using a fixed window algorithm
├── config/
│   └── rate_limit_config.json  # Configuration file for tiers and rate limits
├── tests/
│   ├── ...                     # Unit and integration tests
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
└── README.md
```
The rate limiter is implemented as a `RateLimitMiddleware` in `app/middleware.py`. This middleware intercepts incoming requests and performs the following steps:
- Bypass Excluded Paths: Skips rate limiting for predefined paths like `/docs`, `/openapi.json`, and `/admin`.
- Extract API Key: Retrieves the API key from the `X-API-Key` header.
- Map API Key to Tier: Determines the user's tier based on their API key.
- Smart Configuration Reloading: The application uses an efficient auto-reloading mechanism. It checks the `rate_limit_config.json` file's modification time and only reloads the configuration if the file has changed. This ensures that configuration updates are applied almost instantly without requiring an application restart and without the performance overhead of reading the file on every request.
- Check System Health: Fetches the current system health (`NORMAL` or `DEGRADED`) from the `HealthStore`.
- Apply Rate Limiting: Calls the `RedisRateLimiter` to check if the request is within the allowed limit for the user's tier and the current system health, passing the dynamically loaded configuration.
- Return Response:
  - If the request is allowed, it's passed to the application, and the `X-RateLimit-*` headers are added to the response.
  - If the request is denied, it immediately returns an `HTTP 429 Too Many Requests` response.
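The sketch below shows how these steps could fit together in a Starlette-style middleware. It is only an illustration: helper names such as `get_tier_for_key`, `load_config`, `health_store`, and `rate_limiter.check` are hypothetical stand-ins for the project's actual components.

```python
# Illustrative sketch of the middleware flow described above.
# get_tier_for_key, load_config, health_store, and rate_limiter are hypothetical helpers.
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse

EXCLUDED_PATHS = {"/docs", "/openapi.json", "/admin"}

class RateLimitMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # 1. Bypass excluded paths.
        if request.url.path in EXCLUDED_PATHS:
            return await call_next(request)

        # 2-3. Extract the API key and map it to a tier.
        api_key = request.headers.get("X-API-Key")
        tier = get_tier_for_key(api_key)  # hypothetical lookup (app/models.py)

        # 4-5. Load the (possibly reloaded) config and the current system health.
        config = load_config()                 # mtime-based auto-reload
        health = await health_store.get_status()

        # 6. Ask the rate limiter for a decision.
        allowed, limit, remaining = await rate_limiter.check(api_key, tier, health, config)
        if not allowed:
            return JSONResponse(status_code=429, content={"detail": "Too Many Requests"})

        # 7. Attach X-RateLimit-* headers to the successful response.
        response = await call_next(request)
        response.headers["X-RateLimit-Limit"] = str(limit)
        response.headers["X-RateLimit-Remaining"] = str(remaining)
        return response
```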
The core of the dynamic limiting logic is in the `get_active_limit` function in `app/rate_limiter.py` and the `HealthStore` in `app/health.py`.

- System Health Status: The system's health is stored in Redis and can be either `NORMAL` or `DEGRADED`. This status can be updated via the Admin UI (accessible at `/admin`) or directly via a protected admin endpoint (`POST /system/health`).

- Efficient Health Checks (Hybrid Pub/Sub and Polling): To avoid querying Redis on every request, the `HealthStore` uses a highly efficient hybrid strategy:
  - Real-time Push (Pub/Sub): Each application instance subscribes to a Redis Pub/Sub channel. When an admin changes the system health, a message is published to this channel. All instances receive this message instantly and update their local, in-memory health status. This makes the system extremely responsive to changes.
  - Resilient Pull (Polling): As a fail-safe, the in-memory status has a long TTL (e.g., 60 seconds). If the cache expires (which would only happen if the application misses a Pub/Sub message, for instance due to a temporary disconnect), it polls Redis for the current status. This periodic polling guarantees eventual consistency and makes the system resilient.

  This hybrid approach provides the best of both worlds: the near-zero latency of a push-based system and the reliability of a pull-based one, all while keeping the load on Redis to an absolute minimum.
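A minimal sketch of what such a hybrid store could look like, assuming `redis-py`'s asyncio client. The class layout, channel name, and cache TTL are illustrative assumptions, not the project's exact API.

```python
# Illustrative sketch of a hybrid Pub/Sub + polling health store.
# Names (CHANNEL, KEY, CACHE_TTL) are assumptions for this example.
import asyncio
import time
import redis.asyncio as redis

CHANNEL = "system:health:changed"
KEY = "system:health"
CACHE_TTL = 60  # seconds before the in-memory value is considered stale

class HealthStore:
    def __init__(self, client: redis.Redis):
        self._redis = client
        self._status = "NORMAL"
        self._fetched_at = 0.0

    async def start(self) -> None:
        # Push path: listen for health changes in the background.
        asyncio.create_task(self._listen())

    async def _listen(self) -> None:
        pubsub = self._redis.pubsub()
        await pubsub.subscribe(CHANNEL)
        async for message in pubsub.listen():
            if message["type"] == "message":
                self._status = message["data"].decode()
                self._fetched_at = time.monotonic()

    async def get_status(self) -> str:
        # Pull path: only touch Redis when the local cache is stale.
        if time.monotonic() - self._fetched_at > CACHE_TTL:
            value = await self._redis.get(KEY)
            self._status = value.decode() if value else "NORMAL"
            self._fetched_at = time.monotonic()
        return self._status

    async def set_status(self, status: str) -> None:
        # Persist and broadcast so every instance updates instantly.
        await self._redis.set(KEY, status)
        await self._redis.publish(CHANNEL, status)
```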
- Dynamic Limit Adjustment: The `get_active_limit` function selects the appropriate rate limit from the configuration file based on the current system health:
  - In the `NORMAL` state, it uses the `burst_limit` to maximize resource utilization.
  - In the `DEGRADED` state, it enforces stricter limits to shed load and protect the system.
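A hedged sketch of what such a selection function might look like, assuming the tier configuration shape described in the next section (the project's actual field access may differ):

```python
# Illustrative sketch: pick the active limit for a tier given the system health.
# The tier_config shape mirrors the JSON config described below.
def get_active_limit(tier_config: dict, health: str) -> int:
    if health == "NORMAL":
        # Allow bursting when the system is healthy.
        return tier_config["normal"]["burst_limit"]
    # DEGRADED: fall back to the stricter limit to shed load.
    return tier_config["degraded"]["limit"]
```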
The rate limits for each tier are defined in `config/rate_limit_config.json`. The configuration is loaded and validated using Pydantic models in `app/config.py`. The `load_config` function also implements an auto-reloading mechanism that reloads the configuration file whenever it's modified, without requiring an application restart.
- `window_seconds`: The duration of the rate-limiting window in seconds.
- `tiers`: A dictionary where each key is a tier name (e.g., "free", "pro").
  - `normal`: Configuration for the `NORMAL` system health state.
    - `limit`: The base request limit for the window.
    - `burst_limit`: The burstable request limit for the window.
  - `degraded`: Configuration for the `DEGRADED` system health state.
    - `limit`: The strict request limit for the window.
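A sketch of Pydantic models and an mtime-based `load_config` that would match this structure. The model and field names are assumptions derived from the JSON layout above, not necessarily the exact code in `app/config.py`.

```python
# Illustrative sketch of the configuration models and mtime-based auto-reload.
import json
import os
from pydantic import BaseModel

CONFIG_PATH = "config/rate_limit_config.json"

class NormalLimits(BaseModel):
    limit: int
    burst_limit: int

class DegradedLimits(BaseModel):
    limit: int

class TierLimits(BaseModel):
    normal: NormalLimits
    degraded: DegradedLimits

class RateLimitConfig(BaseModel):
    window_seconds: int
    tiers: dict[str, TierLimits]

_cached = None  # (mtime, RateLimitConfig)

def load_config(path: str = CONFIG_PATH) -> RateLimitConfig:
    """Reload the config only when the file's modification time changes."""
    global _cached
    mtime = os.path.getmtime(path)
    if _cached is None or _cached[0] != mtime:
        with open(path) as f:
            _cached = (mtime, RateLimitConfig(**json.load(f)))
    return _cached[1]
```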
- Docker
- Docker Compose
- Clone the repository:

  ```bash
  git clone https://github.com/rn23thakur/fastapi-rate-limiter.git
  cd fastapi-rate-limiter
  ```

- Start the application:

  ```bash
  docker-compose up --build -d
  ```

  This command builds the Docker image (including the `--reload` flag for `uvicorn` in the `Dockerfile`) and starts the FastAPI application and a Redis container. The application will be available at `http://localhost:8000`.
The application is configured to log to standard output. To view the logs in real-time, you can use the following command:
```bash
docker-compose logs -f
```

You will see logs for rate-limiting decisions, system health changes, and application startup/shutdown.
To access the admin UI, navigate to the following URL in your browser:
```
http://localhost:8000/admin?secret=letmein
```

The secret is a simple shared token defined in `app/main.py` (defaulting to `"letmein"`). In a production environment, it should be managed securely via the `ADMIN_SECRET` environment variable.
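For example, the secret could be resolved from the environment with a development fallback (a sketch, not the project's exact code):

```python
# Sketch: read the admin secret from the environment, falling back to the
# development default used in this README.
import os

ADMIN_SECRET = os.getenv("ADMIN_SECRET", "letmein")
```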
You can test the rate limiting functionality using curl or a tool like Postman.
1. Test a Free Tier User (Normal State - Burst Limit 20 RPM)
```bash
# Make requests with a free tier API key (e.g., 20 requests within 60 seconds should pass)
for i in $(seq 1 21)
do
  curl -i -H "X-API-Key: free_123" http://localhost:8000/test
  sleep 0.1
done
```

Expected: The first 20 requests should return `200 OK` with `X-RateLimit-Remaining` decreasing. The 21st request should return `429 Too Many Requests`.
2. Change System Health to DEGRADED
```bash
curl -i -X POST -H "X-Admin-Secret: letmein" -H "Content-Type: application/json" -d '{"status": "DEGRADED"}' http://localhost:8000/system/health
```

Expected: `200 OK` with `{"status": "DEGRADED"}`.
3. Test a Free Tier User (DEGRADED State - Limit 2 RPM)
```bash
# Make requests with a free tier API key (e.g., 2 requests within 60 seconds should pass)
for i in $(seq 1 3)
do
  curl -i -H "X-API-Key: free_123" http://localhost:8000/test
  sleep 0.1
done
```

Expected: The first 2 requests should return `200 OK` with `X-RateLimit-Remaining` decreasing. The 3rd request should return `429 Too Many Requests`.
4. Test Pro/Enterprise Tier Users (DEGRADED State - Limits 100/1000 RPM)
```bash
# Pro Tier (limit 100 RPM in degraded state)
for i in $(seq 1 5)
do
  curl -i -H "X-API-Key: pro_123" http://localhost:8000/test
  sleep 0.1
done

# Enterprise Tier (limit 1000 RPM in degraded state)
for i in $(seq 1 5)
do
  curl -i -H "X-API-Key: ent_123" http://localhost:8000/test
  sleep 0.1
done
```

Expected: All requests for Pro and Enterprise tiers should return `200 OK`.
5. Simulate Redis Failure (and test tier-based fail-over)
First, stop the Redis container:
```bash
docker stop fastapi-rate-limiter-redis-1  # Replace with your actual Redis container name if different
```

Then, test each tier:
```bash
# Free Tier (should fail closed - 503 Service Unavailable)
curl -i -H "X-API-Key: free_123" http://localhost:8000/test

# Pro Tier (should fail open - 200 OK)
curl -i -H "X-API-Key: pro_123" http://localhost:8000/test

# Enterprise Tier (should fail open - 200 OK)
curl -i -H "X-API-Key: ent_123" http://localhost:8000/test
```

Expected: Free tier gets `503 Service Unavailable`. Pro and Enterprise tiers get `200 OK`.
6. Restart Redis and set health back to NORMAL
```bash
docker start fastapi-rate-limiter-redis-1  # Replace with your actual Redis container name if different
curl -i -X POST -H "X-Admin-Secret: letmein" -H "Content-Type: application/json" -d '{"status": "NORMAL"}' http://localhost:8000/system/health
```

7. Test Dynamic Configuration Reloading
The application is designed to automatically reload the `rate_limit_config.json` file whenever it's modified, so you can change the rate limits without restarting the server.

- Make an initial request to see the current limit for the "pro" tier.

  ```bash
  curl -i -H "X-API-Key: pro_123" http://localhost:8000/test
  ```

  Expected: The `X-RateLimit-Limit` header should be `150` (the `burst_limit` from the original config).

- Edit `config/rate_limit_config.json` and change the `burst_limit` for the "pro" tier from `150` to `5`.

  ```
  // ...
  "pro": {
    "normal": { "limit": 100, "burst_limit": 5 },  // Changed from 150
    "degraded": { "limit": 100 }
  },
  // ...
  ```

- Make another request with the same "pro" API key.

  ```bash
  curl -i -H "X-API-Key: pro_123" http://localhost:8000/test
  ```

  Expected: The `X-RateLimit-Limit` header should now be `5`, reflecting the change you just made.
This project is designed for a distributed environment. You can simulate and test this locally using Docker Compose to run multiple instances of the application. This is the best way to verify that the Redis Pub/Sub mechanism is correctly synchronizing the health status across all instances.
The `docker-compose.yml` is configured to provide a stable instance at `localhost:8000` and allow you to scale up additional instances on dynamic ports.
- `fastapi-app`: A single instance always available at `http://localhost:8000`.
- `fastapi-app-scaled`: A service you can scale. By default, one instance will run on a random port.
1. Scale Up Additional Instances
Use the `--scale` flag with `docker-compose` to start multiple instances of the `fastapi-app-scaled` service. For example, to run 3 scaled instances (for a total of 4 application instances):

```bash
docker-compose up --build -d --scale fastapi-app-scaled=3
```

2. Find the Instance Ports
To find the specific port for each running replica, use the `docker-compose ps` command:

```bash
docker-compose ps
```

You will see output showing the `fastapi-app` instance on port 8000 and the `fastapi-app-scaled` instances on random host ports.
3. View Logs from All Instances
To see the synchronized logs from all running containers, use the `-f` (follow) flag:

```bash
docker-compose logs -f fastapi-app fastapi-app-scaled
```

You will see interleaved logs from all application instances, each identified by its container name.
4. Trigger a Health Status Change
In a separate terminal, send a request to the admin endpoint of any single instance to change the system health. You can use the stable localhost:8000 endpoint or any of the dynamically assigned ports.
```bash
# Using the stable port
curl -i -X POST -H "X-Admin-Secret: letmein" -H "Content-Type: application/json" -d '{"status": "DEGRADED"}' http://localhost:8000/system/health
```

5. Verify the Pub/Sub Synchronization
Now, observe the logs in your first terminal. You should see log messages from all four instances indicating that they received the health status update via Redis Pub/Sub.
The request to /system/health will only go to one of the instances, but that instance publishes the change to Redis. The logs confirm that all other instances, which are subscribed to the channel, receive the message and update their internal state in real-time. This verifies that the distributed synchronization is working correctly.
6. Scale Down
Once you're done testing, you can bring the services down:
```bash
docker-compose down
```

To run the tests, you can use `pytest`:
```bash
# Make sure you have the dev dependencies installed
pip install -e ".[dev]"

# Run the tests
pytest
```

The current implementation uses a Fixed Window Counter algorithm.
- Pros:
  - Simple to implement and understand.
  - Memory efficient, as it only stores one counter per user per window.
  - Fast, as it only requires a single `INCR` operation in Redis, now made atomic using a Redis pipeline.
- Cons:
  - "Thundering Herd" Problem: It can allow a burst of traffic at the edge of a window. For example, if the limit is 10 requests per minute, a user can make 10 requests at the end of one minute and another 10 at the beginning of the next, resulting in 20 requests in a short period.
For this assignment, the Fixed Window Counter is a reasonable choice. However, for a more robust solution, a Sliding Window Log or Sliding Window Counter algorithm could be used to avoid the "thundering herd" problem.
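For reference, here is a minimal sketch of a fixed-window check using an `INCR`/`EXPIRE` pipeline with `redis-py`'s asyncio client. The key format and return values are illustrative assumptions; the project's `RedisRateLimiter` may be structured differently.

```python
# Illustrative sketch of a fixed-window counter backed by Redis.
import time
import redis.asyncio as redis

async def check_fixed_window(client: redis.Redis, api_key: str,
                             limit: int, window_seconds: int) -> tuple[bool, int]:
    """Return (allowed, remaining) for the current fixed window."""
    window = int(time.time()) // window_seconds
    key = f"ratelimit:{api_key}:{window}"

    # INCR and EXPIRE are queued in one pipeline so they execute together
    # (redis-py pipelines are transactional by default).
    pipe = client.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    count, _ = await pipe.execute()

    remaining = max(limit - count, 0)
    return count <= limit, remaining
```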
- Redis: Redis was chosen as the central store for its high performance, low latency, and atomic operations (`INCR`). The `INCR` and `EXPIRE` operations are now executed atomically using a Redis pipeline to prevent race conditions.
- Health Check Caching: The local caching in the `HealthStore` is a crucial optimization to avoid hitting Redis on every request, which would add significant latency.
- Tier-Based Fail-over: The middleware now implements a tier-based fail-over strategy. When Redis is unavailable:
  - Free Tier: Requests fail closed, returning `503 Service Unavailable`. This prioritizes system stability by shedding non-paying traffic.
  - Pro & Enterprise Tiers: Requests fail open, allowing them to proceed. This prioritizes availability for paying customers, honoring SLAs even during Redis outages.
- Explicit Health Checks for Robust Fail-over: To reliably trigger the tier-based fail-over, the system first sends a `PING` command to Redis on every request. This acts as a fast, explicit health check. While this adds a minimal latency overhead (one extra network round-trip), it provides a clean and immediate way to detect if Redis is unavailable, allowing the system to switch to its fail-over logic without waiting for a command timeout. This trade-off prioritizes predictable behavior and resilience over shaving microseconds off the "happy path" response time.
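A hedged sketch of how the `PING` check and the tier-based fail-over decision could be wired together. Function names, the tier set, and the error payload are hypothetical; only the behavior (free tier fails closed, paid tiers fail open) follows the description above.

```python
# Illustrative sketch of the tier-based fail-over decision.
from typing import Optional

import redis.asyncio as redis
from redis.exceptions import RedisError
from fastapi.responses import JSONResponse

FAIL_OPEN_TIERS = {"pro", "enterprise"}  # assumption: paid tiers fail open

async def redis_available(client: redis.Redis) -> bool:
    # Fast, explicit health check before attempting rate-limit operations.
    try:
        return await client.ping()
    except RedisError:
        return False

def fail_over_response(tier: str) -> Optional[JSONResponse]:
    """Return a 503 for free-tier traffic, or None to let the request through."""
    if tier in FAIL_OPEN_TIERS:
        return None  # fail open: prioritize availability for paying customers
    return JSONResponse(status_code=503,
                        content={"detail": "Rate limiter unavailable"})
```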
The current solution is designed to be scalable.
- Horizontal Scaling: The FastAPI application can be scaled horizontally by running multiple instances behind a load balancer. Since the rate limiting counters are stored in a central Redis instance, the rate limiting will be consistent across all instances.
- Redis Scalability: Redis can be scaled using clustering or a primary/replica setup to handle a large number of concurrent requests.