
FastAPI Dynamic Rate Limiter

This project implements a dynamic, load-aware rate limiter for FastAPI. It is designed for distributed environments and applies different rate limits based on user tier and overall system health.

Features

  • FastAPI Middleware: Easily integrable into any FastAPI application.
  • Tier-Based Rate Limiting: Supports different rate limits for different user tiers (e.g., Free, Pro, Enterprise).
  • Dynamic and Load-Aware: Adjusts rate limits based on the system's health (NORMAL or DEGRADED) and reloads configuration changes without restarting.
  • Distributed Consistency: Uses Redis as a central store for rate limiting counters, ensuring consistency across multiple server instances.
  • Configurable: Tiers and their rate limits are defined in a JSON configuration file and can be reloaded dynamically.
  • High Performance: Utilizes Redis for fast read/write operations and an efficient hybrid Pub/Sub and polling mechanism for system health checks.
  • Correct HTTP Responses: Returns 429 Too Many Requests when the rate limit is exceeded and includes X-RateLimit-* headers in the responses.
  • Tier-Based Fail-over: Implements intelligent fail-over logic when Redis is unavailable: Free tier requests fail closed (503 Service Unavailable), while Pro and Enterprise tier requests fail open (allowed).
  • Admin UI: A simple, secure web interface to view and manage the system's health status.
  • Structured Logging: Logs key events, including rate-limiting decisions and system health changes, to standard output for easy monitoring.

Project Structure

.
├── app/
│   ├── __init__.py
│   ├── config.py         # Pydantic models for configuration and auto-reloading logic
│   ├── health.py         # System health store with Redis backend and hybrid Pub/Sub logic
│   ├── main.py           # FastAPI application entry point
│   ├── middleware.py     # Rate limiting middleware
│   ├── models.py         # API key to tier mapping
│   └── rate_limiter.py   # Core rate limiting logic using a fixed window algorithm
├── config/
│   └── rate_limit_config.json # Configuration file for tiers and rate limits
├── tests/
│   ├── ...               # Unit and integration tests
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
└── README.md

How it Works

1. FastAPI Middleware

The rate limiter is implemented as a RateLimitMiddleware in app/middleware.py. This middleware intercepts incoming requests and performs the following steps (a condensed sketch follows the list):

  1. Bypass Excluded Paths: Skips rate limiting for predefined paths like /docs, /openapi.json, and /admin.
  2. Extract API Key: Retrieves the API key from the X-API-Key header.
  3. Map API Key to Tier: Determines the user's tier based on their API key.
  4. Smart Configuration Reloading: The application uses an efficient auto-reloading mechanism. It checks the rate_limit_config.json file's modification time and only reloads the configuration if the file has changed. This ensures that configuration updates are applied almost instantly without requiring an application restart and without the performance overhead of reading the file on every request.
  5. Check System Health: Fetches the current system health (NORMAL or DEGRADED) from the HealthStore.
  6. Apply Rate Limiting: Calls the RedisRateLimiter to check if the request is within the allowed limit for the user's tier and the current system health, passing the dynamically loaded configuration.
  7. Return Response:
    • If the request is allowed, it's passed to the application, and the X-RateLimit-* headers are added to the response.
    • If the request is denied, it immediately returns an HTTP 429 Too Many Requests response.
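
The sketch below illustrates this flow. It is not the project's exact code: lookup_tier and the injected health_store, rate_limiter, and load_config are stand-ins for the real modules.

# Illustrative sketch of RateLimitMiddleware's dispatch flow; lookup_tier and
# the injected collaborators are stand-ins, not the project's real code.
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

EXCLUDED_PATHS = {"/docs", "/openapi.json", "/admin"}

def lookup_tier(api_key: str | None) -> str:
    # Stand-in for the API-key-to-tier mapping in app/models.py.
    return "free" if (api_key or "").startswith("free_") else "pro"

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, health_store, rate_limiter, load_config):
        super().__init__(app)
        self.health_store = health_store   # app/health.py
        self.rate_limiter = rate_limiter   # app/rate_limiter.py
        self.load_config = load_config     # app/config.py

    async def dispatch(self, request: Request, call_next):
        # 1. Bypass excluded paths.
        if request.url.path in EXCLUDED_PATHS:
            return await call_next(request)
        # 2-3. Extract the API key and map it to a tier.
        api_key = request.headers.get("X-API-Key")
        tier = lookup_tier(api_key)
        # 4-5. Fetch the (auto-reloading) config and the current system health.
        config = self.load_config()
        health = await self.health_store.get()
        # 6. Ask the rate limiter for a decision.
        allowed, limit, remaining = await self.rate_limiter.check(
            api_key, tier, health, config
        )
        # 7. Deny with 429, or forward the request and attach the headers.
        if not allowed:
            return JSONResponse(status_code=429, content={"detail": "Too Many Requests"})
        response = await call_next(request)
        response.headers["X-RateLimit-Limit"] = str(limit)
        response.headers["X-RateLimit-Remaining"] = str(remaining)
        return response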

2. Dynamic, Load-Aware Limiting & Efficient Health Checks

The core of the dynamic limiting logic is in the get_active_limit function in app/rate_limiter.py and the HealthStore in app/health.py.

  • System Health Status: The system's health is stored in Redis and can be either NORMAL or DEGRADED. This status can be updated via the Admin UI (accessible at /admin) or directly via a protected admin endpoint (POST /system/health).

  • Efficient Health Checks (Hybrid Pub/Sub and Polling): To avoid querying Redis on every request, the HealthStore uses a highly efficient hybrid strategy:

    1. Real-time Push (Pub/Sub): Each application instance subscribes to a Redis Pub/Sub channel. When an admin changes the system health, a message is published to this channel. All instances receive this message instantly and update their local, in-memory health status. This makes the system extremely responsive to changes.
    2. Resilient Pull (Polling): As a fail-safe, the in-memory status has a TTL (e.g., 60 seconds). Once the cached value expires, the next read polls Redis for the current status. This catches any Pub/Sub message an instance may have missed (for example, during a temporary disconnect), guaranteeing eventual consistency and making the system resilient.

    This hybrid approach provides the best of both worlds: the near-zero latency of a push-based system and the reliability of a pull-based one, all while keeping the load on Redis to an absolute minimum. A minimal sketch of this pattern appears after this list.

  • Dynamic Limit Adjustment: The get_active_limit function (also sketched below) selects the appropriate rate limit from the configuration file based on the current system health:

    • In the NORMAL state, it uses the burst_limit to maximize resource utilization.
    • In the DEGRADED state, it enforces stricter limits to shed load and protect the system.
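
A minimal sketch of the hybrid health store, assuming redis-py's asyncio client and illustrative key/channel names (the real app/health.py may differ):

# Illustrative sketch of the hybrid push/pull health store; the key, channel,
# and TTL below are assumptions, not the project's actual values.
import time
import redis.asyncio as redis

HEALTH_KEY = "system:health"             # assumed Redis key
HEALTH_CHANNEL = "system:health:events"  # assumed Pub/Sub channel
CACHE_TTL = 60.0                         # fail-safe TTL for the in-memory copy

class HealthStore:
    def __init__(self, client: redis.Redis):
        self._client = client
        self._status = "NORMAL"
        self._fetched_at = 0.0  # forces an initial poll

    async def listen(self) -> None:
        # Push path: run as a background task (asyncio.create_task) at startup.
        pubsub = self._client.pubsub()
        await pubsub.subscribe(HEALTH_CHANNEL)
        async for message in pubsub.listen():
            if message["type"] == "message":
                self._status = message["data"].decode()
                self._fetched_at = time.monotonic()

    async def get(self) -> str:
        # Pull path: serve from memory; poll Redis only once the TTL lapses.
        if time.monotonic() - self._fetched_at > CACHE_TTL:
            value = await self._client.get(HEALTH_KEY)
            self._status = value.decode() if value else "NORMAL"
            self._fetched_at = time.monotonic()
        return self._status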
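
And a minimal sketch of the limit selection, assuming the configuration shape described under Configuration Schema below:

# Illustrative limit selection; assumes the config dict mirrors the JSON schema.
def get_active_limit(config: dict, tier: str, health: str) -> int:
    tier_cfg = config["tiers"][tier]
    if health == "DEGRADED":
        return tier_cfg["degraded"]["limit"]    # strict limit to shed load
    return tier_cfg["normal"]["burst_limit"]    # burst limit in the NORMAL state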

3. Configuration

The rate limits for each tier are defined in config/rate_limit_config.json. The configuration is loaded and validated using Pydantic models in app/config.py. The load_config function also implements an auto-reloading mechanism that reloads the configuration file whenever it's modified, without requiring an application restart.
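
A minimal sketch of this mtime-based reload, with CONFIG_PATH and the module-level cache as illustrative assumptions:

# Illustrative mtime-based reload; CONFIG_PATH and the cache are assumptions.
import json
import os

CONFIG_PATH = "config/rate_limit_config.json"
_cached_config = None
_cached_mtime = 0.0

def load_config() -> dict:
    global _cached_config, _cached_mtime
    mtime = os.path.getmtime(CONFIG_PATH)
    if _cached_config is None or mtime != _cached_mtime:
        # Re-read (and, in the project, re-validate with Pydantic) only
        # when the file's modification time changes.
        with open(CONFIG_PATH) as f:
            _cached_config = json.load(f)
        _cached_mtime = mtime
    return _cached_config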

Configuration Schema

  • window_seconds: The duration of the rate-limiting window in seconds.
  • tiers: A dictionary where each key is a tier name (e.g., "free", "pro").
    • normal: Configuration for the NORMAL system health state.
      • limit: The base request limit for the window.
      • burst_limit: The burstable request limit for the window.
    • degraded: Configuration for the DEGRADED system health state.
      • limit: The strict request limit for the window.
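
For concreteness, a configuration matching this schema might look like the following. The values echo the figures used in the testing walkthrough below (free burst 20, degraded free limit 2, pro 100/150, degraded pro limit 100, degraded enterprise limit 1000); the remaining numbers are illustrative assumptions, not the repository's actual defaults.

{
  "window_seconds": 60,
  "tiers": {
    "free": {
      "normal":   { "limit": 10, "burst_limit": 20 },
      "degraded": { "limit": 2 }
    },
    "pro": {
      "normal":   { "limit": 100, "burst_limit": 150 },
      "degraded": { "limit": 100 }
    },
    "enterprise": {
      "normal":   { "limit": 500, "burst_limit": 1000 },
      "degraded": { "limit": 1000 }
    }
  }
}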

Getting Started

Prerequisites

  • Docker
  • Docker Compose

Running the Application

  1. Clone the repository:

    git clone https://github.com/rn23thakur/fastapi-rate-limiter.git
    cd fastapi-rate-limiter
  2. Start the application:

    docker-compose up --build -d

    This command builds the Docker image (whose Dockerfile runs uvicorn with the --reload flag) and starts the FastAPI application and a Redis container. The application will be available at http://localhost:8000.

Viewing Logs

The application is configured to log to standard output. To view the logs in real-time, you can use the following command:

docker-compose logs -f

You will see logs for rate-limiting decisions, system health changes, and application startup/shutdown.

Admin UI

To access the admin UI, navigate to the following URL in your browser:

http://localhost:8000/admin?secret=letmein

The secret is defined in app/main.py (defaulting to "letmein"). In a production environment, it should be managed securely via the ADMIN_SECRET environment variable.

Testing the Rate Limiter

You can test the rate limiting functionality using curl or a tool like Postman.

Using curl

1. Test a Free Tier User (Normal State - Burst Limit 20 RPM)

# Make requests with a free tier API key (e.g., 20 requests within 60 seconds should pass)
for i in $(seq 1 21)
do
  curl -i -H "X-API-Key: free_123" http://localhost:8000/test
  sleep 0.1
done

Expected: The first 20 requests should return 200 OK with X-RateLimit-Remaining decreasing. The 21st request should return 429 Too Many Requests.

2. Change System Health to DEGRADED

curl -i -X POST -H "X-Admin-Secret: letmein" -H "Content-Type: application/json" -d '{"status": "DEGRADED"}' http://localhost:8000/system/health

Expected: 200 OK with {"status": "DEGRADED"}.

3. Test a Free Tier User (DEGRADED State - Limit 2 RPM)

# Make requests with a free tier API key (e.g., 2 requests within 60 seconds should pass)
for i in $(seq 1 3)
do
  curl -i -H "X-API-Key: free_123" http://localhost:8000/test
  sleep 0.1
done

Expected: The first 2 requests should return 200 OK with X-RateLimit-Remaining decreasing. The 3rd request should return 429 Too Many Requests.

4. Test Pro/Enterprise Tier Users (DEGRADED State - Limits 100/1000 RPM)

# Pro Tier (limit 100 RPM in degraded state)
for i in $(seq 1 5)
do
  curl -i -H "X-API-Key: pro_123" http://localhost:8000/test
  sleep 0.1
done

# Enterprise Tier (limit 1000 RPM in degraded state)
for i in $(seq 1 5)
do
  curl -i -H "X-API-Key: ent_123" http://localhost:8000/test
  sleep 0.1
done

Expected: All requests for Pro and Enterprise tiers should return 200 OK.

5. Simulate Redis Failure (and test tier-based fail-over)

First, stop the Redis container:

docker stop fastapi-rate-limiter-redis-1 # Replace with your actual Redis container name if different

Then, test each tier:

# Free Tier (should fail closed - 503 Service Unavailable)
curl -i -H "X-API-Key: free_123" http://localhost:8000/test

# Pro Tier (should fail open - 200 OK)
curl -i -H "X-API-Key: pro_123" http://localhost:8000/test

# Enterprise Tier (should fail open - 200 OK)
curl -i -H "X-API-Key: ent_123" http://localhost:8000/test

Expected: Free tier gets 503 Service Unavailable. Pro and Enterprise tiers get 200 OK.

6. Restart Redis and set health back to NORMAL

docker start fastapi-rate-limiter-redis-1 # Replace with your actual Redis container name if different
curl -i -X POST -H "X-Admin-Secret: letmein" -H "Content-Type: application/json" -d '{"status": "NORMAL"}' http://localhost:8000/system/health

7. Test Dynamic Configuration Reloading

The application is designed to automatically reload the rate_limit_config.json file whenever it's modified, so you can change the rate limits without restarting the server.

  1. Make an initial request to see the current limit for the "pro" tier.

    curl -i -H "X-API-Key: pro_123" http://localhost:8000/test

    Expected: The X-RateLimit-Limit header should be 150 (the burst_limit from the original config).

  2. Edit config/rate_limit_config.json and change the burst_limit for the "pro" tier from 150 to 5.

    // ...
    "pro": {
      "normal":   { "limit": 100, "burst_limit": 5 }, // Changed from 150
      "degraded": { "limit": 100 }
    },
    // ...
  3. Make another request with the same "pro" API key.

    curl -i -H "X-API-Key: pro_123" http://localhost:8000/test

    Expected: The X-RateLimit-Limit header should now be 5, reflecting the change you just made.

Testing in a Multi-Instance Environment

This project is designed for a distributed environment. You can simulate and test this locally using Docker Compose to run multiple instances of the application. This is the best way to verify that the Redis Pub/Sub mechanism is correctly synchronizing the health status across all instances.

The docker-compose.yml is configured to provide a stable instance at localhost:8000 and allow you to scale up additional instances on dynamic ports.

  • fastapi-app: A single instance always available at http://localhost:8000.
  • fastapi-app-scaled: A service you can scale. By default, one instance will run on a random port.

1. Scale Up Additional Instances

Use the --scale flag with docker-compose to start multiple instances of the fastapi-app-scaled service. For example, to run 3 scaled instances (for a total of 4 application instances):

docker-compose up --build -d --scale fastapi-app-scaled=3

2. Find the Instance Ports

To find the specific port for each running replica, use the docker-compose ps command:

docker-compose ps

You will see output showing the fastapi-app instance on port 8000 and the fastapi-app-scaled instances on random host ports.

3. View Logs from All Instances

To see the synchronized logs from all running containers, use the -f (follow) flag:

docker-compose logs -f fastapi-app fastapi-app-scaled

You will see interleaved logs from all application instances, each identified by its container name.

4. Trigger a Health Status Change

In a separate terminal, send a request to the admin endpoint of any single instance to change the system health. You can use the stable localhost:8000 endpoint or any of the dynamically assigned ports.

# Using the stable port
curl -i -X POST -H "X-Admin-Secret: letmein" -H "Content-Type: application/json" -d '{"status": "DEGRADED"}' http://localhost:8000/system/health

5. Verify the Pub/Sub Synchronization

Now, observe the logs in your first terminal. You should see log messages from all four instances indicating that they received the health status update via Redis Pub/Sub.

The request to /system/health will only go to one of the instances, but that instance publishes the change to Redis. The logs confirm that all other instances, which are subscribed to the channel, receive the message and update their internal state in real-time. This verifies that the distributed synchronization is working correctly.

6. Scale Down

Once you're done testing, you can bring the services down:

docker-compose down

Running the Tests

To run the tests, you can use pytest:

# Make sure you have the dev dependencies installed
pip install -e ".[dev]"

# Run the tests
pytest

Design Justification and Trade-offs

Rate Limiting Algorithm: Fixed Window Counter

The current implementation uses a Fixed Window Counter algorithm.

  • Pros:
    • Simple to implement and understand.
    • Memory efficient as it only stores one counter per user per window.
    • Fast: the hot path is a single INCR plus an EXPIRE, grouped into one Redis pipeline so the pair executes atomically.
  • Cons:
    • "Thundering Herd" Problem: It can allow a burst of traffic at the edge of a window. For example, if the limit is 10 requests per minute, a user can make 10 requests at the end of a minute and another 10 requests at the beginning of the next minute, resulting in 20 requests in a short period.

For this assignment, the Fixed Window Counter is a reasonable choice. For a more robust solution, a Sliding Window Log or Sliding Window Counter algorithm could be used to smooth out these boundary bursts. A sketch of the fixed-window check follows.
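
This sketch assumes redis-py's asyncio client and an illustrative key scheme; the project's actual check function may differ.

# Illustrative fixed-window check; the key scheme and signature are assumptions.
import time
import redis.asyncio as redis

async def check(client: redis.Redis, api_key: str, limit: int, window_seconds: int):
    window = int(time.time()) // window_seconds       # index of the current window
    key = f"ratelimit:{api_key}:{window}"
    # INCR and EXPIRE are queued on one pipeline (MULTI/EXEC) so the counter
    # is never created without its expiry.
    async with client.pipeline(transaction=True) as pipe:
        pipe.incr(key)
        pipe.expire(key, window_seconds)
        count, _ = await pipe.execute()
    allowed = count <= limit
    return allowed, max(0, limit - count)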

Distributed Consistency and Performance

  • Redis: Redis was chosen as the central store for its high performance, low latency, and atomic operations. The INCR and EXPIRE calls are executed together in a Redis pipeline to prevent race conditions.
  • Health Check Caching: The local caching in the HealthStore is a crucial optimization to avoid hitting Redis on every request, which would add significant latency.
  • Tier-Based Fail-over: The middleware now implements a tier-based fail-over strategy. When Redis is unavailable:
    • Free Tier: Requests fail closed, returning 503 Service Unavailable. This prioritizes system stability by shedding non-paying traffic.
    • Pro & Enterprise Tiers: Requests fail open, allowing them to proceed. This prioritizes availability for paying customers, honoring SLAs even during Redis outages.
  • Explicit Health Checks for Robust Fail-over: To reliably trigger the tier-based fail-over, the system first sends a PING command to Redis on every request. This acts as a fast, explicit health check. While this adds a minimal latency overhead (one extra network round-trip), it provides a clean and immediate way to detect if Redis is unavailable, allowing the system to switch to its fail-over logic without waiting for a command timeout. This trade-off prioritizes predictable behavior and resilience over shaving microseconds off the "happy path" response time.
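
A minimal sketch of this fail-over decision, assuming redis-py's asyncio client (the helper names and the set of fail-open tiers are illustrative, based on the behavior described above):

# Illustrative fail-over decision; names and tier sets are assumptions.
import redis.asyncio as redis
from redis.exceptions import RedisError
from starlette.responses import JSONResponse

FAIL_OPEN_TIERS = {"pro", "enterprise"}  # paying tiers stay available

async def redis_available(client: redis.Redis) -> bool:
    # Fast, explicit health check: one PING round-trip per request.
    try:
        return bool(await client.ping())
    except RedisError:
        return False

def failover_response(tier: str):
    # Free tier fails closed (503); paying tiers fail open (None = proceed).
    if tier in FAIL_OPEN_TIERS:
        return None
    return JSONResponse(status_code=503, content={"detail": "Service Unavailable"})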

Scalability

The current solution is designed to be scalable.

  • Horizontal Scaling: The FastAPI application can be scaled horizontally by running multiple instances behind a load balancer. Since the rate limiting counters are stored in a central Redis instance, the rate limiting will be consistent across all instances.
  • Redis Scalability: Redis can be scaled using clustering or a primary/replica setup to handle a large number of concurrent requests.
