This project uses aiocache for caching, providing both in-memory and Redis-backed cache backends with full async/await support.
Caching is configured through the settings module with the following environment variables:
- `CACHE_ENABLED`: Enable or disable caching (default: `True`)
  - When set to `False`, all cache operations become no-ops without requiring code changes
- `CACHE_REDIS_HOST`: Redis hostname (default: `None`)
  - If not set, the persistent cache falls back to in-memory storage
- `CACHE_REDIS_PORT`: Redis port (default: `6379`)
- `CACHE_DEFAULT_TTL`: Default TTL for the memory cache in seconds (default: `300`, i.e. 5 minutes)
- `CACHE_PERSISTENT_TTL`: Default TTL for the persistent cache in seconds (default: `3600`, i.e. 1 hour)
Two cache backends are configured:
- Alias: `memory`
  - Implementation: Always uses in-memory storage
  - Use case: Fast, ephemeral caching for request-scoped or temporary data
  - Serializer: Pickle
  - Default TTL: 300 seconds (configurable via `CACHE_DEFAULT_TTL`)
- Alias: `persistent`
  - Implementation: Uses Redis if `CACHE_REDIS_HOST` is configured, otherwise falls back to in-memory
  - Use case: Data that needs to persist across restarts or be shared across instances
  - Serializer: Pickle
  - Default TTL: 3600 seconds (configurable via `CACHE_PERSISTENT_TTL`)
```python
from gitbrag.services.cache import get_cached, set_cached, delete_cached, clear_cache

# Get a cached value (uses memory cache by default)
value = await get_cached("my_key")

# Get from persistent cache
value = await get_cached("my_key", alias="persistent")

# Set a cached value with default TTL (5 minutes for memory cache)
await set_cached("my_key", "my_value")

# Set with custom TTL
await set_cached("my_key", "my_value", ttl=300, alias="persistent")

# Delete a cached value
await delete_cached("my_key", alias="persistent")

# Clear entire cache
await clear_cache(alias="persistent")
```

You can use aiocache's built-in decorators directly:
```python
from aiocache import cached

@cached(ttl=600, alias="persistent", key_builder=lambda f, *args, **kwargs: f"user:{args[0]}")
async def get_user_data(user_id: int):
    # Expensive operation here
    return await fetch_user_from_database(user_id)
```

For more control, you can get a cache instance directly:
```python
from gitbrag.services.cache import get_cache

# Get memory cache
cache = get_cache("memory")
await cache.set("key", "value", ttl=300)
value = await cache.get("key")

# Get persistent cache (Redis or fallback to memory)
cache = get_cache("persistent")
await cache.set("key", "value", ttl=3600)
value = await cache.get("key")
```

The cache system must be initialized before use.
Caches are automatically initialized in the FastAPI startup event. No manual initialization is required.
If you need to initialize caches manually (e.g., in a custom script or CLI command), use:
```python
from gitbrag.services.cache import configure_caches
from gitbrag.settings import settings

configure_caches(settings)
```
- Choose the right backend:
  - Use the `memory` cache for request-scoped or temporary data
  - Use the `persistent` cache for data that needs to survive restarts or be shared across instances
- Set appropriate TTLs:
  - Default TTLs are configured via settings and automatically applied
  - Override with custom TTLs only when needed
  - Use shorter TTLs for frequently changing data, longer TTLs for stable data
- Use meaningful keys:
  - Include version numbers or namespaces in cache keys to avoid conflicts
  - Example: `user:v1:123` instead of just `123`
- Handle cache misses:
  - Always check whether cached data is `None` and have a fallback mechanism
  - Cache operations are safe when caching is disabled
- Disable caching in development:
  - Set `CACHE_ENABLED=False` to disable caching without code changes
  - Useful for debugging or testing uncached behavior
- Monitor cache size:
  - Redis caches can grow large; implement eviction policies and monitor memory usage
  - Use appropriate TTLs to prevent unbounded growth
GitBrag implements a two-tier caching approach for PR enrichment data to balance freshness with efficiency:
Purpose: Cache GitHub API responses for PR file lists
Implementation:
- File lists fetched via the `/repos/{owner}/{repo}/pulls/{number}/files` API
- Stored with a 6-hour TTL (default, configurable via settings)
- Used by `fetch_pr_files()` in `pullrequests.py`
Benefits:
- Enables efficient regeneration when users request overlapping time periods
- Example: User generates 1-year report, then 2-year report → reuses cached file lists for first year
- Reduces API calls significantly for frequently accessed PRs
- Fresh enough to capture recent changes
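A sketch of how this tier behaves, with a dict standing in for the Redis-backed cache and a hypothetical `fetch_files_from_github` (the real `fetch_pr_files()` calls the GitHub API and applies the 6-hour TTL):

```python
import asyncio

_file_cache: dict[str, list] = {}  # stand-in; the real tier uses Redis with a 6h TTL
API_CALLS = {"count": 0}

async def fetch_files_from_github(owner: str, repo: str, number: int) -> list:
    # Placeholder for GET /repos/{owner}/{repo}/pulls/{number}/files
    API_CALLS["count"] += 1
    return [f"{repo}/file_{number}.py"]

async def fetch_pr_files(owner: str, repo: str, number: int) -> list:
    key = f"pr_files:{owner}:{repo}:{number}"
    files = _file_cache.get(key)
    if files is None:
        files = await fetch_files_from_github(owner, repo, number)
        _file_cache[key] = files   # real code: ttl=PR_FILES_CACHE_TTL (6 hours)
    return files

async def demo() -> int:
    # A 1-year report touches PR 1; a later 2-year report touches PRs 1 and 2:
    await fetch_pr_files("octocat", "hello", 1)
    await fetch_pr_files("octocat", "hello", 1)  # cache hit, no API call
    await fetch_pr_files("octocat", "hello", 2)
    return API_CALLS["count"]

print(asyncio.run(demo()))  # → 2
```

Only two API calls are made for three lookups, which is the overlap reuse described above.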
Configuration:
```python
# In settings or environment
PR_FILES_CACHE_TTL = 21600  # 6 hours in seconds (default)
```

Purpose: Store final computed report data with all aggregated metrics
Implementation:
- Final report data with calculated code statistics, language percentages, repo roles
- No expiration; cached permanently
- Used by `generate_report_data()` in `reports.py`
Benefits:
- Enables public viewing without authentication (no GitHub API calls needed)
- Extremely fast response times for cached reports
- Reduces API rate limit consumption
- User can share report URLs publicly
Cache Keys:
```python
# Report cache key format
f"report:{username}:{since_iso}:{until_iso}:{include_private}"

# Example
"report:octocat:2024-01-01:2024-12-31:false"
```
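A hypothetical helper for building these keys from `date` bounds (the project may construct keys inline; the name `report_cache_key` is assumed for illustration):

```python
from datetime import date

def report_cache_key(username: str, since: date, until: date,
                     include_private: bool) -> str:
    """Build the permanent-report cache key in the documented format."""
    return (f"report:{username}:{since.isoformat()}:{until.isoformat()}"
            f":{str(include_private).lower()}")

print(report_cache_key("octocat", date(2024, 1, 1), date(2024, 12, 31), False))
# → report:octocat:2024-01-01:2024-12-31:false
```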
- First Request (user generates 1-year report):
  - Fetch PRs from GitHub Search API
  - For each PR, fetch file list → cache with 6-hour TTL
  - Calculate aggregated metrics (languages, code stats, roles)
  - Store final report → cache permanently
  - Return report
- Second Request (same user, 2-year report, within 6 hours):
  - Fetch PRs from GitHub Search API (includes year 1 + year 2)
  - For year 1 PRs: file lists retrieved from cache (no API calls!)
  - For year 2 PRs: fetch file lists → cache with 6-hour TTL
  - Calculate aggregated metrics across both years
  - Store final report → cache permanently
  - Return report
- Public Viewing (anonymous user views report):
  - Check permanent cache for report key
  - If found: return cached report (no GitHub API calls, no auth needed)
  - If not found: report must be regenerated (requires auth)
Intermediate Cache (PR Files):
- Automatically expires after 6 hours
- Manual flush: `FLUSHALL` on Redis (development only)
- Reasonable staleness for file lists (PRs don't change often after creation)

Permanent Cache (Reports):
- Never expires automatically
- Manual flush: `FLUSHALL` on Redis (development only)
- Intentional design: historical reports are snapshots in time
With two-tier caching:
- First generation: ~2-5 seconds for 100 PRs (file fetching dominates)
- Overlapping regeneration: ~1-2 seconds (50% cache hit on files)
- Cached report viewing: <100ms (instant, no API calls)
Configure caching behavior through environment variables:
```shell
# Disable caching entirely for debugging
export CACHE_ENABLED=False

# Or use caching without Redis (both backends use memory)
export CACHE_REDIS_HOST=
export CACHE_DEFAULT_TTL=60
export CACHE_PERSISTENT_TTL=300

# Or use local Redis
export CACHE_REDIS_HOST=localhost
export CACHE_REDIS_PORT=6379
```

```shell
# Enable caching with Redis
export CACHE_ENABLED=True
export CACHE_REDIS_HOST=redis-cluster
export CACHE_REDIS_PORT=6379
export CACHE_DEFAULT_TTL=300
export CACHE_PERSISTENT_TTL=3600
```

To disable caching without changing code:
- Via Environment Variable: Set `CACHE_ENABLED=False`
- Result: All cache operations (get, set, delete, clear) become no-ops
- Use Cases:
- Debugging issues related to stale cache data
- Testing application behavior without caching
- Temporary troubleshooting in production
When caching is disabled, your application continues to work normally; cache operations simply don't store or retrieve any data.
The web interface uses Redis for task tracking to prevent duplicate report generation and implement per-user rate limiting.
Report Task Keys: Track individual report generation tasks
```
task:report:{username}:{period}:{params_hash}
```

Example: `task:report:tedivm:1_year:abc123`

- Value: JSON metadata (started_at, username, period, params_hash)
- TTL: 300 seconds (5 minutes, configurable via `TASK_TIMEOUT_SECONDS`)
- Purpose: Prevent duplicate tasks for the same report
User Task Keys: Track active tasks per GitHub username
```
task:user:{reported_username}:active
```

Example: `task:user:tedivm:active`
- Value: JSON array of active task IDs for this user
- TTL: 300 seconds (refreshed when tasks are added)
- Purpose: Per-user rate limiting (max 1 concurrent task per username)
- Check: Before scheduling, check if the task key exists: `await cache.get(task_key)`
- Start: Set the task key with metadata: `await cache.set(task_key, metadata, ttl=300)`
- Track User: Add the task to the user's active tasks list
- Generate: Run report generation in the background
- Complete: Delete the task key: `await cache.delete(task_key)`
- Cleanup: Remove the task from the user's active tasks list
Limiting to 1 concurrent task per reported GitHub username allows sequential report generation to reuse cached data:
- User profile information
- Repository metadata
- Previously fetched PR data
This can reduce GitHub API calls by 50-70% when generating multiple reports for the same user.
Tasks that hang or fail are automatically cleaned up after TTL expiration (300s), preventing orphaned keys from blocking future generations.