A clean, Pythonic wrapper for MLB Stats API endpoints with automatic schema-driven parameter validation.
- π― Clean API: Parameters are intelligently routed to path or query params based on the schema configuration
- πͺΆ Lean: Only requires
requests- no heavy dependencies - π Schema-driven: All endpoints and methods generated from JSON schemas
- Sourced from https://beta-statsapi.mlb.com/docs/ (now offline)
- β Type-safe: Automatic parameter validation from API schemas
- π Dynamic: Zero hardcoded models - updates via schema changes only
- π§ͺ Well-tested: Comprehensive unit tests with pytest and BDD test suite with stub capture/replay
- π Self-documenting: Auto-generated docstrings from API schemas
- π Fast: Stub-based testing runs in <1 second
# With pip
pip install pymlb-statsapi
# With uv (recommended)
uv add pymlb-statsapifrom pymlb_statsapi import api
# Get today's game schedule
response = api.Schedule.schedule(sportId=1, date="2024-10-27")
data = response.json()
# Get latest live game data (no timecode = most recent)
response = api.Game.liveGameV1(game_pk=747175)
data = response.json()
# Get game data at specific time
response = api.Game.liveGameV1(game_pk=747175, timecode="20241027_233000")
data = response.json()
# Get team information
response = api.Team.team(teamId=147, season=2024)
team_data = response.json()
# Save response to gzipped file with metadata
result = response.gzip(prefix="mlb-data")
print(f"Saved to: {result['path']}")
print(f"Captured at: {result['timestamp']}")Parameters accept both integers and strings - the library handles type conversion automatically:
# These are equivalent - use whichever is more convenient
api.Game.liveGameV1(game_pk=747175) # Integer (Pythonic)
api.Game.liveGameV1(game_pk="747175") # String (API format)
api.Team.team(teamId=147) # Integer
api.Team.team(teamId="147") # String
# The MLB API sometimes returns IDs as strings in responses
# You can pass them directly without conversion:
games = api.Schedule.schedule(sportId=1, date="2024-10-27").json()
for game in games['dates'][0]['games']:
# game['gamePk'] is an integer from the API
live_data = api.Game.liveGameV1(game_pk=game['gamePk'])
# Or if you have a string ID from elsewhere:
game_id = "747175" # From database, user input, etc.
live_data = api.Game.liveGameV1(game_pk=game_id) # Works!Why this matters: The MLB Stats API returns some fields as integers and others as strings. This flexible parameter handling means you never need to worry about type conversion - just pass what you have!
The Schema Reference is the heart of this library - browse all 21 MLB Stats API endpoints with detailed parameter docs and working examples for every method.
- Full Documentation - Complete guide on ReadTheDocs
- API Reference - Implementation documentation
- Examples - Check the
examples/directory for working code samples - Testing Guide - See the Testing documentation
All MLB API endpoints are defined as JSON schemas rather than hardcoded. These schemas were sourced from the MLB Stats API Beta documentation site (https://beta-statsapi.mlb.com/docs/), which is no longer publicly available:
pymlb_statsapi/resources/schemas/statsapi/stats_api_1_0/
βββ schedule.json β Schedule endpoint with methods
βββ game.json β Game endpoint (live feed, boxscore, etc.)
βββ team.json β Team endpoint (roster, stats, etc.)
βββ person.json β Person endpoint (player data)
βββ ...
Each schema defines which parameters are path parameters vs query parameters. Method paths are mapped in endpoint-model.json:
{
"schedule": {
"schedule": {
"path": "/v1/schedule",
"name": "schedule"
},
"tieGames": {
"path": "/v1/schedule/games/tied",
"name": "tieGames"
}
}
}The library automatically routes parameters to path or query parameters based on schema configuration:
# Parameters are routed correctly based on the schema
response = api.Game.liveGameV1(game_pk=747175, timecode="20241027_233000")
# Resolves to: /api/v1/game/747175/feed/live?timecode=20241027_233000
# game_pk β path parameter, timecode β query parameter
response = api.Schedule.schedule(sportId=1, date="2024-10-27")
# Resolves to: /api/v1/schedule?sportId=1&date=2024-10-27
# Both are query parameters
# Latest game data (omit optional timecode)
response = api.Game.liveGameV1(game_pk=747175)
# Resolves to: /api/v1/game/747175/feed/liveDynamic Factory (factory.py):
- Generates endpoint classes and methods from schemas at runtime
- Creates clean function signatures with proper parameter handling
- Handles method overloading (e.g.,
seasons()with/withoutseasonId)
Registry (registry.py):
- Central
apisingleton that loads all endpoints - Provides discovery API for exploring available methods
API Response (factory.py: APIResponse):
- Wraps
requests.Responsewith metadata - Provides
.json(),.save_json(),.get_path(),.get_uri()methods - Generates consistent resource paths for file storage
from pymlb_statsapi import api
# Schedule queries
response = api.Schedule.schedule(
sportId=1,
date="2024-10-27",
teamId=147
)
# Get all teams
response = api.Team.teams(sportId=1, season=2024)
# Get player information
response = api.Person.people(personId=660271)
# Get season information (overloaded method)
response = api.Season.seasons(sportId=1) # All seasons
response = api.Season.seasons(seasonId=2024) # Specific season
# Get game stats
response = api.Stats.stats(
group="hitting",
stats="season",
season=2024,
sportId=1
)# Auto-generate file path
result = response.save_json(prefix="mlb-data")
print(f"Saved to: {result['path']}")
print(f"Bytes written: {result['bytes_written']}")
# Explicit file path
response.save_json("/path/to/file.json")
# Gzipped JSON
response.gzip(prefix="mlb-data")# File protocol (default)
uri = response.get_uri(protocol="file", prefix="mlb-data")
# Result: file:///path/to/.var/local/mlb_statsapi/mlb-data/schedule/schedule/date=2025-06-01.json
# S3 protocol (requires PYMLB_STATSAPI__S3_BUCKET env var)
uri = response.get_uri(protocol="s3", prefix="raw-data", gzip=True)
# Result: s3://my-bucket/raw-data/schedule/schedule/date=2025-06-01.json.gz
# Redis protocol
uri = response.get_uri(protocol="redis", prefix="mlb")
# Result: redis://localhost:6379/0/mlb/schedule/schedule/date=2025-06-01# List all available endpoints
print(api.get_endpoint_names())
# ['schedule', 'game', 'team', 'person', 'season', ...]
# List methods for an endpoint
endpoint = api.get_endpoint("schedule")
print(endpoint.get_method_names())
# ['schedule', 'tieGames', 'postseason', ...]
# Get detailed method information
info = api.get_method_info("schedule", "schedule")
print(info["path"]) # /v1/schedule
print(info["summary"]) # View schedule info
print(info["path_params"]) # []
print(info["query_params"]) # [{"name": "sportId", ...}, ...]# Run unit tests
pytest
# With coverage
pytest --cov=pymlb_statsapi --cov-report=html
# Specific test file
pytest tests/unit/pymlb_statsapi/model/test_factory.py# Run all BDD tests with stubs (completes in <1 second)
behave
# Or explicitly
STUB_MODE=replay behave# Capture stubs by making real API calls
STUB_MODE=capture behave
# Capture stubs for specific endpoint
STUB_MODE=capture behave features/schedule.feature# Test specific feature
behave features/game.feature
# Verbose output
behave -v features/season.feature
# Test with specific tag
behave --tags=@game# Clone repository
git clone https://github.com/power-edge/pymlb_statsapi.git
cd pymlb_statsapi
# Install dependencies
uv sync
# Install pre-commit hooks
pre-commit install# Linting
ruff check .
# Auto-fix linting issues
ruff check --fix .
# Formatting
ruff format .
# Run all pre-commit hooks
pre-commit run --all-files
# Security scan
bandit -r pymlb_statsapi/# Build package
hatch build
# Or with uv
uv build
# Version is auto-generated from git tags via hatch-vcsFor local publishing (requires .env configuration):
# Set up credentials
cp .env.example .env
# Edit .env with your PyPI tokens
# Publish to TestPyPI (for testing)
./scripts/publish.sh testpypi
# Publish to PyPI (production)
./scripts/publish.sh pypiFor automated publishing, use GitHub Actions:
- Push a version tag (e.g.,
v1.2.0) to trigger automatic PyPI publishing - Configure
PYPI_TOKENsecret in GitHub repository settings
- 30 unit tests with pytest covering core functionality
- 39 BDD scenarios covering all major endpoints
- 277 test steps with path/query parameter variations
- Stub-based testing for fast, deterministic CI/CD
- Real-world data using completed games (October 2024 World Series)
If you find this library useful, consider supporting its development:
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to fork the repository and submit a Pull Request.
We have a git helper for common operations, see scripts/git.sh
================================
Git Workflow Helper
================================
1) Format & commit changes
2) Create release (bump version & tag)
3) Push to remote
4) Full release (format, commit, bump, tag, push, build)
5) Status
6) Exit- Built on the excellent MLB Stats API
- Inspired by various MLB data projects in the community
- Thanks to all contributors!
Made with β€οΈ by the PyMLB StatsAPI Team