Skip to content

power-edge/pymlb_statsapi

PyMLB StatsAPI

PyPI version Documentation Status Tests codecov Python 3.10+ License: MIT GitHub stars GitHub forks

A clean, Pythonic wrapper for MLB Stats API endpoints with automatic schema-driven parameter validation.

✨ Features

  • 🎯 Clean API: Parameters are intelligently routed to path or query params based on the schema configuration
  • πŸͺΆ Lean: Only requires requests - no heavy dependencies
  • πŸ“‹ Schema-driven: All endpoints and methods generated from JSON schemas
  • βœ… Type-safe: Automatic parameter validation from API schemas
  • πŸ”„ Dynamic: Zero hardcoded models - updates via schema changes only
  • πŸ§ͺ Well-tested: Comprehensive unit tests with pytest and BDD test suite with stub capture/replay
  • πŸ“š Self-documenting: Auto-generated docstrings from API schemas
  • πŸš€ Fast: Stub-based testing runs in <1 second

πŸš€ Quick Start

Installation

# With pip
pip install pymlb-statsapi

# With uv (recommended)
uv add pymlb-statsapi

Basic Usage

from pymlb_statsapi import api

# Get today's game schedule
response = api.Schedule.schedule(sportId=1, date="2024-10-27")
data = response.json()

# Get latest live game data (no timecode = most recent)
response = api.Game.liveGameV1(game_pk=747175)
data = response.json()

# Get game data at specific time
response = api.Game.liveGameV1(game_pk=747175, timecode="20241027_233000")
data = response.json()

# Get team information
response = api.Team.team(teamId=147, season=2024)
team_data = response.json()

# Save response to gzipped file with metadata
result = response.gzip(prefix="mlb-data")
print(f"Saved to: {result['path']}")
print(f"Captured at: {result['timestamp']}")

πŸ”§ Smart Parameter Validation

Parameters accept both integers and strings - the library handles type conversion automatically:

# These are equivalent - use whichever is more convenient
api.Game.liveGameV1(game_pk=747175)          # Integer (Pythonic)
api.Game.liveGameV1(game_pk="747175")        # String (API format)

api.Team.team(teamId=147)                     # Integer
api.Team.team(teamId="147")                   # String

# The MLB API sometimes returns IDs as strings in responses
# You can pass them directly without conversion:
games = api.Schedule.schedule(sportId=1, date="2024-10-27").json()
for game in games['dates'][0]['games']:
    # game['gamePk'] is an integer from the API
    live_data = api.Game.liveGameV1(game_pk=game['gamePk'])

    # Or if you have a string ID from elsewhere:
    game_id = "747175"  # From database, user input, etc.
    live_data = api.Game.liveGameV1(game_pk=game_id)  # Works!

Why this matters: The MLB Stats API returns some fields as integers and others as strings. This flexible parameter handling means you never need to worry about type conversion - just pass what you have!

πŸ“– Documentation

πŸ” Start Here: Schema Reference

The Schema Reference is the heart of this library - browse all 21 MLB Stats API endpoints with detailed parameter docs and working examples for every method.

Additional Resources

  • Full Documentation - Complete guide on ReadTheDocs
  • API Reference - Implementation documentation
  • Examples - Check the examples/ directory for working code samples
  • Testing Guide - See the Testing documentation

πŸ—οΈ Architecture

Config-Driven Design

All MLB API endpoints are defined as JSON schemas rather than hardcoded. These schemas were sourced from the MLB Stats API Beta documentation site (https://beta-statsapi.mlb.com/docs/), which is no longer publicly available:

pymlb_statsapi/resources/schemas/statsapi/stats_api_1_0/
β”œβ”€β”€ schedule.json  β†’ Schedule endpoint with methods
β”œβ”€β”€ game.json      β†’ Game endpoint (live feed, boxscore, etc.)
β”œβ”€β”€ team.json      β†’ Team endpoint (roster, stats, etc.)
β”œβ”€β”€ person.json    β†’ Person endpoint (player data)
└── ...

Each schema defines which parameters are path parameters vs query parameters. Method paths are mapped in endpoint-model.json:

{
  "schedule": {
    "schedule": {
      "path": "/v1/schedule",
      "name": "schedule"
    },
    "tieGames": {
      "path": "/v1/schedule/games/tied",
      "name": "tieGames"
    }
  }
}

Clean API: Intelligent Parameter Routing

The library automatically routes parameters to path or query parameters based on schema configuration:

# Parameters are routed correctly based on the schema
response = api.Game.liveGameV1(game_pk=747175, timecode="20241027_233000")
# Resolves to: /api/v1/game/747175/feed/live?timecode=20241027_233000
#              game_pk β†’ path parameter, timecode β†’ query parameter

response = api.Schedule.schedule(sportId=1, date="2024-10-27")
# Resolves to: /api/v1/schedule?sportId=1&date=2024-10-27
#              Both are query parameters

# Latest game data (omit optional timecode)
response = api.Game.liveGameV1(game_pk=747175)
# Resolves to: /api/v1/game/747175/feed/live

Key Components

Dynamic Factory (factory.py):

  • Generates endpoint classes and methods from schemas at runtime
  • Creates clean function signatures with proper parameter handling
  • Handles method overloading (e.g., seasons() with/without seasonId)

Registry (registry.py):

  • Central api singleton that loads all endpoints
  • Provides discovery API for exploring available methods

API Response (factory.py: APIResponse):

  • Wraps requests.Response with metadata
  • Provides .json(), .save_json(), .get_path(), .get_uri() methods
  • Generates consistent resource paths for file storage

πŸŽ“ Examples

Working with Different Endpoints

from pymlb_statsapi import api

# Schedule queries
response = api.Schedule.schedule(
    sportId=1,
    date="2024-10-27",
    teamId=147
)

# Get all teams
response = api.Team.teams(sportId=1, season=2024)

# Get player information
response = api.Person.people(personId=660271)

# Get season information (overloaded method)
response = api.Season.seasons(sportId=1)  # All seasons
response = api.Season.seasons(seasonId=2024)  # Specific season

# Get game stats
response = api.Stats.stats(
    group="hitting",
    stats="season",
    season=2024,
    sportId=1
)

File Storage

# Auto-generate file path
result = response.save_json(prefix="mlb-data")
print(f"Saved to: {result['path']}")
print(f"Bytes written: {result['bytes_written']}")

# Explicit file path
response.save_json("/path/to/file.json")

# Gzipped JSON
response.gzip(prefix="mlb-data")

URI Generation for Different Protocols

# File protocol (default)
uri = response.get_uri(protocol="file", prefix="mlb-data")
# Result: file:///path/to/.var/local/mlb_statsapi/mlb-data/schedule/schedule/date=2025-06-01.json

# S3 protocol (requires PYMLB_STATSAPI__S3_BUCKET env var)
uri = response.get_uri(protocol="s3", prefix="raw-data", gzip=True)
# Result: s3://my-bucket/raw-data/schedule/schedule/date=2025-06-01.json.gz

# Redis protocol
uri = response.get_uri(protocol="redis", prefix="mlb")
# Result: redis://localhost:6379/0/mlb/schedule/schedule/date=2025-06-01

API Discovery

# List all available endpoints
print(api.get_endpoint_names())
# ['schedule', 'game', 'team', 'person', 'season', ...]

# List methods for an endpoint
endpoint = api.get_endpoint("schedule")
print(endpoint.get_method_names())
# ['schedule', 'tieGames', 'postseason', ...]

# Get detailed method information
info = api.get_method_info("schedule", "schedule")
print(info["path"])          # /v1/schedule
print(info["summary"])       # View schedule info
print(info["path_params"])   # []
print(info["query_params"])  # [{"name": "sportId", ...}, ...]

πŸ§ͺ Testing

Unit Tests

# Run unit tests
pytest

# With coverage
pytest --cov=pymlb_statsapi --cov-report=html

# Specific test file
pytest tests/unit/pymlb_statsapi/model/test_factory.py

BDD Tests with Stubs (Fast)

# Run all BDD tests with stubs (completes in <1 second)
behave

# Or explicitly
STUB_MODE=replay behave

Capture Fresh Stubs

# Capture stubs by making real API calls
STUB_MODE=capture behave

# Capture stubs for specific endpoint
STUB_MODE=capture behave features/schedule.feature

Run Specific BDD Tests

# Test specific feature
behave features/game.feature

# Verbose output
behave -v features/season.feature

# Test with specific tag
behave --tags=@game

πŸ› οΈ Development

Setup

# Clone repository
git clone https://github.com/power-edge/pymlb_statsapi.git
cd pymlb_statsapi

# Install dependencies
uv sync

# Install pre-commit hooks
pre-commit install

Code Quality

# Linting
ruff check .

# Auto-fix linting issues
ruff check --fix .

# Formatting
ruff format .

# Run all pre-commit hooks
pre-commit run --all-files

# Security scan
bandit -r pymlb_statsapi/

Building

# Build package
hatch build

# Or with uv
uv build

# Version is auto-generated from git tags via hatch-vcs

Publishing

For local publishing (requires .env configuration):

# Set up credentials
cp .env.example .env
# Edit .env with your PyPI tokens

# Publish to TestPyPI (for testing)
./scripts/publish.sh testpypi

# Publish to PyPI (production)
./scripts/publish.sh pypi

For automated publishing, use GitHub Actions:

  • Push a version tag (e.g., v1.2.0) to trigger automatic PyPI publishing
  • Configure PYPI_TOKEN secret in GitHub repository settings

πŸ“Š Test Coverage

  • 30 unit tests with pytest covering core functionality
  • 39 BDD scenarios covering all major endpoints
  • 277 test steps with path/query parameter variations
  • Stub-based testing for fast, deterministic CI/CD
  • Real-world data using completed games (October 2024 World Series)

🌟 Support This Project

If you find this library useful, consider supporting its development:

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please feel free to fork the repository and submit a Pull Request.

We have a git helper for common operations, see scripts/git.sh

================================
   Git Workflow Helper
================================
1) Format & commit changes
2) Create release (bump version & tag)
3) Push to remote
4) Full release (format, commit, bump, tag, push, build)
5) Status
6) Exit

πŸ”— Links

πŸ™ Acknowledgments

  • Built on the excellent MLB Stats API
  • Inspired by various MLB data projects in the community
  • Thanks to all contributors!

Made with ❀️ by the PyMLB StatsAPI Team