baseball-data-lab is a Python application and library for creating advanced stat summary sheets for MLB players. It supports yearly customizations and provides visualizations. The project can also be imported as a library, so you can extend its functionality for custom applications or data processing workflows. It uses the pybaseball and MLB-StatsAPI libraries, along with other Python packages, to gather and format data for dashboards, reports and other analytical tools.

The project retrieves data from MLB and FanGraphs to ensure accurate, up-to-date statistics. Future releases will continue to expand the application's capabilities so it can serve as both a standalone tool and a reusable library.
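As an illustration of the kind of data access involved, the sketch below uses pybaseball's `playerid_lookup` and `statcast_batter` to pull one calendar year of pitch-level Statcast data for a hitter. This is not the project's own code: the helper names (`split_name`, `mlbam_id`, `season_statcast`) are hypothetical, the name split is naive, and the first lookup match is assumed to be the right player. pybaseball and network access are needed only when the functions are called, because the imports are deferred.

```python
from __future__ import annotations


def split_name(full_name: str) -> tuple[str, str]:
    """Split 'First Last' into (last, first), the order playerid_lookup expects.

    Naive assumption: everything after the first space is the last name.
    """
    first, _, last = full_name.strip().partition(" ")
    return last, first


def mlbam_id(full_name: str) -> int:
    """Look up a player's MLBAM id with pybaseball (requires network access)."""
    from pybaseball import playerid_lookup  # deferred so this module loads without pybaseball

    last, first = split_name(full_name)
    matches = playerid_lookup(last, first)
    if matches.empty:
        raise LookupError(f"no player found for {full_name!r}")
    # Assumption: the first row is the intended player when names collide.
    return int(matches["key_mlbam"].iloc[0])


def season_statcast(full_name: str, year: int):
    """Return a pandas DataFrame of Statcast pitches seen by a batter in one calendar year."""
    from pybaseball import statcast_batter

    return statcast_batter(f"{year}-01-01", f"{year}-12-31", mlbam_id(full_name))
```

For example, `season_statcast("Riley Greene", 2024)` would return that season's pitches to Greene. MLB-StatsAPI offers a comparable id lookup through `statsapi.lookup_player`.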
Below are samples of the summary sheets that can be generated by this project. The first sample is a Batting Summary for Riley Greene for the 2024 season. The second sample is a Pitching Summary for Tarik Skubal for the 2024 season.
In addition to the baseball stats you would expect, the summary sheets also include the following "advanced" stats:
| Batters | | Pitchers | | |
|---|---|---|---|---|
| BB% | UBR | K/9 | Opponent Avg | Swing % |
| K% | wRC | BB/9 | WHIP | Splits |
| OBP | wRAA | K/BB | BABIP | |
| SLG | wOBA | H/9 | LOB% | |
| OPS | wRC+ | HR/9 | ERA- | |
| ISO | WAR | K% | FIP- | |
| Spd | Splits | BB% | FIP | |
| BABIP | | K-BB% | RS/9 | |
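Many of the simpler rate stats in this table follow standard public formulas. The sketch below computes several of them from counting stats. It is an illustration, not the project's implementation: `BattingLine`, `batting_rates`, `innings_from_notation` and `pitching_rates` are hypothetical names, stats that depend on season-specific weights (wOBA, wRC+, WAR, UBR) are left out, and the FIP constant of 3.10 is a placeholder because the real constant is recalculated every season. Pitcher K% and BB% are omitted because they use batters faced, which this sketch does not track.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for a hitter's season counting stats."""
    pa: int       # plate appearances
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies
    so: int       # strikeouts


def batting_rates(b: BattingLine) -> dict[str, float]:
    """Standard hitter rate stats computed from counting stats."""
    singles = b.h - b.doubles - b.triples - b.hr
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.hr
    avg = b.h / b.ab
    obp = (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)
    slg = total_bases / b.ab
    return {
        "BB%": b.bb / b.pa,
        "K%": b.so / b.pa,
        "OBP": obp,
        "SLG": slg,
        "OPS": obp + slg,
        "ISO": slg - avg,  # isolated power: extra bases per at-bat
        "BABIP": (b.h - b.hr) / (b.ab - b.so - b.hr + b.sf),
    }


def innings_from_notation(ip: float) -> float:
    """Convert box-score notation (6.1 means 6 1/3 innings) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the point counts outs
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return whole + outs / 3


def pitching_rates(ip: float, h: int, hr: int, bb: int, hbp: int, so: int,
                   fip_constant: float = 3.10) -> dict[str, float]:
    """Per-nine and ratio stats for a pitcher; fip_constant is a placeholder."""
    innings = innings_from_notation(ip)
    per9 = 9 / innings
    return {
        "K/9": so * per9,
        "BB/9": bb * per9,
        "K/BB": so / bb if bb else float("inf"),
        "H/9": h * per9,
        "HR/9": hr * per9,
        "WHIP": (bb + h) / innings,
        "FIP": (13 * hr + 3 * (bb + hbp) - 2 * so) / innings + fip_constant,
    }
```

Converting box-score innings first matters: treating 6.1 as 6.1 innings instead of 6 1/3 skews every per-nine stat.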
The project is organized as follows:
```
baseball-data-lab/
├── README.md
├── setup.py
├── requirements.txt
├── baseball_data_lab/          # Source code
│   ├── apis/                   # API clients for MLB and FanGraphs
│   ├── data_viz/               # Plotting utilities
│   ├── player/                 # Player models and helpers
│   ├── summary_sheets/         # Classes that generate summary sheets
│   ├── team/                   # Team utilities
│   └── ...
├── examples/                   # Example scripts for data collection
└── tests/                      # Unit tests
```
To get started with the project, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/timothyf/baseball-data-lab.git
  cd baseball-data-lab
  ```

- Set up a Python virtual environment (optional but recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

There are several scripts in the examples directory that cover basic functionality:
```
python examples/generate_player_summary.py [options]

Options:
  --players [1 or more player names]
  --teams   [1 or more team names]
  --year    [specify a 4-digit year]
```

The examples directory also contains `save_statcast_data.py`, which takes the same options:

```
python examples/save_statcast_data.py [options]
  --players [1 or more player names]
  --teams   [1 or more team names]
  --year    [specify a 4-digit year]
```

Run the project by executing a script in the examples directory. For example, to generate a summary sheet for a single player:

```bash
python examples/generate_player_summary.py --players 'Riley Greene'
```

Output: `output/2024/Tigers/batter_summary_riley_greene.png`
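If you want to drive the script from Python, for example in a batch job, a small wrapper can assemble the command line from the documented `--players`, `--teams` and `--year` options and then look for the generated files. This is a hypothetical sketch: `build_summary_command` and `find_summary_sheets` are not part of the project, passing several names after one flag assumes the script accepts multiple values per option (as "1 or more" suggests), and the output layout `<year>/<team>/<kind>_summary_<first>_<last>.png` is inferred from the single example above.

```python
from __future__ import annotations

import sys
from pathlib import Path

SCRIPT = Path("examples") / "generate_player_summary.py"


def build_summary_command(players: list[str] | None = None,
                          teams: list[str] | None = None,
                          year: int | None = None) -> list[str]:
    """Assemble an argv list for generate_player_summary.py from its documented options."""
    cmd = [sys.executable, str(SCRIPT)]
    if players:
        cmd += ["--players", *players]
    if teams:
        cmd += ["--teams", *teams]
    if year is not None:
        cmd += ["--year", str(year)]
    return cmd


def find_summary_sheets(player_name: str, output_dir: str | Path = "output") -> list[Path]:
    """Find generated sheets for a player, assuming <output_dir>/<year>/<team>/*_summary_<slug>.png."""
    slug = "_".join(player_name.lower().split())  # "Riley Greene" -> "riley_greene"
    return sorted(Path(output_dir).glob(f"*/*/*_summary_{slug}.png"))
```

From the repository root, `subprocess.run(build_summary_command(players=["Riley Greene"], year=2024), check=True)` followed by `find_summary_sheets("Riley Greene")` would run the script and list the matching PNG files. Names containing punctuation or accents may not map to file names this simply.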

To generate summary sheets by team and season:

```bash
python examples/generate_player_summary.py --teams 'Detroit Tigers' --year 2024
```

To set up the PostgreSQL database for Baseball Data Lab, follow these steps:
- Install PostgreSQL: Download and install PostgreSQL from postgresql.org.
- Create the database: Open your terminal and run:

  ```bash
  createdb baseball_data_lab_db
  ```

- Initialize the schema: Run the provided `setup_db.sql` file to create the tables:

  ```bash
  psql -d baseball_data_lab_db -f setup_db.sql
  ```

- Verify the setup: Connect to your database and list the tables:

  ```
  psql -d baseball_data_lab_db
  \dt
  ```

  You should see tables such as `games`, `players`, `umpires` and `plate_appearances`.
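To run the same check from Python, a sketch like the following queries PostgreSQL's `information_schema.tables`. It assumes the tables live in the default `public` schema and that you connect with a DB-API 2.0 driver using the `%s` parameter style, such as psycopg2, which may not be among the project's dependencies. `missing_tables` is a hypothetical helper, not part of baseball-data-lab.

```python
from __future__ import annotations

from typing import Iterable

# Tables the setup steps say setup_db.sql creates (this list is not exhaustive).
EXPECTED_TABLES = frozenset({"games", "players", "umpires", "plate_appearances"})

QUERY = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = %s"
)


def missing_tables(conn, expected: Iterable[str] = EXPECTED_TABLES,
                   schema: str = "public") -> set[str]:
    """Return the expected table names that do not exist in `schema`.

    `conn` is assumed to be a DB-API 2.0 connection (e.g. from psycopg2).
    """
    cur = conn.cursor()
    try:
        cur.execute(QUERY, (schema,))
        found = {row[0] for row in cur.fetchall()}
    finally:
        cur.close()
    return set(expected) - found
```

With psycopg2 installed, `missing_tables(psycopg2.connect(dbname="baseball_data_lab_db"))` should return an empty set once `setup_db.sql` has run.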
This project was inspired by my time working in the R&D department of the Washington Nationals and by Thomas Nestico's pitching summary project. Here is a link to an article describing his project:
https://medium.com/@thomasjamesnestico/creating-the-perfect-pitching-summary-7b8a981ef0c5
This package and its author are not affiliated with MLB or any MLB team. This package interfaces with MLB's Stats API, and use of MLB data is subject to the notice posted at http://gdx.mlb.com/components/copyright.txt.