A comprehensive Garmin health data analysis platform with interactive dashboard, machine learning capabilities, automated reporting, and 24-hour coverage filtering for high-quality data analysis.
- Quick Start
- What's New
- Features
- Getting Started
- Common Commands
- 24-Hour Coverage Filtering
- Dashboard
- Day-of-Week Analysis
- Time-of-Day Stress Analysis
- Activity Calendar
- Machine Learning & Modeling
- Reporting & Analytics
- Data Quality Tools
- Use Cases
- Testing
- Project Structure
- Dependencies
- Contributing
- License
Option A: Automated sync with Garmin Connect (requires GarminDB from source):
# 1. Install dependencies
pipx install poetry
poetry install
# 2. Install GarminDB from source (required for automated sync)
git clone --recursive https://github.com/tcgoetz/GarminDB.git ~/GarminDB
cd ~/GarminDB && make setup
# 3. Configure Garmin Connect credentials
cd ~/Code/garmin-analysis
poetry run python -m garmin_analysis.cli_garmin_sync --setup \
--username your@email.com \
--password yourpassword \
--start-date 01/01/2024
# 4. Download your Garmin data
poetry run python -m garmin_analysis.cli_garmin_sync --sync --all
# 5. Generate unified dataset
poetry run python -m garmin_analysis.data_ingestion.load_all_garmin_dbs
# 6. Launch the dashboard
poetry run python run_dashboard.py
# Open http://localhost:8050 in your browser
Option B: Manual setup (if you already have garmin.db):
# 1. Install dependencies
pipx install poetry
poetry install
# 2. Set up your Garmin data
mkdir -p db
cp /path/to/GarminDB/garmin.db db/garmin.db
# 3. Generate unified dataset
poetry run python -m garmin_analysis.data_ingestion.load_all_garmin_dbs
# 4. Launch the dashboard
poetry run python run_dashboard.py
# Open http://localhost:8050 in your browser
For detailed setup instructions, see Getting Started.
NEW! Comprehensive model analyzing how heart rate metrics and physical activities affect sleep quality!
- Sophisticated ML model - 6 algorithms tested (ElasticNet best: R²=0.258)
- 28 features analyzed - HR (min/max/resting), activities, and lag features
- Configurable imputation - 6 strategies for handling missing data
- 4 visualizations - Performance, importance, predictions, correlations
- Comprehensive testing - 42 dedicated tests ensuring reliability
- Extensive documentation - Complete guides and examples
Key Finding: Body Battery is the strongest predictor of sleep quality, followed by heart rate metrics (23.4% importance) and activity metrics (20.7% importance).
# Run the sleep analysis model
poetry run python src/garmin_analysis/modeling/hr_activity_sleep_model.py
# Or use programmatically
from garmin_analysis.modeling.hr_activity_sleep_model import HRActivitySleepModel
model = HRActivitySleepModel()
results = model.run_analysis(imputation_strategy='median')
See docs/imputation_strategies.md for the complete guide.
Standardized missing value handling across all core modeling files!
- Shared imputation utility - `utils/imputation.py` with 6 strategies
- Applied to 4 core files - Prevents 53% data loss
- Improved performance - 33% better R² with median vs drop
- 32 comprehensive tests - Full coverage of all strategies
- Backward compatible - Existing code works unchanged
Strategies: median (default, recommended), mean, drop, forward_fill, backward_fill, none
from garmin_analysis.utils.imputation import impute_missing_values
# Robust median imputation (recommended for health data)
df_clean = impute_missing_values(df, ['hr_min', 'steps'], strategy='median')
Analyze sleep score, body battery, and water intake patterns by day of the week to identify weekly trends and optimize your health routines.
- Interactive dashboard with day-of-week analysis tab
- CLI tool for standalone day-of-week analysis
- Comprehensive visualizations with bar charts and trend comparisons
- 24-hour coverage filtering support for reliable analysis
- Automated summary reports showing best/worst days
24-Hour Coverage Filtering is available across all analysis tools! This major feature enhancement allows you to filter your analysis to only include days with complete 24-hour continuous data coverage, ensuring more reliable and accurate results.
- All visualization tools now support `--filter-24h-coverage`
- Interactive dashboard has real-time filtering checkboxes
- Modeling pipeline can train on high-quality data only
- Reporting tools generate cleaner, more reliable reports
- Configurable parameters for gap tolerance and edge tolerance
- Garmin Connect Integration: NEW! CLI tools for GarminDB to automate data download and configuration
- HR & Activity → Sleep Model: NEW! Analyze how heart rate and activities affect sleep quality with 6 ML algorithms
- Flexible Imputation: NEW! 6 strategies for handling missing data (median, mean, drop, forward/backward fill, none)
- Day-of-Week Analysis: Analyze sleep score, body battery, and water intake patterns by day of the week
- 24-Hour Coverage Filtering: Filter analysis to only days with complete 24-hour continuous data coverage for more reliable results
- Activity Calendar: Visualize activity patterns with a color-coded calendar showing different activity types
- Activity Type Mappings: Customize display names and colors for unknown or poorly named activity types
- Interactive Dashboard: Real-time metric trends, correlation analysis, and day-of-week analysis with filtering options
- Machine Learning: Comprehensive ML pipeline with anomaly detection, clustering, and predictive modeling
- Visualization: Multiple plotting tools for trends, correlations, and feature analysis
- Reporting: Automated summaries and comprehensive analytics reports
- Data Quality: Advanced data quality analysis and coverage assessment tools
- Data Ingestion: Unified data loading from multiple Garmin databases with schema validation
- Testing: Comprehensive test suite with 435 tests (unit and integration)
- Notebooks: Interactive Jupyter notebooks for exploratory analysis
- Python 3.11, 3.12, or 3.13 (required)
- Poetry for dependency management
- Garmin Connect account (for automated data sync via GarminDB)
- OR a pre-existing `garmin.db` file from GarminDB
- Install Poetry (if not already installed):
pipx install poetry
- Clone the repository (if you haven't already):
git clone <repository-url>
cd garmin-analysis
- Install dependencies:
poetry install
NEW! Automated sync with Garmin Connect (requires GarminDB from source):
- Install GarminDB (one-time setup):
git clone --recursive https://github.com/tcgoetz/GarminDB.git ~/GarminDB
cd ~/GarminDB && make setup
- Set up your Garmin Connect credentials:
cd ~/Code/garmin-analysis
poetry run python -m garmin_analysis.cli_garmin_sync --setup \
--username your@email.com \
--password yourpassword \
--start-date 01/01/2024
- Download all your Garmin data (first time only):
poetry run python -m garmin_analysis.cli_garmin_sync --sync --all
- Generate the unified dataset:
poetry run python -m garmin_analysis.data_ingestion.load_all_garmin_dbs
Alternative: Manual export (if you prefer):
- Export your Garmin data using GarminDB to produce a `garmin.db` file.
- Copy the database to this project:
mkdir -p db
cp /path/to/GarminDB/garmin.db db/garmin.db
- Generate the unified dataset:
poetry run python -m garmin_analysis.data_ingestion.load_all_garmin_dbs
This creates data/master_daily_summary.csv combining all your Garmin data.
Check your data quality:
poetry run python -m garmin_analysis.features.quick_data_check --summary
- Launch the dashboard:
poetry run python run_dashboard.py
- Run your first analysis: See Common Commands below
- Explore visualizations: Check the Visualization utilities section
First-time setup:
# Install GarminDB from source (one-time)
git clone --recursive https://github.com/tcgoetz/GarminDB.git ~/GarminDB
cd ~/GarminDB && make setup
# Configure your Garmin Connect credentials
cd ~/Code/garmin-analysis
poetry run python -m garmin_analysis.cli_garmin_sync --setup \
--username your@email.com \
--password yourpassword \
--start-date 01/01/2024
# Download all historical data (do this once)
poetry run python -m garmin_analysis.cli_garmin_sync --sync --all
Daily updates:
# Download only the latest data (fast, run daily)
poetry run python -m garmin_analysis.cli_garmin_sync --sync --latest
# Then regenerate your unified dataset
poetry run python -m garmin_analysis.data_ingestion.load_all_garmin_dbs
Equivalent to running GarminDB directly:
# Our wrapper runs this for you:
garmindb_cli.py --all --download --import --analyze --latest
Setup options:
# Full setup command with all options
poetry run python -m garmin_analysis.cli_garmin_sync --setup \
--username your@email.com \
--password yourpassword \
--start-date 01/01/2024 \
--download-latest-activities 50 \
--download-all-activities 2000
Other operations:
# Backup your databases
poetry run python -m garmin_analysis.cli_garmin_sync --backup
# View statistics about your data
poetry run python -m garmin_analysis.cli_garmin_sync --stats
Copy databases to project:
# Find GarminDB databases (located in ~/HealthData/DBs/)
poetry run python -m garmin_analysis.cli_garmin_sync --find-dbs
# Copy databases to project db/ directory
poetry run python -m garmin_analysis.cli_garmin_sync --copy-dbs
Automation script:
# Use the provided script for daily updates
./examples/daily_update.sh
# Or with dashboard restart
./examples/daily_update.sh --restart
# Add to cron for automatic daily updates (6 AM)
crontab -e
# Add: 0 6 * * * /path/to/garmin-analysis/examples/daily_update.sh >> ~/garmin-update.log 2>&1
Note: See docs/garmin_connect_integration.md for the complete GarminDB integration guide and troubleshooting.
- Generate unified dataset:
poetry run python -m garmin_analysis.data_ingestion.load_all_garmin_dbs
Creates data/master_daily_summary.csv.
- Prepare modeling-ready dataset:
poetry run python -m garmin_analysis.data_ingestion.prepare_modeling_dataset
Creates data/modeling_ready_dataset.csv with cleaned data for ML.
- Schema inspection and drift detection:
# Inspect database schemas
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema db/garmin.db
# Inspect all databases in a directory
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema --dir db
# Export expected schema
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema export db/garmin.db reports/expected_schema.json
# Compare live DB vs expected schema
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema compare db/garmin.db reports/expected_schema.json --fail-on-drift
NEW FEATURE! All analysis tools now support filtering to only days with complete 24-hour continuous data coverage. This ensures more reliable analysis by excluding days with data gaps.
# Generate plots with high-quality data only
poetry run python -m garmin_analysis.viz.plot_trends_range --filter-24h-coverage
# Run comprehensive modeling with filtered data
poetry run python -m garmin_analysis.modeling.comprehensive_modeling_pipeline --filter-24h-coverage
# Generate reports with 24h coverage filtering
poetry run python -m garmin_analysis.reporting.run_all_analytics --filter-24h-coverage
# Launch dashboard with filtering options
poetry run python -m garmin_analysis.dashboard.app
# Then check the "Only days with 24-hour continuous coverage" checkbox
# and set "Max gap (minutes)" to your preferred tolerance (default: 2)
- More Reliable Analysis: Ensures data completeness for time-series analysis
- Better Model Training: Reduces noise from incomplete data days
- Consistent Comparisons: Enables fair comparison across different time periods
- Configurable: Customize gap tolerance and edge tolerance parameters
Recommended for:
- Time-series analysis - Ensures continuous data streams
- Machine learning - Reduces noise and improves model accuracy
- Trend analysis - Provides consistent data points for comparison
- Research studies - Ensures data quality for scientific analysis
- Reporting - Generates cleaner, more reliable reports
Optional for:
- Exploratory analysis - When you want to see all available data
- Quick checks - When data completeness is less critical
- Specific day analysis - When analyzing particular events or days
# Using the convenient script
poetry run python run_dashboard.py
# Or run directly
poetry run python -m garmin_analysis.dashboard.app
Open http://localhost:8050.
The dashboard now includes:
- Day of Week Analysis: Sleep score, body battery, and water intake by day of week
- 30-Day Health Overview: Variable 30-day window for stress, HR, body battery, and sleep
- Metric Trends: Time series plots with filtering
# Generate comprehensive trend plots for all metrics
poetry run python -m garmin_analysis.viz.plot_trends_range
# With 24-hour coverage filtering
poetry run python -m garmin_analysis.viz.plot_trends_range --filter-24h-coverage
# Generate feature correlation heatmaps
poetry run python -m garmin_analysis.viz.plot_feature_correlation
# Plot individual feature trends
poetry run python -m garmin_analysis.viz.plot_feature_trend
# Generate calendar for all available data
poetry run python -m garmin_analysis.viz.cli_activity_calendar
# Last 6 months
poetry run python -m garmin_analysis.viz.cli_activity_calendar --months 6
# Specific date range
poetry run python -m garmin_analysis.viz.cli_activity_calendar --start-date 2024-01-01 --end-date 2024-12-31
# Generate summary statistics for all metrics
poetry run python -m garmin_analysis.features.summary_stats
NEW FEATURE! Analyze your sleep score, body battery, and water intake patterns by day of the week to identify weekly trends and optimize your health routines.
# Run day-of-week analysis with visualizations
poetry run python -m garmin_analysis.cli_day_of_week
# Run with verbose output
poetry run python -m garmin_analysis.cli_day_of_week --verbose
# Show plots interactively (instead of saving)
poetry run python -m garmin_analysis.cli_day_of_week --show-plots
# Skip saving plots to files
poetry run python -m garmin_analysis.cli_day_of_week --no-save
# Use 24-hour coverage filtering (optional)
poetry run python -m garmin_analysis.cli_day_of_week --filter-24h-coverage
# Customize filtering parameters
poetry run python -m garmin_analysis.cli_day_of_week --filter-24h-coverage \
--max-gap 5 --day-edge-tolerance 5 --coverage-allowance-minutes 60
The day-of-week analysis is also available in the interactive dashboard:
# Launch the dashboard
poetry run python run_dashboard.py
# Or: poetry run python -m garmin_analysis.dashboard.app
Then navigate to the "Day of Week Analysis" tab to:
- Select which metrics to analyze (Sleep Score, Body Battery Max/Min, Water Intake)
- Apply 24-hour coverage filtering for reliable results
- View interactive bar charts and trend comparisons
- Explore patterns in real-time
- Comprehensive Analysis: Sleep score, body battery max/min, and water intake
- Multiple Visualizations: Bar charts with error bars and trend line comparisons
- Interactive Controls: Select metrics and apply filters in real-time
- Automated Summaries: Best/worst days with statistical differences
- 24-Hour Coverage Filtering: Optional filtering for high-quality data only
- Color-Coded Metrics: Easy identification of different health metrics
The analysis shows:
- Sleep Score: Average sleep quality by day of week (0-100 scale)
- Body Battery Max: Peak energy level by day of week (0-100 scale)
- Body Battery Min: Lowest energy level by day of week (0-100 scale)
- Water Intake: Daily hydration by day of week (ml)
DAY-OF-WEEK AVERAGES SUMMARY
============================================================
Sleep Score:
----------------------------------------
Monday: 61.8 ± 18.2 (n=62)
Tuesday: 62.1 ± 15.8 (n=58)
Wednesday: 61.3 ± 15.7 (n=61)
Thursday: 61.1 ± 17.8 (n=50)
Friday: 59.5 ± 20.1 (n=53)
Saturday: 60.9 ± 19.0 (n=61)
Sunday: 60.4 ± 20.5 (n=57)
Best day: Tuesday (62.1)
Worst day: Friday (59.5)
Difference: 2.6
- Sleep Optimization: Identify which days you sleep best and adjust your routine
- Energy Management: Find patterns in your body battery to optimize activity timing
- Hydration Tracking: Monitor water intake patterns (if tracked by your device)
- Weekly Planning: Use insights to plan your week for optimal health
- Pattern Recognition: Spot trends that might not be obvious in daily data
The analysis creates several visualization files in the plots/ directory:
- `*_day_of_week_sleep_score.png` - Sleep score by day of week
- `*_day_of_week_body_battery_max.png` - Peak body battery by day of week
- `*_day_of_week_body_battery_min.png` - Minimum body battery by day of week
- `*_day_of_week_water_intake.png` - Water intake by day of week
- `*_day_of_week_combined.png` - All metrics comparison chart
- Sleep Score: Requires data in the `sleep` table with a `score` column
- Body Battery: Requires data in the `daily_summary` table with `bb_max` and `bb_min` columns
- Water Intake: Requires data in the `daily_summary` table with a `hydration_intake` column
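If you are unsure whether your database has these tables and columns, a minimal check with Python's standard `sqlite3` module (assuming the project default `db/garmin.db` location) could look like this:

```python
# Verify the tables/columns used by the day-of-week analysis.
import sqlite3

REQUIRED = {
    "sleep": ["score"],
    "daily_summary": ["bb_max", "bb_min", "hydration_intake"],
}

with sqlite3.connect("db/garmin.db") as conn:
    for table, cols in REQUIRED.items():
        present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if not present:
            print(f"{table}: table not found")
            continue
        missing = [c for c in cols if c not in present]
        print(f"{table}: {'OK' if not missing else 'missing columns: ' + ', '.join(missing)}")
```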
# Run day-of-week analysis tests
poetry run pytest tests/test_day_of_week_analysis.py -v
# Run dashboard integration tests
poetry run pytest tests/test_dashboard_integration.py -v
NEW FEATURE! Analyze your stress patterns throughout the day to identify peak stress times, low-stress periods, and patterns by day of week.
# Run full stress analysis with all visualizations
poetry run python -m garmin_analysis.cli_time_of_day_stress
# Run with verbose output
poetry run python -m garmin_analysis.cli_time_of_day_stress --verbose
# Show plots interactively (instead of saving)
poetry run python -m garmin_analysis.cli_time_of_day_stress --show-plots
# Skip weekday analysis (faster for large datasets)
poetry run python -m garmin_analysis.cli_time_of_day_stress --no-weekday-analysis
# Use custom database path
poetry run python -m garmin_analysis.cli_time_of_day_stress --db-path /path/to/garmin.db
The stress analysis is also available in the interactive dashboard:
# Launch the dashboard
poetry run python run_dashboard.py
Then navigate to the "Stress by Time of Day" tab to:
- View hourly stress patterns with confidence intervals
- See color-coded stress distribution by hour
- Explore interactive heatmaps showing stress by day of week and hour
- Toggle weekday breakdown on/off
- Hourly Patterns: Average stress levels for each hour of the day (0-23)
- Interactive Visualizations: Line charts, bar charts, and heatmaps
- Day-of-Week Breakdown: See how stress patterns vary across the week
- Confidence Intervals: Statistical confidence bands (95% CI) on line charts
- Color-Coded Insights: Green (low), orange (medium), red (high stress)
- Time Period Analysis: Automatic grouping into morning, afternoon, evening, night
The analysis provides:
- Hourly Averages: Mean stress level for each hour with standard deviation
- Peak Stress Times: The 5 hours with highest average stress
- Low Stress Times: The 5 hours with lowest average stress
- Time Period Breakdown:
- Morning (06:00-11:59)
- Afternoon (12:00-17:59)
- Evening (18:00-22:59)
- Night (23:00-05:59)
- Weekday Patterns: How stress varies by day of week at different times
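For reference, here is a hedged sketch of how minute-level stress samples could be bucketed into these periods with pandas. It is illustrative only; the `timestamp` and `stress` column names are assumptions, and the project's own implementation may differ:

```python
# Group stress samples into the time periods listed above and summarize.
import pandas as pd

def time_period(hour: int) -> str:
    if 6 <= hour <= 11:
        return "Morning (06:00-11:59)"
    if 12 <= hour <= 17:
        return "Afternoon (12:00-17:59)"
    if 18 <= hour <= 22:
        return "Evening (18:00-22:59)"
    return "Night (23:00-05:59)"

def stress_by_period(df: pd.DataFrame) -> pd.DataFrame:
    hours = pd.to_datetime(df["timestamp"]).dt.hour
    return (df.assign(period=hours.map(time_period))
              .groupby("period")["stress"]
              .agg(["mean", "std", "count"]))
```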
======================================================================
STRESS ANALYSIS BY TIME OF DAY
======================================================================
Overall Stress Statistics:
----------------------------------------------------------------------
Total measurements: 1,003,864
Overall mean stress: 42.3
Overall std dev: 18.7
Peak Stress Times:
----------------------------------------------------------------------
14:00 - 15:00: 52.3 ± 17.2 (n=42,156)
15:00 - 16:00: 51.8 ± 17.5 (n=42,089)
13:00 - 14:00: 51.2 ± 17.1 (n=42,201)
16:00 - 17:00: 50.9 ± 17.8 (n=41,987)
12:00 - 13:00: 50.1 ± 17.3 (n=42,034)
Low Stress Times:
----------------------------------------------------------------------
03:00 - 04:00: 28.5 ± 12.3 (n=41,234)
04:00 - 05:00: 28.9 ± 12.5 (n=41,156)
02:00 - 03:00: 29.2 ± 12.7 (n=41,298)
05:00 - 06:00: 30.1 ± 13.1 (n=41,087)
01:00 - 02:00: 30.8 ± 13.4 (n=41,267)
Time Period Analysis:
----------------------------------------------------------------------
Morning (06:00-11:59): 38.2 ± 15.4
Afternoon (12:00-17:59): 51.5 ± 17.6
Evening (18:00-22:59): 45.3 ± 16.8
Night (23:00-05:59): 29.7 ± 12.9
The analysis creates visualization files in the plots/ directory:
- `*_stress_by_hour.png` - Hourly stress with confidence interval
- `*_stress_by_hour_bars.png` - Color-coded bar chart by hour
- `*_stress_heatmap_weekday_hour.png` - Heatmap of stress by day/hour
- `*_stress_by_weekday_hour.png` - Line chart comparison by day of week
- Schedule Optimization: Plan important tasks during your low-stress periods
- Stress Management: Identify when to take breaks or practice relaxation
- Sleep Insights: See how nighttime stress affects your rest
- Weekly Planning: Find which days/times are most stressful
- Pattern Discovery: Uncover stress triggers you weren't aware of
- Work-Life Balance: Compare weekday vs weekend stress patterns
- Stress Data: Requires minute-by-minute stress measurements in the `stress` table of `garmin.db`
- Continuous Monitoring: Best results with devices that track stress 24/7
- Data Volume: Analysis works with any amount of data, but more data provides better insights
# Run time-of-day stress analysis tests
poetry run pytest tests/test_time_of_day_stress_analysis.py -v
# Run integration test with real database
poetry run pytest tests/test_time_of_day_stress_analysis.py::test_real_database_integration -v
NEW FEATURES! Visualize your activity patterns with a beautiful calendar view and customize how unknown activity types are displayed.
Create calendar-style visualizations showing your daily activities with different colors for each activity type:
# Create calendar for all available data
poetry run python -m garmin_analysis.viz.cli_activity_calendar
# Create calendar for last 6 months
poetry run python -m garmin_analysis.viz.cli_activity_calendar --months 6
# Create calendar for specific date range
poetry run python -m garmin_analysis.viz.cli_activity_calendar --start-date 2024-01-01 --end-date 2024-12-31
# Create calendar with custom figure size
poetry run python -m garmin_analysis.viz.cli_activity_calendar --figsize 20 15
# Create calendar without activity type mappings (raw names)
poetry run python -m garmin_analysis.viz.cli_activity_calendar --no-mappings
Customize how unknown or poorly named activity types are displayed:
# Check for unmapped activity types
poetry run python -m garmin_analysis.viz.cli_activity_calendar --suggest-mappings
# Use custom mappings configuration file
poetry run python -m garmin_analysis.viz.cli_activity_calendar --mappings-config my_mappings.json
- Color-coded activities: Each activity type gets a distinct color
- Calendar grid layout: Shows days in a proper weekly calendar format
- Multiple activities handling: Darker colors for days with multiple activities
- Activity statistics: Summary of activity patterns and frequencies
- Custom mappings: Map unknown activity types to meaningful names
- Configurable: Customize colors, date ranges, and display options
The system automatically maps unknown activity types to more meaningful names. For example:
- `UnknownEnumValue_67` → "Training Assessment" (automatic fitness assessments)
- `generic` → "General Activity" (unspecified activities)
Edit config/activity_type_mappings.json to customize mappings:
{
"unknown_activity_mappings": {
"UnknownEnumValue_67": {
"display_name": "Training Assessment",
"description": "Automatic fitness assessments and recovery measurements",
"category": "assessment",
"color": "#9B59B6"
}
}
}
Or add mappings programmatically:
from garmin_analysis.utils.activity_mappings import add_activity_mapping
add_activity_mapping(
activity_type="UnknownEnumValue_68",
display_name="Recovery Check",
description="Automatic recovery measurements",
category="assessment",
color="#3498DB"
)
The activity calendar generates:
- Calendar grid with days colored by activity type
- Legend showing all activity types with their colors
- Summary statistics in logs showing activity frequency
- High-resolution PNG saved to the `plots/` directory
- Activity Pattern Analysis: See when you're most active throughout the year
- Goal Tracking: Visualize consistency in your workout routines
- Trend Identification: Spot seasonal patterns in your activities
- Data Quality: Identify gaps in your activity data
- Progress Monitoring: Track improvement in activity consistency
Analyze how heart rate metrics and physical activities affect sleep quality:
# Run the sleep analysis model
poetry run python -m garmin_analysis.modeling.hr_activity_sleep_model
Programmatic usage:
from garmin_analysis.modeling.hr_activity_sleep_model import HRActivitySleepModel
model = HRActivitySleepModel()
results = model.run_analysis(
use_lag_features=True, # Include yesterday's metrics
imputation_strategy='median' # Robust to outliers (recommended)
)
# Results include:
# - Best model and performance metrics
# - Top features affecting sleep
# - Visualizations (4 plots)
# - Detailed text report
Run all modeling analyses in one command:
# Run full pipeline
poetry run python -m garmin_analysis.modeling.comprehensive_modeling_pipeline
# With 24-hour coverage filtering
poetry run python -m garmin_analysis.modeling.comprehensive_modeling_pipeline --filter-24h-coverage
All modules support flexible imputation strategies:
# Enhanced anomaly detection
poetry run python -m garmin_analysis.modeling.enhanced_anomaly_detection
# Advanced clustering analysis
poetry run python -m garmin_analysis.modeling.enhanced_clustering
# Predictive modeling
poetry run python -m garmin_analysis.modeling.predictive_modeling
# Activity-sleep-stress correlation
poetry run python -m garmin_analysis.modeling.activity_sleep_stress_analysis
# Basic clustering
poetry run python -m garmin_analysis.modeling.clustering_behavior
# Basic anomaly detection
poetry run python -m garmin_analysis.modeling.anomaly_detection
All modeling modules support 6 imputation strategies for handling missing data:
# Median imputation (default, robust to outliers - RECOMMENDED)
predictor.prepare_features(df, imputation_strategy='median')
# Mean imputation
predictor.prepare_features(df, imputation_strategy='mean')
# Drop rows with missing values
predictor.prepare_features(df, imputation_strategy='drop')
# Forward fill
predictor.prepare_features(df, imputation_strategy='forward_fill')
# Backward fill
predictor.prepare_features(df, imputation_strategy='backward_fill')
# No imputation
predictor.prepare_features(df, imputation_strategy='none')
See docs/imputation_strategies.md for detailed guidance.
Run all analytics and generate comprehensive reports:
# Full analytics report
poetry run python -m garmin_analysis.reporting.run_all_analytics
# With 24-hour coverage filtering
poetry run python -m garmin_analysis.reporting.run_all_analytics --filter-24h-coverage
Generate statistical trend summaries:
# Generate trend summary
poetry run python -m garmin_analysis.reporting.generate_trend_summary
# With 24-hour coverage filtering
poetry run python -m garmin_analysis.reporting.generate_trend_summary --filter-24h-coverage
Reports are saved to the reports/ directory.
- Quick check (summary, completeness, feature suitability):
poetry run python -m garmin_analysis.features.quick_data_check # full quick check
poetry run python -m garmin_analysis.features.quick_data_check --summary
poetry run python -m garmin_analysis.features.quick_data_check --completeness
poetry run python -m garmin_analysis.features.quick_data_check --features
poetry run python -m garmin_analysis.features.quick_data_check --continuous-24h
- Comprehensive audit with reports (JSON + Markdown in `data_quality_reports/`):
poetry run python -m garmin_analysis.features.data_quality_analysis
- Additional data quality tools:
# Check for missing data patterns
poetry run python -m garmin_analysis.features.check_missing_data
# Generate comprehensive coverage analysis
poetry run python -m garmin_analysis.features.coverage
This platform is designed for comprehensive Garmin health data analysis:
- Track daily activity trends and patterns
- Identify optimal workout timing and intensity
- Monitor sleep quality and recovery metrics
- Analyze stress levels and their impact on performance
- Conduct longitudinal health studies
- Analyze correlations between different health metrics
- Detect anomalies in health patterns
- Generate comprehensive health reports
- Apply machine learning to health data
- Build predictive models for health outcomes
- Perform clustering analysis to identify health patterns
- Create custom visualizations and reports
- Monitor patient health trends over time
- Identify potential health issues through anomaly detection
- Generate patient health summaries
- Track treatment effectiveness
- Sleep Quality Analysis: NEW! Analyze how HR and activities affect sleep with ML models
- Flexible Data Imputation: NEW! 6 strategies for handling missing values (prevents data loss)
- Time Series Analysis: Comprehensive trend analysis with configurable time windows
- Machine Learning: Multiple algorithms for anomaly detection, clustering, and prediction
- Interactive Visualization: Real-time dashboard with filtering capabilities
- Activity Calendar: Calendar-style visualization of activity patterns with color coding
- Activity Type Mapping: Customize display names and colors for unknown activity types
- Data Quality Assurance: Advanced tools for data validation and quality assessment
- Automated Reporting: Generate comprehensive health reports automatically
- Performance Optimization: 24-hour coverage filtering for faster, more reliable analysis
- Comprehensive Testing: 435 tests with full coverage (unit and integration)
- Interactive Analysis: Jupyter notebooks for exploratory data analysis
Many analysis tools now support filtering to only days with complete 24-hour continuous data coverage. This is useful for:
- More reliable analysis: Ensures data completeness for time-series analysis
- Better model training: Reduces noise from incomplete data days
- Consistent comparisons: Enables fair comparison across different time periods
The system analyzes the stress timeseries data to identify days where:
- Data coverage starts within 2 minutes of midnight
- Data coverage ends within 2 minutes of midnight
- No gap between consecutive samples exceeds 2 minutes
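The snippet below is only an illustrative sketch of this qualification rule (not the project's actual implementation), assuming a pandas Series holding one day's stress timestamps:

```python
# Illustrative check: does one day's stress stream count as 24-hour coverage?
import pandas as pd

def has_24h_coverage(timestamps: pd.Series, max_gap_min: int = 2,
                     edge_tolerance_min: int = 2) -> bool:
    ts = pd.to_datetime(timestamps).sort_values()
    day_start = ts.iloc[0].normalize()              # midnight of that day
    day_end = day_start + pd.Timedelta(days=1)      # following midnight
    tol = pd.Timedelta(minutes=edge_tolerance_min)
    starts_on_time = (ts.iloc[0] - day_start) <= tol
    ends_on_time = (day_end - ts.iloc[-1]) <= tol
    no_big_gaps = ts.diff().max() <= pd.Timedelta(minutes=max_gap_min)
    return bool(starts_on_time and ends_on_time and no_big_gaps)
```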
| Tool | Command | Description |
|---|---|---|
| Plot Generation | `plot_trends_range --filter-24h-coverage` | Generate trend plots with filtered data |
| Trend Summary | `generate_trend_summary --filter-24h-coverage` | Create summary reports with filtered data |
| Full Analytics | `run_all_analytics --filter-24h-coverage` | Run comprehensive analytics with filtering |
| Modeling Pipeline | `comprehensive_modeling_pipeline --filter-24h-coverage` | Complete ML pipeline with filtered data |
| Interactive Dashboard | Checkbox in UI | Real-time filtering in web interface |
| Category | Tool | Command | Description |
|---|---|---|---|
| Data Ingestion | Load All DBs | `load_all_garmin_dbs` | Merge all Garmin databases into unified dataset |
| | Prepare Dataset | `prepare_modeling_dataset` | Clean data for machine learning |
| | Schema Inspector | `inspect_sqlite_schema` | Inspect and validate database schemas |
| Visualization | Trend Plots | `plot_trends_range` | Generate comprehensive trend visualizations |
| | Correlation Matrix | `plot_feature_correlation` | Create feature correlation heatmaps |
| | Feature Trends | `plot_feature_trend` | Plot individual feature trends over time |
| | Activity Calendar | `cli_activity_calendar` | Create calendar view of activity patterns |
| | Day-of-Week Analysis | `cli_day_of_week` | NEW! Analyze sleep, body battery, water intake by day of week |
| | Summary Stats | `summary_stats` | Generate statistical summaries |
| Modeling | Full Pipeline | `comprehensive_modeling_pipeline` | Complete ML analysis pipeline |
| | Anomaly Detection | `enhanced_anomaly_detection` | Advanced anomaly detection algorithms |
| | Clustering | `enhanced_clustering` | Multiple clustering algorithms |
| | Predictive Modeling | `predictive_modeling` | Health outcome prediction models |
| | Activity Analysis | `activity_sleep_stress_analysis` | Correlation analysis between metrics |
| Data Quality | Quick Check | `quick_data_check` | Fast data quality assessment |
| | Comprehensive Audit | `data_quality_analysis` | Detailed data quality reports |
| | Missing Data | `check_missing_data` | Analyze missing data patterns |
| | Coverage Analysis | `coverage` | 24-hour coverage assessment |
| Reporting | Full Analytics | `run_all_analytics` | Comprehensive analytics reports |
| | Trend Summary | `generate_trend_summary` | Statistical trend summaries |
| Dashboard | Web Interface | `dashboard.app` | Interactive web dashboard with day-of-week analysis |
| Testing | Unit Tests | `pytest -m "not integration"` | Fast unit tests |
| | Integration Tests | `pytest -m integration` | Full integration tests |
| | All Tests | `pytest` | Complete test suite |
# Basic usage - filter to high-quality data only
poetry run python -m garmin_analysis.viz.plot_trends_range --filter-24h-coverage
poetry run python -m garmin_analysis.reporting.generate_trend_summary --filter-24h-coverage
# Advanced usage - customize coverage parameters
poetry run python -m garmin_analysis.viz.plot_trends_range --filter-24h-coverage --max-gap 5 --day-edge-tolerance 5 --coverage-allowance-minutes 60
# Full pipeline with filtering
poetry run python -m garmin_analysis.modeling.comprehensive_modeling_pipeline --filter-24h-coverage --target-col score
# Monthly reports with filtering
poetry run python -m garmin_analysis.reporting.run_all_analytics --filter-24h-coverage --monthly --coverage-allowance-minutes 120
In the interactive dashboard, you can toggle the "Only days with 24-hour continuous coverage" checkbox to filter trend plots and analysis views. Use the adjacent "Max gap (minutes)" input to set the maximum allowed gap between samples (default 2). The filtering is applied in real-time and plot titles will indicate when filtering is active.
- `--filter-24h-coverage`: Enable 24-hour coverage filtering
- `--max-gap`: Maximum allowed gap between consecutive samples (default: 2 minutes)
- `--day-edge-tolerance`: Allowed tolerance at day start/end (default: 2 minutes)
- `--coverage-allowance-minutes`: Total allowed missing minutes within a day (0-300, default: 0). This allowance applies to the sum of (a) all internal gaps that exceed `--max-gap` and (b) late starts/early ends beyond `--day-edge-tolerance` at the day's edges. If the cumulative missing time is within the allowance, the day qualifies even if individual gaps exceed `--max-gap` (a worked example follows the dashboard parameters below).
Dashboard-specific:
- "Max gap (minutes)": Same as
--max-gap, adjustable per-tab in the UI - "Coverage allowance (minutes)": Same as
--coverage-allowance-minutes
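A worked example of the allowance rule, under one plausible reading of the description above (internal gaps larger than `--max-gap` count toward the allowance in full; day edges count only for the time beyond `--day-edge-tolerance`). Treat the exact accounting as an assumption and consult the filtering code for the authoritative behavior:

```python
# Hypothetical day evaluated with:
#   --max-gap 2  --day-edge-tolerance 2  --coverage-allowance-minutes 60
internal_gap = 25             # one 25-minute gap between samples (exceeds --max-gap)
late_start = 15               # first sample arrives 15 minutes after midnight
edge_excess = late_start - 2  # minutes beyond --day-edge-tolerance
missing_minutes = internal_gap + edge_excess   # 25 + 13 = 38
qualifies = missing_minutes <= 60              # True: within the allowance
print(missing_minutes, qualifies)
```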
Check which days have 24-hour coverage:
poetry run python -m garmin_analysis.features.quick_data_check --continuous-24h
Using 24-hour coverage filtering can also improve performance:
- Faster Processing: Fewer data points mean faster analysis
- Lower Memory Usage: Reduced dataset size for large-scale analysis
- Focused Results: More relevant insights from high-quality data
- Better Visualizations: Cleaner plots without gaps or missing data artifacts
# Check how many days you have total
poetry run python -m garmin_analysis.features.quick_data_check --summary
# Check how many days have 24h coverage
poetry run python -m garmin_analysis.features.quick_data_check --continuous-24h
# Compare results with and without filtering
poetry run python -m garmin_analysis.viz.plot_trends_range # All data
poetry run python -m garmin_analysis.viz.plot_trends_range --filter-24h-coverage # Filtered data
No qualifying days found?
- Check if you have stress data: `poetry run python -m garmin_analysis.features.quick_data_check --continuous-24h`
- Try relaxing the parameters: `--max-gap 10 --day-edge-tolerance 10`
- Ensure your Garmin device was worn continuously during the day
Filtering too strict?
- Increase gap tolerance: `--max-gap 5` (default: 2 minutes)
- Increase edge tolerance: `--day-edge-tolerance 5` (default: 2 minutes)
Want to see what's being filtered?
- Run without filtering first to see all data
- Use `--continuous-24h` to see which specific days qualify
- Compare results side-by-side
# Inspect one DB
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema db/garmin.db
# Inspect directory of DBs (default: db)
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema --dir db
# Export expected schema
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema export db/garmin.db reports/expected_schema.json
# Compare live DB vs expected (non-zero exit on drift)
poetry run python -m garmin_analysis.data_ingestion.inspect_sqlite_schema compare db/garmin.db reports/expected_schema.json --fail-on-drift
The project has a comprehensive test suite with 435 tests across 29 test modules covering unit and integration scenarios.
# Run all tests (recommended for CI/CD)
poetry run pytest
# Run with verbose output
poetry run pytest -v
# Run with quiet mode
poetry run pytest -q
Unit Tests (fast, in-memory DB fixtures):
poetry run pytest -m "not integration"Integration Tests (file-backed temp DBs, tests real I/O):
poetry run pytest -m integration
# Coverage filtering tests
poetry run pytest tests/test_coverage_filtering.py -v
# Data quality tests
poetry run pytest tests/test_data_quality.py -v
# Dashboard tests
poetry run pytest tests/test_dashboard_dependencies.py -v
poetry run pytest tests/test_dashboard_integration.py -v
# Modeling tests
poetry run pytest tests/test_hr_activity_sleep_model.py -v
poetry run pytest tests/test_imputation.py -v
# Day-of-week analysis tests
poetry run pytest tests/test_day_of_week_analysis.py -v
- 435 total tests across 29 test modules
- 42 tests for HR & Activity Sleep Model
- 32 tests for imputation strategies
- Full coverage of unit and integration scenarios
- Tests use in-memory SQLite for speed and file-backed DBs for integration testing
Interactive analysis notebooks are available in the notebooks/ directory:
- `analysis.ipynb` - Comprehensive data analysis
- `hr_daily.ipynb` - Heart rate daily analysis
To use notebooks:
# Start Jupyter Lab
poetry run jupyter lab
# Or start Jupyter Notebook
poetry run jupyter notebook
src/garmin_analysis/
├── dashboard/ # Interactive Dash web application
├── data_ingestion/ # Database loading and CSV generation
├── features/ # Data quality and feature engineering
├── modeling/ # Machine learning models
├── reporting/ # Automated report generation
├── viz/ # Visualization tools
├── utils/ # Utility modules
│   ├── data_loading.py # Load data from DB/CSV
│   ├── data_processing.py # Transform and clean data
│   ├── data_filtering.py # Date filters and feature prep
│   ├── imputation.py # Missing value strategies
│   └── activity_mappings.py # Activity type customization
└── config/ # Configuration files
When to use each:
| Module | Use For | Example |
|---|---|---|
| `utils.data_loading` | Loading master dataframe, Garmin tables | `load_master_dataframe()` |
| `utils.data_processing` | Date normalization, time conversions | `normalize_day_column()` |
| `utils.data_filtering` | Date ranges, feature filtering | `filter_by_date()` |
| `utils.imputation` | Handling missing values | `impute_missing_values()` |
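A short sketch of how these utilities chain together; the `impute_missing_values` call matches the example shown earlier, while the argument lists for `normalize_day_column` and `filter_by_date` are assumptions based on the table above:

```python
from garmin_analysis.utils.data_loading import load_master_dataframe
from garmin_analysis.utils.data_processing import normalize_day_column
from garmin_analysis.utils.data_filtering import filter_by_date
from garmin_analysis.utils.imputation import impute_missing_values

df = load_master_dataframe()                          # unified daily dataset
df = normalize_day_column(df)                         # consistent date column (assumed signature)
df = filter_by_date(df, "2024-01-01", "2024-12-31")   # assumed signature
df = impute_missing_values(df, ["hr_min", "steps"], strategy="median")
```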
- `config/activity_type_mappings.json` - Customize activity names and colors
- `logging_config.py` - Centralized logging setup
garmin-analysis/
├── src/garmin_analysis/ # Main package
│   ├── dashboard/ # Interactive web dashboard
│   ├── data_ingestion/ # Data loading and preparation
│   ├── features/ # Data quality and feature analysis
│   ├── modeling/ # Machine learning algorithms
│   │   ├── hr_activity_sleep_model.py # NEW! HR & Activity → Sleep analysis
│   │   ├── predictive_modeling.py # General predictive models (with imputation)
│   │   ├── enhanced_clustering.py # Clustering algorithms (with imputation)
│   │   └── enhanced_anomaly_detection.py # Anomaly detection (with imputation)
│   ├── reporting/ # Automated report generation
│   ├── utils/ # Utility modules
│   │   ├── data_loading.py # Database and file loading
│   │   ├── data_processing.py # Data transformation and cleaning
│   │   ├── data_filtering.py # Filtering and feature preparation
│   │   ├── imputation.py # Missing value handling strategies
│   │   └── activity_mappings.py # Activity type customization
│   ├── viz/ # Visualization tools
│   └── utils_cleaning.py # Data cleaning utilities
├── config/ # Configuration files
│   └── activity_type_mappings.json # Activity type mappings
├── docs/ # Documentation
│   ├── imputation_strategies.md # NEW! Imputation guide
│   ├── missing_value_analysis.md # NEW! Repository analysis
│   ├── imputation_migration_guide.md # NEW! Migration guide
│   ├── IMPUTATION_QUICK_REFERENCE.md # NEW! Quick reference
│   └── activity_type_mappings.md # Activity mapping documentation
├── examples/ # Example scripts
│   └── activity_calendar_example.py # Activity calendar example
├── run_dashboard.py # Convenient dashboard launcher script
├── tests/ # Test suite (435 tests total)
│   ├── test_imputation.py # NEW! Imputation utility tests (32 tests)
│   ├── test_hr_activity_sleep_model.py # NEW! Sleep model tests (42 tests)
│   └── ... # Other test files (26 modules)
├── notebooks/ # Jupyter notebooks
├── data/ # Generated datasets
├── plots/ # Generated plots
├── reports/ # Generated reports
├── modeling_results/ # ML model outputs
│   ├── plots/ # Model visualizations
│   └── reports/ # Model analysis reports
└── db/ # Garmin database files
Dependencies are managed via Poetry in pyproject.toml.
Installation:
poetry install
For non-Poetry users, generate requirements.txt:
poetry export -f requirements.txt --output requirements.txt --without-hashes
pip install -r requirements.txt
Core Libraries:
- Data: pandas, numpy
- ML: scikit-learn, tsfresh, statsmodels, prophet
- Visualization: matplotlib, seaborn, plotly, dash
- Garmin Integration: GarminDB - For Garmin Connect data export (see Credits)
- Development: pytest, jupyter
See pyproject.toml for version constraints.
- `mem_db` (unit): In-memory SQLite with minimal schema and seed data for pure SQL/transform functions.
- `tmp_db` (integration): Temp file-backed SQLite DBs with realistic seeds; test code patches `garmin_analysis.data_ingestion.load_all_garmin_dbs.DB_PATHS` to point to these files.
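An illustrative sketch of the patching pattern described above for an integration test; the `tmp_db` fixture contents and the structure assigned to `DB_PATHS` are assumptions, and only the patch target comes from the note above:

```python
# Hypothetical integration test using the tmp_db fixture.
import pytest
import garmin_analysis.data_ingestion.load_all_garmin_dbs as loader

@pytest.mark.integration
def test_merge_reads_temp_databases(tmp_db, monkeypatch):
    # Redirect the module-level DB paths to the seeded temp databases.
    monkeypatch.setattr(loader, "DB_PATHS", tmp_db)  # tmp_db structure is assumed
    # ...call the merge entry point and assert on the resulting dataset...
```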
Notes on data sources:
- If real Garmin DBs are available, place them under `db/` (e.g., `db/garmin.db`, `db/garmin_summary.db`, `db/garmin_activities.db`).
- When DBs are missing outside of tests, some commands may generate a synthetic dataset for convenience and log clear WARNINGS. This synthetic data is only for smoke testing and should not be used for real analysis.
This project uses:
- Python 3.11+: Required for compatibility
- Poetry: Dependency management
- pytest: Testing framework with 435 tests
- Black/Flake8: Code formatting (if configured)
See the LICENSE file for details.
This project builds upon and integrates with several excellent open-source projects:
- GarminDB by Tom Goetz - Provides the core functionality for downloading and parsing Garmin Connect data. Licensed under GPL-2.0.
- Python Garmin Connect API - Used by GarminDB for Garmin Connect authentication.
- scikit-learn, pandas, matplotlib and other open-source libraries that make this analysis possible.
Special thanks to the Garmin developer community for their work on reverse-engineering and documenting Garmin's data formats.
- This repo uses the `src/garmin_analysis` package layout. Always run modules via `python -m garmin_analysis.<module>`.
- Logging is used instead of print statements throughout the codebase.
- Test fixtures use in-memory SQLite (`mem_db`) for unit tests and file-backed DBs (`tmp_db`) for integration tests.