giovadg/data-science-projects

Overview

This repository contains two quantitative data science projects:

  • USA corn yield prediction

  • QRT data challenge on electricity price forecasting (ongoing)

USA corn yield prediction

The goal is to build a transparent, reproducible statistical model that predicts the 2024 U.S. national corn yield (bu/acre) from historical USDA yield data and daily county-level weather data. The project emphasizes feature design, out-of-sample validation, and uncertainty quantification rather than black-box optimization.

Data

Yield data: downloaded automatically from USDA QuickStats (county, state, and national level; annual).

Weather data: daily county-level historical observations, provided as a .parquet file that is not included in this repository.

All preprocessing, aggregation, and validation steps are fully documented in the notebook.

Methodology

  1. Data Validation:
    Unit consistency checks. Missing-value diagnostics. Temporal and spatial alignment between yield and weather data. Aggregation from county → national level using production-weighted schemes.

  2. Feature Engineering:
    Weather features are constructed to reflect agronomic stress mechanisms, including:
    Temperature-based degree day measures.
    Extreme heat indicators during critical phenological windows.
    Precipitation totals and deficits.
    Seasonal aggregations aligned with planting–pollination–grain fill phases.
    Feature choices are explicitly justified and constrained to avoid leakage.

  3. Modeling:
    Regression-based statistical models with regularization.
    National yield modeled as an aggregation of weather-driven signals.
    Emphasis on interpretability and stability over marginal accuracy gains.

  4. Evaluation:
    Rolling and holdout validation on recent years.
    Performance metrics reported on genuinely out-of-sample periods.
    Error decomposition and robustness checks.

  5. Uncertainty Estimation:
    Predictive uncertainty derived from residual distributions and model variance.
    Final output reported as point estimate + uncertainty band.
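
The feature-engineering and aggregation steps above can be illustrated with a minimal sketch. This is not the notebook's code: the function names, the 10 °C base and 30 °C cap for degree days, and the 35 °C extreme-heat threshold are assumptions chosen for illustration, and the notebook may use different definitions and windows.

```python
def growing_degree_days(tmin_c, tmax_c, base=10.0, cap=30.0):
    """Daily growing degree days with both temperatures clamped to [base, cap].

    The base/cap values are assumed defaults, not taken from the notebook.
    """
    tmax = min(max(tmax_c, base), cap)
    tmin = min(max(tmin_c, base), cap)
    return (tmax + tmin) / 2.0 - base


def extreme_heat_days(tmax_series_c, threshold=35.0):
    """Count days whose maximum temperature exceeds an assumed stress threshold."""
    return sum(1 for t in tmax_series_c if t > threshold)


def production_weighted_mean(values, weights):
    """Aggregate county-level values to one number, weighting by production."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Illustrative inputs only (made-up numbers, not USDA or weather data).
county_gdd = [
    sum(growing_degree_days(lo, hi) for lo, hi in days)
    for days in (
        [(15.0, 35.0), (12.0, 28.0)],  # hypothetical county A: two days of (tmin, tmax)
        [(5.0, 8.0), (11.0, 25.0)],    # hypothetical county B
    )
]
county_production = [1.0, 3.0]  # hypothetical production weights
national_gdd = production_weighted_mean(county_gdd, county_production)
```

Weighting by production rather than taking a plain average keeps low-output counties from dominating the national signal, which is the motivation for the production-weighted schemes mentioned in step 1.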

Results

The 2024 national corn yield prediction is reported with a confidence interval.

Model performance is benchmarked against historical yield variability.

A sensitivity analysis highlights the dominant weather drivers.
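
As a rough illustration of how a point estimate with an uncertainty band can be built from out-of-sample residuals, the sketch below uses a plain linear trend in place of the project's regularized weather model. The swap is deliberate, to keep the example self-contained. All names (`fit_trend`, `rolling_origin_residuals`, `predict_with_band`), the 80% coverage default, and the minimum training length are assumptions, not the notebook's implementation.

```python
import statistics


def fit_trend(years, yields):
    """Ordinary least squares fit of yield = a + b * year; returns (a, b)."""
    mean_x = statistics.fmean(years)
    mean_y = statistics.fmean(yields)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rolling_origin_residuals(years, yields, min_train=5):
    """Refit on years[:i] and score year i, so every residual is out of sample."""
    residuals = []
    for i in range(min_train, len(years)):
        a, b = fit_trend(years[:i], yields[:i])
        residuals.append(yields[i] - (a + b * years[i]))
    return residuals


def empirical_quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def predict_with_band(years, yields, target_year, coverage=0.8, min_train=5):
    """Point forecast plus a band taken from rolling out-of-sample residuals."""
    a, b = fit_trend(years, yields)
    point = a + b * target_year
    residuals = sorted(rolling_origin_residuals(years, yields, min_train))
    tail = (1 - coverage) / 2
    lower = point + empirical_quantile(residuals, tail)
    upper = point + empirical_quantile(residuals, 1 - tail)
    return point, lower, upper
```

Called as `predict_with_band(years, yields, 2024)` with a list of historical years and national yields, this returns the point estimate together with the lower and upper ends of the band. Because the band comes from empirical residual quantiles, it reflects past forecast errors only and does not account for parameter uncertainty.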

QRT data challenge on electricity price forecasting

description..