An academic-grade, research-oriented Spatial Digital Twin built to model, predict, and visualize district-level rice yields (Aus, Aman, and Boro seasons) across Bangladesh. The platform integrates real-world subnational agricultural census records, multi-sensor NASA satellite climatology, NOAA oceanic teleconnection indices, and an Ensemble Machine Learning pipeline to project yields up to 2029 with recursive error-calibration loops.
Bangladesh is one of the most climate-vulnerable nations in the world. Rice is the primary staple crop, critical to the food security of over 170 million people. However, yields are constantly threatened by erratic monsoons, winter droughts, flash floods, and global warming.
A Digital Twin is a virtual, real-time replica of a physical system. This platform acts as a digital twin of Bangladesh's agricultural landscape:
- Observes: It maps NASA satellite weather telemetry directly onto all 64 districts.
- Learns: It trains on historical yields (BBS yearbook data) to recognize how weather factors (like rainfall anomalies, warm nights, and soil drought) impact crop growth.
- Corrects & Predicts: It estimates future yields through 2029, helping policymakers, grain importers, and researchers anticipate regional food deficits, schedule irrigation, and evaluate climate change risks years in advance.
The platform's backend operates on a fully reproducible Python pipeline, processing data from raw APIs to a trained Ensemble ML model.
To limit the bias of any single tree-based learner and reduce prediction variance, the model is trained as a hybrid Ensemble Voting Regressor combining three robust algorithms:
- XGBoost Regressor: Handles non-linear feature splits and additive gradient residuals.
- Random Forest Regressor: Stabilizes variance through bootstrap aggregation (bagging) of deep trees.
- Gradient Boosting Regressor: Regularizes predictions using shallow additive estimators.
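As a rough illustration of how a voting regressor combines its members: with equal weights, the ensemble prediction is the mean of the base models' predictions. The sketch below uses made-up yields and a hypothetical `voting_average` helper. It only shows the averaging step, not the project's training code, and it assumes equal weights, which the source does not state.

```python
from statistics import fmean


def voting_average(predictions_by_model):
    """Average per-sample predictions across models (equal weights assumed)."""
    # zip(*...) regroups the per-model lists into per-sample tuples.
    per_sample = zip(*predictions_by_model.values())
    return [fmean(sample) for sample in per_sample]


# Hypothetical yields (MT/ha) predicted by each base model for three samples.
example = {
    "xgboost": [3.0, 4.5, 2.4],
    "random_forest": [3.3, 4.2, 2.7],
    "gradient_boosting": [3.0, 4.8, 2.4],
}
ensemble = voting_average(example)
```

Because the three learners make different kinds of errors, averaging tends to cancel part of each model's individual error.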
Including district directly as a categorical feature allows the model to capture local soil profiles, baseline irrigation grids, and regional variety changes.
- Mean Chronological Cross-Validation $R^2$: $97.74\%$
- Final Test Set $R^2$ (Years 2022–2023): $98.19\%$
- Test RMSE: $0.1266$ MT/ha (predictions deviate by less than $\pm 127$ kg per hectare on average).
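Chronological cross-validation means every fold trains only on years that precede its validation year, so no future information leaks into training. A minimal sketch of one way to generate such folds follows. It assumes an expanding window; the `chronological_splits` helper and `min_train_years` parameter are hypothetical, and the actual fold layout in `train_model.py` may differ.

```python
def chronological_splits(years, min_train_years=3):
    """Yield (train_years, validation_year) pairs in time order.

    Each fold trains on every year strictly before the validation year.
    """
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


# Hypothetical setup: 2015-2021 for cross-validation, with 2022-2023
# held out as the final test set.
folds = list(chronological_splits(range(2015, 2022), min_train_years=4))
for train_years, val_year in folds:
    print(f"train {train_years[0]}-{train_years[-1]} -> validate {val_year}")
```

A mean cross-validation score would then be the average of the per-fold $R^2$ values.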
The model incorporates physical, satellite-derived indicators rather than simple monthly averages:
- Accumulated Growing Degree Days (GDD): Models plant thermal maturity.
  $$\text{GDD} = \sum \max\left( \frac{T_{\max} + T_{\min}}{2} - T_{\text{base}}, 0 \right) \times \text{Days} \quad (T_{\text{base}} = 10^\circ\text{C})$$
- Diurnal Temperature Range (DTR): Measures nighttime respiration stress (narrower margins combined with warm nights decrease photosynthesis efficiency).
  $$\text{DTR} = \text{Mean}(T_{\max} - T_{\min})$$
- Root-Zone Soil Hydration (`GWETROOT`): Derived from NASA's GLDAS 0–100 cm percolation telemetry, representing the underground water reservoir at the crop roots (far more stable than rapid surface wetness `GWETTOP`).
- Seasonal Water Deficit Index (SWDI): Models regional precipitation supply against evapotranspiration demand.
  $$\text{SWDI} = \text{Precipitation} - (1.15 \times \text{Temperature} \times \text{Solar Radiation})$$
- Monsoon Flood Index: Quantifies excess surface soil saturation during Aus/Aman monsoon seasons.
  $$\text{Flood} = \max\left(0,\; (\text{GWETTOP} - 0.82) \times 50\right)$$
- Dry-Season Drought Index: Captures root-zone water stress during Boro dry season.
  $$\text{Drought} = \max\left(0,\; (0.50 - \text{GWETROOT}) \times 50\right)$$
- Actual Evapotranspiration (ET): Accumulated seasonal land surface evaporation (`EVLAND`) from NASA MERRA-2 reanalysis, measuring actual water loss (mm/season).
- Potential Evapotranspiration (PET): Estimated via the Hargreaves–Samani equation using temperature extremes and downward solar irradiance:
  $$\text{PET} = 0.0023 \times (T_{\text{mean}} + 17.8) \times \sqrt{T_{\max} - T_{\min}} \times R_a$$
- Oceanic Niño Index (ONI): Seasonal mean ENSO anomaly (°C) from NOAA CPC's Niño 3.4 SST time series, capturing teleconnection impacts on Bangladesh's monsoon variability.
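The index formulas above translate almost directly into code. The following is a minimal sketch, not the contents of `feature_engineer.py`: the function names, the `(t_max, t_min, days)` tuple layout, and the choice of $T_{\text{mean}} = (T_{\max} + T_{\min})/2$ in the PET function are assumptions made for illustration.

```python
import math

T_BASE = 10.0  # base temperature in degrees C, as stated in the GDD formula


def gdd(monthly, t_base=T_BASE):
    """Growing degree days; monthly is a list of (t_max, t_min, days)."""
    return sum(max((t_max + t_min) / 2 - t_base, 0.0) * days
               for t_max, t_min, days in monthly)


def dtr(monthly):
    """Mean diurnal temperature range over the season."""
    return sum(t_max - t_min for t_max, t_min, _ in monthly) / len(monthly)


def swdi(precipitation, temperature, solar_radiation):
    """Seasonal water deficit: supply minus a simple demand proxy."""
    return precipitation - 1.15 * temperature * solar_radiation


def flood_index(gwettop):
    """Positive only when surface wetness exceeds the 0.82 threshold."""
    return max(0.0, (gwettop - 0.82) * 50)


def drought_index(gwetroot):
    """Positive only when root-zone wetness falls below 0.50."""
    return max(0.0, (0.50 - gwetroot) * 50)


def pet_hargreaves(t_max, t_min, r_a):
    """Hargreaves-Samani PET; t_mean is assumed to be the midpoint."""
    t_mean = (t_max + t_min) / 2
    # Guard against a negative range caused by bad input data.
    return 0.0023 * (t_mean + 17.8) * math.sqrt(max(t_max - t_min, 0.0)) * r_a


# Hypothetical two-month season: (t_max, t_min, days in month).
season = [(32.0, 24.0, 30), (30.0, 22.0, 31)]
season_gdd = gdd(season)
season_dtr = dtr(season)
```

Both the flood and drought indices are clipped at zero, so they contribute nothing to the feature matrix in seasons where soil moisture stays within the normal band.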
To correct for systematic drift (such as localized soil degradation or salinity updates), predictions are recursively calibrated using historical prediction errors:

$$\hat{y}_{t}^{\text{corrected}} = \hat{y}_{t}^{\text{base}} + K \cdot (y_{t-1} - \hat{y}_{t-1}^{\text{corrected}}) \quad (K = 0.35)$$

For future forecast years (2024–2029) where actual BBS yields $y_{t-1}$ are unobserved, the feedback loop carries forward the last observed historical correction offset.
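The recursion can be sketched as a single pass over the years in order. This is an illustrative reading of the formula rather than the project's code: the `calibrate` helper, the dictionary layout, and the zero offset for the very first year are assumptions, and the yields are made up.

```python
K = 0.35  # feedback gain from the calibration formula


def calibrate(base, actual, k=K):
    """Apply the recursive correction year by year.

    base   -- dict mapping year -> raw model prediction
    actual -- dict mapping year -> observed yield (absent for future years)
    """
    corrected = {}
    offset = 0.0  # assumption: the first year receives no correction
    for year in sorted(base):
        prev = year - 1
        if prev in actual and prev in corrected:
            # Last year's error is observable, so refresh the offset.
            offset = k * (actual[prev] - corrected[prev])
        # Otherwise the previous offset is carried forward unchanged.
        corrected[year] = base[year] + offset
    return corrected


# Hypothetical yields in MT/ha; observations stop at 2023.
base = {2021: 3.0, 2022: 3.0, 2023: 3.0, 2024: 3.0, 2025: 3.0}
actual = {2021: 3.2, 2022: 3.2, 2023: 3.2}
corrected = calibrate(base, actual)
```

In this example the model under-predicts by 0.2 MT/ha, so each observed year nudges the forecast upward; 2025 has no observed predecessor and reuses the offset computed for 2024.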
- Weather Projections: Future weather uses 10-year district climatology medians. To prevent the timeline from looking linear or flat, we inject stochastic climate noise ($\pm 85\%$ of historical seasonal standard deviations).
- Acreage Projections: Cultivated area (`area_ha`) is projected using a rolling 3-year historical median with operational land-use fluctuations ($\pm 2\%$ variance) to simulate crop rotations.
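A small sketch of how these two projections might be generated is shown below. Several details are assumptions, since the text does not specify them: the noise is drawn uniformly within the stated bounds, the standard deviation is the population form, projected acreage values feed back into the rolling median, and the helper names and sample numbers are hypothetical.

```python
import random
from statistics import median, pstdev


def project_weather(history, n_years, noise_fraction=0.85, seed=0):
    """Climatology median plus bounded noise (uniform draw is an assumption)."""
    rng = random.Random(seed)
    baseline = median(history)
    spread = noise_fraction * pstdev(history)
    return [baseline + rng.uniform(-spread, spread) for _ in range(n_years)]


def project_area(history, n_years, variation=0.02, seed=0):
    """Rolling 3-year median with a +/-2% perturbation per projected year."""
    rng = random.Random(seed)
    series = list(history)
    projected = []
    for _ in range(n_years):
        base = median(series[-3:])
        value = base * (1 + rng.uniform(-variation, variation))
        projected.append(value)
        series.append(value)  # assumption: projections enter the rolling window
    return projected


# Hypothetical 10-year seasonal rainfall (mm) and 3-year acreage (ha).
rain_history = [210.0, 250.0, 190.0, 230.0, 220.0,
                240.0, 200.0, 260.0, 215.0, 225.0]
area_history = [1000.0, 1020.0, 990.0]

rain_future = project_weather(rain_history, n_years=6, seed=42)
area_future = project_area(area_history, n_years=6, seed=42)
```

Seeding the generator keeps the projected series reproducible between pipeline runs, which matters when the output is committed as static JSON.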
The project features a decoupled serverless structure. All machine learning predictions are precomputed on a physical twin and exported as flat static JSON payloads. The frontend is built using Next.js and TailwindCSS, served directly from a global CDN cache with zero database overhead.
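To make the static export idea concrete, here is a minimal sketch of turning flat prediction rows into a JSON document that a frontend could fetch without a database. The field names, the `build_payload` helper, and the sample values are hypothetical and do not reflect the actual schema of `yield_data.json`.

```python
import json


def build_payload(rows):
    """Group flat prediction rows by district for a static JSON export."""
    payload = {}
    for row in rows:
        payload.setdefault(row["district"], []).append({
            "year": row["year"],
            "season": row["season"],
            "yield_mt_ha": round(row["yield_mt_ha"], 3),
        })
    return payload


# Made-up rows standing in for precomputed model output.
rows = [
    {"district": "Dhaka", "year": 2024, "season": "Boro", "yield_mt_ha": 4.12345},
    {"district": "Dhaka", "year": 2025, "season": "Boro", "yield_mt_ha": 4.2},
]
text = json.dumps(build_payload(rows), indent=2, sort_keys=True)
```

Writing `text` to a file under `public/data/` would let the CDN serve it like any other static asset.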
CropPred/
├── app/                              # Next.js App Router (Frontend)
│   ├── explorer/
│   │   └── page.js                   # CSV Registry Explorer Route
│   ├── methodology/
│   │   └── page.js                   # Research Methodology & validation Hub
│   ├── globals.css                   # Tailwind styles, scrollbars & visual tokens
│   ├── layout.js                     # HTML headers & SEO optimizations
│   └── page.js                       # Interactive Twin Dashboard UI
├── data/
│   ├── raw/                          # Raw inputs: BBS yearbook, NASA POWER, FAOSTAT, NOAA ONI
│   └── processed/                    # Standardized wide feature matrix
├── model/
│   └── crop_yield_model.joblib       # Trained Scikit-Learn/Ensemble Pipeline
├── public/
│   └── data/                         # Decoupled Static JSON Exports
│       ├── districts.json            # Geolocation centroids for SVG nodes
│       ├── fao_national.json         # FAO country validation statistics
│       ├── bbs_raw.json              # Digitized raw BBS yield census
│       ├── nasa_raw.json             # NASA monthly telemetry records
│       ├── yield_data.json           # 2015-2029 predictions and weather registry
│       └── summary.json              # Feature importances & CV stats
├── src/                              # Python data pipeline
│   ├── fetch/
│   │   ├── fetch_bbs.py              # Download historical BBS crop sheets
│   │   ├── fetch_fao.py              # Pull FAOSTAT reference records
│   │   ├── fetch_nasa.py             # Query NASA POWER climate telemetry (incl. EVLAND)
│   │   └── fetch_oni.py              # Download NOAA CPC Oceanic Niño Index time series
│   ├── model/
│   │   ├── train_model.py            # Train Ensemble Regressor with rolling CV
│   │   └── export_frontend_data.py   # Compile predictions into static JSON
│   ├── process/
│   │   ├── merge_process.py          # Spelling standardization & wide format pivot
│   │   └── feature_engineer.py       # Compute GDD, DTR, GWETROOT, SWDI, ET, PET, ONI, Flood/Drought
│   ├── tests/
│   │   └── validate_metrics.py       # Pipeline validation suite
│   └── utils/
│       └── coordinates.py            # Centroid maps for all 64 districts
├── package.json                      # Next.js node modules config
└── requirements.txt                  # Python dependencies registry
Ensure you have Python 3.9+ installed. Set up your virtual environment and install dependencies:
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install requirements
pip install -r requirements.txt

To run the pipeline, train the Ensemble models, and compile the frontend JSON payloads:
# 1. Fetch raw datasets (BBS, NASA POWER, FAOSTAT, NOAA ONI)
python3 src/fetch/fetch_bbs.py
python3 src/fetch/fetch_nasa.py
python3 src/fetch/fetch_fao.py
python3 src/fetch/fetch_oni.py
# 2. Pivot monthly telemetry and merge on district-season boundaries
python3 src/process/merge_process.py
# 3. Engineer seasonal agronomic indices (GDD, DTR, SWDI, GWETROOT, ET, PET, ONI)
python3 src/process/feature_engineer.py
# 4. Train the Ensemble Regressor models and export pipeline weights
python3 src/model/train_model.py
# 5. Export precomputed predictions & raw files as static JSON arrays
python3 src/model/export_frontend_data.py
# 6. Run automated schema validations
python3 src/tests/validate_metrics.py

Ensure you have Node.js 18+ installed.
# Install NPM dependencies
npm install
# Start the local development server
npm run dev

Open http://localhost:3000 in your browser to interact with the dashboard.
The live site is hosted on GitHub Pages. To support near-instantaneous live page updates, we optimized the GitHub Actions runner pipeline, reducing total deployment times from 1m 2s down to 44 seconds:
- Shallow Git Clones (`deploy.yml`): Restricts checkout history to the latest commit (`fetch-depth: 1`).
- Node Modules Caching (`deploy.yml`): Caches `node_modules` tied to `package-lock.json`, completely bypassing the slow `npm ci` command on cache hits (reduces dependency setup from 25s to 2s).
- Uncompressed Pages Artifacts (`deploy.yml`): Skips CPU-intensive zip/gzip packaging (`compression-level: 0`) since the GitHub Pages CDN handles compression on-the-fly.
- Skipped Build Checks (`next.config.mjs`): Configures Next.js to ignore ESLint and TypeScript checks during the production build step, as these are already verified locally.
- BBS: Subnational agricultural crop yields digitized from the Bangladesh Bureau of Statistics (Ministry of Planning) yearbooks (2015–2023).
- BBS Historical Excel Registries: Raw subnational crop records parsed from official Excel sheets (1995–2014) to compute long-term crop productivity baselines.
- NASA POWER: Climatological temperature, wind, humidity, solar radiation, and GLDAS soil hydration telemetry courtesy of NASA's Prediction of Worldwide Energy Resources project.
  - Variables: `T2M`, `T2M_MAX`, `T2M_MIN`, `PRECTOTCORR`, `RH2M`, `ALLSKY_SFC_SW_DWN`, `GWETTOP`, `GWETROOT`, `TS`, `EVLAND` (MERRA-2 surface evaporation).
  - Endpoint: `https://power.larc.nasa.gov/api/temporal/monthly/point`
- NOAA CPC Oceanic Niño Index (ONI): Sea Surface Temperature (SST) anomalies from the Niño 3.4 region, published by the NOAA Climate Prediction Center. Used to capture El Niño/La Niña teleconnection impacts on Bangladesh's monsoon rainfall.
  - Source: NOAA CPC ONI Data
  - Reference: Huang, B., et al. (2017). Extended Reconstructed Sea Surface Temperature, Version 5 (ERSSTv5). Journal of Climate, 30(20), 8179–8205.
- Division-Level Agroclimatic Dataset: Compiled annual average crop yields and climatic indicators (2000–2024) used to pre-train division-level GBR prior models.
- FAOSTAT: National validation statistics compiled by the Food and Agriculture Organization (FAO) of the United Nations.
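For the NASA POWER source listed above, a request URL for one district centroid could be assembled as in the sketch below. The query parameter names (`parameters`, `community`, `latitude`, `longitude`, `start`, `end`, `format`) reflect the POWER API as I understand it and should be checked against the official documentation. The `build_power_url` helper, the `AG` community choice, and the approximate Dhaka coordinates are assumptions, and no network call is made.

```python
from urllib.parse import urlencode

BASE_URL = "https://power.larc.nasa.gov/api/temporal/monthly/point"
VARIABLES = ["T2M", "T2M_MAX", "T2M_MIN", "PRECTOTCORR", "RH2M",
             "ALLSKY_SFC_SW_DWN", "GWETTOP", "GWETROOT", "TS", "EVLAND"]


def build_power_url(lat, lon, start_year, end_year, community="AG"):
    """Build a monthly point-query URL for one location (no request sent)."""
    query = {
        "parameters": ",".join(VARIABLES),
        "community": community,  # assumed: agroclimatology community
        "latitude": lat,
        "longitude": lon,
        "start": start_year,
        "end": end_year,
        "format": "JSON",
    }
    # safe="," keeps the variable list readable instead of %2C-escaped.
    return f"{BASE_URL}?{urlencode(query, safe=',')}"


# Approximate centroid for Dhaka, used purely as an example.
url = build_power_url(23.81, 90.41, 2015, 2023)
print(url)
```

Looping this helper over the 64 district centroids would produce one request per district, which keeps each response small and easy to cache.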