This project implements a time series forecasting system for predicting fresh vegetable and fruit prices 1 to 4 weeks ahead, incorporating weather data and supporting automated retraining.
Two primary model versions are available as Docker tags:
- v2.0 (Recommended, XGBoost): Utilizes XGBoost models, demonstrating superior accuracy on the evaluation dataset.
- v1.0 (LightGBM): Utilizes LightGBM models.
Following the legacy of Cornelius Greenvale, the Market King, this system implements the "Freezer Gambit" strategy envisioned by his son Magnus. It uses historical weather patterns and commodity prices to forecast future prices, enabling better decisions on when to sell fresh produce and when to utilize the AgroChill cold storage network.
The core task is to predict weekly fresh prices across various economic centers, one month (4 weeks) ahead. The system includes an API for retrieving forecasts and ingesting new data, and it automatically retrains itself to adapt to new information using the model specified by the Docker tag (XGBoost for v2.0, LightGBM for v1.0).
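A rolling forecast produces the 1 to 4 week predictions by feeding each predicted price back in as input for the next step. The sketch below illustrates the general idea only; `rolling_forecast`, `predict_next`, and the averaging stand-in model are hypothetical names for illustration, not code taken from main.py.

```python
from typing import Callable, List, Sequence


def rolling_forecast(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    horizons: int = 4,
) -> List[float]:
    """Forecast `horizons` weeks ahead, reusing each prediction as input."""
    window = list(history)  # copy so the caller's history is not modified
    forecasts: List[float] = []
    for _ in range(horizons):
        next_price = predict_next(window)
        forecasts.append(next_price)
        window.append(next_price)  # the prediction becomes the newest observation
    return forecasts


def mean_of_last_three(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: mean of the last 3 weeks."""
    recent = window[-3:]
    return sum(recent) / len(recent)


weekly_prices = [100.0, 110.0, 120.0]
forecast = rolling_forecast(weekly_prices, mean_of_last_three)
```

In a real deployment `predict_next` would wrap the loaded XGBoost or LightGBM model and rebuild the feature row from `window` at each step. Errors can compound across steps, which is one reason longer horizons tend to be less accurate.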
- Multi-Horizon Forecasting: Predicts prices 1, 2, 3, and 4 weeks into the future.
- Data-Driven: Uses historical weather and price data.
- Machine Learning Models:
  - v2.0: XGBoost tuned with Optuna.
  - v1.0: LightGBM tuned with Optuna.
- Feature Engineering: Creates time-based, lag, and rolling window features.
- RESTful API (FastAPI):
  - GET /api/status: Check API status, active model type, and retraining info.
  - POST /api/predict: Get price forecasts for the next 4 weeks using the loaded model (implements rolling forecast).
  - POST /api/data/weather: Submit new weather data records.
  - POST /api/data/prices: Submit new price data records.
  - POST /api/retrain: Manually trigger the background retraining process for the loaded model type.
- Automated Retraining: Uses APScheduler to automatically retrain models periodically (default: 24h) incorporating newly submitted data.
- Data Persistence: Incoming data is appended to CSV files (requires volume mounting).
- Dockerized: Containerized for deployment and reproducibility.
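The time-based, lag, and rolling window features listed above could be built with pandas roughly as follows. This is a minimal sketch: the column names `date` and `price`, the lag values, and the 3-week window are assumptions for illustration and may not match the actual CSV schema or the features used in main.py.

```python
import pandas as pd


def build_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Add time-based, lag, and rolling-window features to weekly prices.

    Assumes hypothetical columns 'date' (week start) and 'price'.
    """
    df = prices.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)

    # Time-based features
    df["month"] = df["date"].dt.month
    df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)

    # Lag features: the price 1 and 2 weeks earlier
    for lag in (1, 2):
        df[f"price_lag_{lag}"] = df["price"].shift(lag)

    # Rolling mean of the previous 3 weeks; shift(1) keeps the current
    # week's price out of its own feature
    df["price_roll_mean_3"] = df["price"].shift(1).rolling(window=3).mean()
    return df


sample = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="W-MON"),
        "price": [100.0, 104.0, 98.0, 110.0, 107.0, 115.0],
    }
)
features = build_features(sample)
```

With several commodities or economic centers in one file, the shifts and rolling windows would need to be applied per group (for example via `groupby`) so that values do not leak between series.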
- Language: Python 3.9+
- API Framework: FastAPI
- Data Handling: Pandas, NumPy
- Machine Learning: Scikit-learn, XGBoost, LightGBM
- Hyperparameter Optimization: Optuna
- Scheduling: APScheduler
- Serialization: Joblib
- Containerization: Docker
- Concurrency/Locking: asyncio, python-filelock
- Timezone: pytz
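Since incoming records are appended to CSV files and the stack lists a locking library, the append step presumably has to be guarded against concurrent writes. The sketch below shows the pattern with the standard library only: `append_records` and the column names are hypothetical, and `threading.Lock` protects a single process only, whereas a file lock such as python-filelock would be needed across processes.

```python
import csv
import os
import tempfile
import threading

# In-process lock only; an inter-process file lock is a separate concern.
_csv_lock = threading.Lock()


def append_records(path, fieldnames, records):
    """Append dict records to a CSV file, writing the header for a new file."""
    with _csv_lock:
        is_new = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, "a", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            if is_new:
                writer.writeheader()
            writer.writerows(records)
    return len(records)


# Demo against a temporary directory (file and column names are made up)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "incoming_prices.csv")
    columns = ["date", "price"]
    append_records(demo_path, columns, [{"date": "2024-01-01", "price": 100.0}])
    append_records(demo_path, columns, [{"date": "2024-01-08", "price": 104.0}])
    with open(demo_path, newline="", encoding="utf-8") as handle:
        demo_rows = list(csv.DictReader(handle))
```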
/time-series-forecasting
│
├── deployment/ # Core application and build files
│ ├── Dockerfile # Instructions to build the Docker image
│ ├── requirements.txt # Python dependencies for the XGBoost version
│ └── main.py # FastAPI app with XGBoost, retraining, API logic
│
├── data/ # datasets
│ ├── weather_train_data.csv
│ ├── price_train_data.csv
│ ├── weather_val_data.csv
│ └── price_val_data.csv
│
├── saved_models_xgb/ # Trained models
│
├── image_name.txt # Contains Docker Hub URI
├── Documentation.pdf
├── Presentation/
└── README.md
# Note: Trained model (.joblib) files are stored in saved_models_xgb/.
# They are also included INSIDE the Docker image referenced in image_name.txt.
- Python (3.9+ recommended)
- pip
- Docker Engine / Docker Desktop
- Git (for cloning the repository if needed for local dev)
Note: Local development runs main.py directly. The primary method for evaluation is intended to be via the provided Docker image.
- Clone:
    git clone https://github.com/FouetteBytes/time-series-forecasting.git && cd time-series-forecasting
- Venv:
    python -m venv venv && .\venv\Scripts\activate (Win) or source venv/bin/activate (Mac/Linux)
- Install:
    pip install -r deployment/requirements.txt (install from the requirements inside deployment)
- Models: Ensure the saved_models_xgb directory containing pre-trained .joblib files exists in the project root.
- Data: Ensure the original .csv files are in the data directory.
This uses the pre-built image from Docker Hub, which includes the application and trained models.
- Pull the recommended image:
    docker pull melkor1/agrochill-app:v2.0
- Run the container:
    # Create a local 'data' directory first if it doesn't exist:
    mkdir data
    # Place original competition CSVs inside this local 'data' directory.
    # Then run:
    docker run -d --name agrochill-container \
      -p 8000:8000 \
      -v "$(pwd)/data":/app/data \
      melkor1/agrochill-app:v2.0
  - -p 8000:8000: Required. Maps host:container port. Access the API at http://localhost:8000.
  - -v "$(pwd)/data":/app/data: Required. Mounts your local data folder (containing the original CSVs) into the container at /app/data. Incoming data will also be saved here. (Adjust $(pwd) for your OS if needed: %CD% for Win CMD, ${PWD} for PowerShell.)
- Access API Docs:
    http://localhost:8000/docs
- Check Logs:
    docker logs agrochill-container
- Stop Container:
    docker stop agrochill-container
Runs the code from your local deployment folder. Useful for debugging. Assumes models are in ../saved_models_xgb relative to main.py.
# Navigate to the 'deployment' folder first
cd deployment
# Run uvicorn pointing to main:app
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Make sure the ../saved_models_xgb and ../data paths are correct relative to execution
cd .. # Go back to project root when done
Access at http://localhost:8000/docs. (Note: Path resolution for models/data might need adjustment in main.py if running locally this way rather than from the project root.)
See the interactive API documentation (Swagger UI) at /docs when running. Key endpoints include /api/status, /api/predict, /api/data/weather, /api/data/prices, and /api/retrain.
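For scripted access, the endpoints can be called with any HTTP client. The following is a rough sketch using only the standard library: `build_request` and `call_api` are hypothetical helpers, and sending an empty JSON object to POST /api/retrain is an assumption. The actual request and response schemas are defined by the application and shown in the Swagger UI at /docs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host port mapped by `docker run -p 8000:8000`


def build_request(path, payload=None):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = BASE_URL.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def call_api(request, timeout=10):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


status_request = build_request("/api/status")
# Assumption: the retrain endpoint accepts an empty JSON body.
retrain_request = build_request("/api/retrain", payload={})
```

With the container running, `call_api(status_request)` would return the parsed status document. The requests are only built here, not sent, so the snippet runs without a server.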
- Configured for automatic retraining every 24 hours using XGBoost models.
- Loads original + incoming data from the mounted /app/data volume.
- Re-tunes and re-trains models, updating the live models in memory upon success. Check status via GET /api/status.
- Can be triggered manually via POST /api/retrain.
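The "update the live models only upon success" behavior can be pictured as a swap guarded by a lock: new models are trained off to the side and replace the old ones only if training finishes without error. The sketch below is an illustration under that assumption. `ModelRegistry` and `train_fn` are hypothetical names, and the scheduling itself (APScheduler in this project) is left out.

```python
import threading


class ModelRegistry:
    """Holds the live models and swaps them only after a successful retrain."""

    def __init__(self, models):
        self._lock = threading.Lock()
        self._models = dict(models)  # e.g. {horizon_in_weeks: model}
        self.last_error = None

    def get(self, horizon):
        with self._lock:
            return self._models[horizon]

    def retrain(self, train_fn):
        """Run train_fn(); keep the old models if it raises."""
        try:
            new_models = train_fn()  # slow work happens outside the lock
        except Exception as exc:
            self.last_error = repr(exc)
            return False
        with self._lock:
            self._models = dict(new_models)  # swap in one step
        self.last_error = None
        return True


def failing_training():
    raise ValueError("not enough data")


registry = ModelRegistry({1: "model-v1"})
failed = registry.retrain(failing_training)
model_after_failure = registry.get(1)
succeeded = registry.retrain(lambda: {1: "model-v2"})
model_after_success = registry.get(1)
```

Training outside the lock means prediction requests keep being served by the old models while a retrain is in progress.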
- v2.0 (XGBoost):
  - 1w: ~106.6
  - 2w: ~120.6
  - 3w: ~123.8
  - 4w: ~139.8
- v1.0 (LightGBM):
  - 1w: ~136.7
  - 2w: ~126.9
  - 3w: ~151.0
  - 4w: ~162.0
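(In testing, XGBoost (v2.0) achieved a lower RMSE than LightGBM (v1.0) at every forecast horizon; the gap was largest at 1 week, ~106.6 vs. ~136.7, and smallest at 2 weeks, ~120.6 vs. ~126.9.)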
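For reference, RMSE is the square root of the mean squared difference between actual and predicted prices. The sketch below defines it and computes the relative RMSE reduction of v2.0 over v1.0 from the figures listed above, assuming those figures are RMSE values on the same evaluation data. The function and variable names are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Approximate RMSE per horizon as reported above: (XGBoost v2.0, LightGBM v1.0)
reported = {
    "1w": (106.6, 136.7),
    "2w": (120.6, 126.9),
    "3w": (123.8, 151.0),
    "4w": (139.8, 162.0),
}

# Relative reduction in RMSE of v2.0 compared with v1.0, in percent
reduction_pct = {
    horizon: 100.0 * (lgbm - xgb) / lgbm
    for horizon, (xgb, lgbm) in reported.items()
}
```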