This project is aimed at learning time-series patterns in IoT device usage and predicting future activity patterns.
The current FastAPI implementation accepts data in CSV format exported from Home Assistant and outputs YAML code ready to be saved into automations.yaml in Home Assistant OS.
It is a small FastAPI service that predicts daily on/off events for devices from historical CSV usage data and returns Home Assistant automation YAML.
This service is part of the GhostAI project and provides a single POST endpoint (/predict/) that accepts a historical CSV, a date range, and a device id, and returns predicted on/off events formatted as Home Assistant automation YAML (packaged in a JSON response and exposed with an attachment header for download).
The original paper, Home Occupancy Simulation Using Machine Learning, predicted activity with an XGBoost classifier applied to fixed time intervals of n minutes.
This version approaches the problem differently: it uses Kernel Density Estimation (KDE) to find the times of day with the highest probability of activity and generates automations from those times.
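To make the KDE idea concrete, here is a minimal sketch that assumes past events have already been reduced to minute-of-day values (0 to 1439). The function names and the 20-minute bandwidth are hypothetical and are not taken from KDE_Model.py. It uses a hand-written Gaussian kernel with a wrap-around distance, so that 23:50 and 00:10 count as 20 minutes apart, and returns the minute where the estimated density peaks.

```python
import math
from typing import List, Sequence

MINUTES_PER_DAY = 1440


def circular_distance(a: float, b: float) -> float:
    """Shortest gap in minutes between two times of day, wrapping at midnight."""
    d = abs(a - b) % MINUTES_PER_DAY
    return min(d, MINUTES_PER_DAY - d)


def kde_density(samples: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian KDE evaluated at every minute of the day (0..1439)."""
    # Standard Gaussian KDE constant; it scales every value equally, so it
    # does not change where the peak is.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(
            math.exp(-0.5 * (circular_distance(minute, s) / bandwidth) ** 2)
            for s in samples
        )
        for minute in range(MINUTES_PER_DAY)
    ]


def most_likely_minute(samples: Sequence[float], bandwidth: float = 20.0) -> int:
    """Minute of day at which the estimated density is highest."""
    density = kde_density(samples, bandwidth)
    return max(range(MINUTES_PER_DAY), key=density.__getitem__)


# Hypothetical history: "on" events clustered around 07:00 plus one at 18:00.
on_minutes = [415, 418, 420, 422, 430, 1080]
peak = most_likely_minute(on_minutes)
print(f"Most likely 'on' time: {peak // 60:02d}:{peak % 60:02d}")
```

The single 18:00 event contributes almost nothing near 07:00, so the peak lands inside the morning cluster. This is the main contrast with a per-interval classifier: the output is a continuous density over the day rather than one label per n-minute slot.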
- main.py
  - FastAPI application with the /predict/ endpoint.
  - Saves the uploaded CSV to a temporary file, invokes preprocessing, runs prediction logic over each day in the requested date range, and generates YAML automations.
- Preprocessing.py
  - Contains data-loading and cleaning utilities used to convert the raw CSV into the format expected by the prediction model.
- KDE_Model.py
  - Implements a KDE-based algorithm to infer likely on/off minute ranges, including a fallback mechanism.
- Yaml_Generator.py
  - Builds Home Assistant automation YAML from the predicted on/off timestamps.
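A rough idea of what Yaml_Generator.py produces can be given with a stdlib-only sketch. The layout below is an assumption rather than the module's actual output: it uses the classic Home Assistant keys (trigger, condition, action), a time trigger, a template condition that restricts the automation to the predicted date, and the generic homeassistant.turn_on / homeassistant.turn_off services. The function names and the id scheme are hypothetical.

```python
from datetime import date, time
from typing import Iterable, Tuple


def automation_entry(device_id: str, day: date, at: time, state: str) -> str:
    """Render one automation as YAML text (hypothetical layout, classic HA keys)."""
    service = "homeassistant.turn_on" if state == "on" else "homeassistant.turn_off"
    slug = device_id.replace(".", "_")
    # The time trigger fires every day, so a template condition limits it to
    # the predicted date. Built by concatenation to avoid f-string brace escaping.
    date_check = "{{ now().strftime('%Y-%m-%d') == '" + day.isoformat() + "' }}"
    lines = [
        f"- id: ghostai_{slug}_{day:%Y%m%d}_{at:%H%M}_{state}",
        f"  alias: GhostAI {device_id} {state} {day.isoformat()} {at:%H:%M}",
        "  trigger:",
        "    - platform: time",
        f'      at: "{at:%H:%M:%S}"',
        "  condition:",
        "    - condition: template",
        f'      value_template: "{date_check}"',
        "  action:",
        f"    - service: {service}",
        "      target:",
        f"        entity_id: {device_id}",
    ]
    return "\n".join(lines)


def build_automations(device_id: str, day: date,
                      events: Iterable[Tuple[time, str]]) -> str:
    """Concatenate entries into one YAML list suitable for automations.yaml."""
    return "\n".join(automation_entry(device_id, day, at, s) for at, s in events) + "\n"


print(build_automations("light.living_room", date(2025, 12, 1),
                        [(time(7, 0), "on"), (time(7, 45), "off")]))
```

Since pyyaml is already a dependency, yaml.safe_dump over a list of dicts would be the more robust choice in practice; plain string formatting is used here only to keep the sketch dependency-free.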
Prerequisites
- Python 3.8+
- pip
Install dependencies (python-multipart is required by FastAPI to parse file uploads):
pip install fastapi uvicorn python-multipart pandas pyyaml numpy scikit-learn
Run the service:
uvicorn main:app --reload --host 0.0.0.0 --port 8000
The API is then available at:
http://127.0.0.1:8000
Endpoint: POST /predict/
Description:
Accepts historical CSV data and returns predicted on/off events as Home Assistant automation YAML.
Query Parameters
- start_date: YYYY-MM-DD (required)
- end_date: YYYY-MM-DD (required)
- device_id: device identifier (required)
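The two date parameters have to be parsed and checked before the per-day loop can run. A minimal sketch, assuming a hypothetical helper named parse_date_range (not from main.py), using only the standard library:

```python
from datetime import date
from typing import Tuple


def parse_date_range(start_date: str, end_date: str) -> Tuple[date, date]:
    """Parse the start_date/end_date query strings and reject reversed ranges."""
    # date.fromisoformat raises ValueError for strings it cannot parse; on
    # Python 3.8-3.10 it accepts exactly the YYYY-MM-DD form.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError(f"end_date {end} is before start_date {start}")
    return start, end


print(parse_date_range("2025-12-01", "2025-12-07"))
```

Inside a FastAPI handler, the ValueError could be converted into an HTTPException with a 400 status so that clients get a clear message instead of a server error.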
Form Data
- file: CSV file upload (required)
Example:
curl -X POST "http://127.0.0.1:8000/predict/?start_date=2025-12-01&end_date=2025-12-07&device_id=light.living_room" -F "file=@history.csv" -H "accept: application/json"
CSV format (required / typical fields):
- timestamp: ISO 8601 datetime
- State: "on" or "off"
- device_id (optional)
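The sketch below reads a CSV with the columns listed above and returns time-sorted events. It is a stdlib-only illustration under the assumption that the headers are exactly timestamp, State and device_id; Preprocessing.py itself likely relies on pandas, and load_events is a hypothetical name.

```python
import csv
import io
from datetime import datetime
from typing import List, Optional, Tuple


def load_events(csv_text: str, device_id: Optional[str] = None) -> List[Tuple[datetime, str]]:
    """Return time-sorted (timestamp, state) pairs, keeping only on/off rows."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # device_id is optional in the CSV: filter only when the column has a value.
        row_device = (row.get("device_id") or "").strip()
        if device_id and row_device and row_device != device_id:
            continue
        state = (row.get("State") or "").strip().lower()
        if state not in ("on", "off"):
            continue  # e.g. Home Assistant "unavailable" / "unknown" rows
        # Python < 3.11 rejects a trailing "Z", so map it to an explicit UTC offset.
        raw_ts = row["timestamp"].strip().replace("Z", "+00:00")
        events.append((datetime.fromisoformat(raw_ts), state))
    events.sort(key=lambda event: event[0])
    return events


sample = """timestamp,State,device_id
2025-11-30T07:40:00,off,light.living_room
2025-11-30T07:02:00,on,light.living_room
2025-11-30T06:55:00,on,switch.kettle
2025-11-29T22:10:00,unavailable,light.living_room
"""
for ts, state in load_events(sample, "light.living_room"):
    print(ts.isoformat(), state)
```

Mixing timezone-aware and naive timestamps in one file would make the sort fail, so a real loader should normalize all timestamps to one convention first.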
- main.py saves the uploaded CSV and calls preprocessing.
- For each day in the selected range:
  - Extracts temporal features.
  - Uses KDE to infer likely minute ranges for on/off transitions.
- The output is passed to Yaml_Generator to produce automation-ready YAML.
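The per-day loop above can be sketched as follows. The feature set (minute of day, weekday, month) is an assumption chosen to match the day_weight / month_weight parameters; the actual features extracted by the service may differ, and both helper names are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterator


def days_in_range(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def temporal_features(ts: datetime) -> Dict[str, int]:
    """Hypothetical features: minute of day, weekday (Monday=0), month."""
    return {
        "minute_of_day": ts.hour * 60 + ts.minute,
        "weekday": ts.weekday(),
        "month": ts.month,
    }


for day in days_in_range(date(2025, 12, 1), date(2025, 12, 3)):
    # In the service, history would be weighted for this day and passed to
    # the KDE step here; this loop only shows the iteration itself.
    print(day, day.weekday())
```

Iterating inclusively over both endpoints matches the curl example, where start_date=2025-12-01 and end_date=2025-12-07 cover seven days.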
Adjustable parameters include:
- day_weight, month_weight
- KDE bandwidth percentile
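One plausible reading of these parameters, offered as a hypothetical sketch and not as the logic in KDE_Model.py: day_weight multiplies the weight of past events that fall on the same weekday as the target day, month_weight does the same for events in the same month, and the bandwidth is taken as a percentile of the events' absolute deviations from their median minute. The weighted Gaussian KDE then uses those weights. All function names and default values are assumptions.

```python
import math
from datetime import date, datetime
from statistics import median
from typing import List, Sequence


def sample_weights(history: Sequence[datetime], target: date,
                   day_weight: float = 2.0, month_weight: float = 1.5) -> List[float]:
    """Up-weight past events that share the target's weekday and/or month."""
    weights = []
    for ts in history:
        w = 1.0
        if ts.weekday() == target.weekday():
            w *= day_weight
        if ts.month == target.month:
            w *= month_weight
        weights.append(w)
    return weights


def percentile_bandwidth(minutes: Sequence[float], pct: float = 50.0,
                         floor: float = 5.0) -> float:
    """pct-th percentile (nearest rank) of absolute deviations from the median."""
    center = median(minutes)
    deviations = sorted(abs(m - center) for m in minutes)
    rank = max(1, math.ceil(pct / 100 * len(deviations)))
    # The floor keeps the kernel from collapsing when events are nearly identical.
    return max(deviations[rank - 1], floor)


def weighted_density(x: float, minutes: Sequence[float],
                     weights: Sequence[float], bandwidth: float) -> float:
    """Weighted Gaussian KDE evaluated at minute-of-day x."""
    total = sum(w * math.exp(-0.5 * ((x - m) / bandwidth) ** 2)
                for m, w in zip(minutes, weights))
    return total / (sum(weights) * bandwidth * math.sqrt(2 * math.pi))


history = [datetime(2025, 11, 24, 7, 0), datetime(2025, 12, 2, 7, 20),
           datetime(2025, 12, 8, 6, 50)]
target = date(2025, 12, 1)  # a Monday
minutes = [ts.hour * 60 + ts.minute for ts in history]
w = sample_weights(history, target)
h = percentile_bandwidth(minutes)
print(w, h, weighted_density(420, minutes, w, h))
```

A higher percentile gives a wider, smoother kernel that tolerates irregular schedules; a lower one sharpens the peaks around habitual times.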
- Error responses include a traceback.
- Use small CSV samples for debugging.
- Add logging in preprocessing, KDE, and YAML generation modules.
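For the logging suggestion, a common pattern is to configure the standard logging module once at startup and give each module its own named logger. The sketch below assumes a logger named after KDE_Model.py; inside that file, logging.getLogger(__name__) would produce the same name. The log_day_result helper is hypothetical.

```python
import logging

# Configure once at startup (e.g. in main.py); DEBUG shows per-day detail.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# One logger per module; inside KDE_Model.py, __name__ gives "KDE_Model".
logger = logging.getLogger("KDE_Model")


def log_day_result(day: str, on_minutes: list, off_minutes: list) -> None:
    """Record what the model predicted for one day (hypothetical helper)."""
    if not on_minutes:
        logger.warning("No 'on' prediction for %s", day)
    logger.debug("%s on=%s off=%s", day, on_minutes, off_minutes)


log_day_result("2025-12-01", [420], [465])
```

Passing values as arguments instead of pre-formatting the string means the message is only built when the DEBUG level is actually enabled.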
Contributions are welcome, including:
- CSV parsing improvements
- Prediction model adjustments
- YAML structure customization
- Home Assistant integration extensions
If you find this work helpful in your research, please cite:
APA citation:
Al-Shami, H. A. (2024). Home Occupancy Simulation Using Machine Learning. In K. Arai (Ed.), Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1 (Lecture Notes in Networks and Systems, Vol. 1154). Springer, Cham. https://doi.org/10.1007/978-3-031-73110-5_33
BibTeX:
@inproceedings{alshami2024home,
author = {Al{-}Shami, H. A.},
title = {Home Occupancy Simulation Using Machine Learning},
booktitle = {Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1},
editor = {Arai, K.},
series = {Lecture Notes in Networks and Systems},
volume = {1154},
publisher = {Springer},
address = {Cham},
year = {2024},
doi = {10.1007/978-3-031-73110-5_33}
}

This project is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.