Your ML model may be trained on the future. Find out in one command.
Docs · Changelog · Contributing
Timefence finds and fixes temporal data leakage in ML training sets. No infrastructure required — runs locally, reads Parquet/CSV, and finishes in seconds.
If you build training data by joining features to labels, your model may be training on the future. A plain `LEFT JOIN`, or a `merge_asof` on the wrong timestamp, hands each label the latest feature row — including data from after the event you're predicting. Offline metrics look great. Production doesn't match. No error, no warning, no way to tell from the output alone.
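Here is the failure mode in miniature, as a minimal pandas sketch (the frames and column names are invented for illustration):

```python
import pandas as pd

# One label per user: did the user churn as of label_time?
labels = pd.DataFrame({
    "user_id": [1, 1],
    "label_time": pd.to_datetime(["2024-03-01", "2024-06-01"]),
    "churned": [0, 1],
})

# Feature snapshots over time; the last row postdates the first label.
features = pd.DataFrame({
    "user_id": [1, 1],
    "updated_at": pd.to_datetime(["2024-02-15", "2024-05-20"]),
    "spend_30d": [40.0, 900.0],
})

# Leaky: take each user's *latest* feature row, ignoring label_time.
latest = features.sort_values("updated_at").groupby("user_id").tail(1)
leaky = labels.merge(latest, on="user_id")  # the March label sees May data

# Point-in-time correct: only feature rows strictly before label_time.
clean = pd.merge_asof(
    labels.sort_values("label_time"),
    features.sort_values("updated_at"),
    by="user_id",
    left_on="label_time",
    right_on="updated_at",
    allow_exact_matches=False,  # enforce feature_time < label_time
)
```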
```bash
pip install timefence
timefence quickstart churn-example && cd churn-example
```

Audit the training set — Timefence finds 3 leaky features:

```bash
timefence audit data/train_LEAKY.parquet
```

Rebuild with temporal correctness:

```bash
timefence build -o train_CLEAN.parquet
```

Verify the new dataset is clean:

```bash
timefence audit train_CLEAN.parquet
# ALL CLEAN — no temporal leakage detected
```

Already have a training set? Audit it directly — no config needed:

```bash
timefence audit your_data.parquet --features features.py --keys user_id --label-time label_time
```

See the Getting Started guide for more.
Or define everything in Python:

```python
import timefence

users = timefence.Source(path="data/users.parquet", keys=["user_id"], timestamp="updated_at")
txns = timefence.Source(path="data/txns.parquet", keys=["user_id"], timestamp="created_at")

country = timefence.Feature(source=users, columns=["country"])
spend = timefence.Feature(source=txns, embargo="1d", name="spend_30d", sql="""
    SELECT user_id, created_at AS feature_time,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY created_at
                             RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW) AS spend_30d
    FROM {source}
""")

labels = timefence.Labels(
    path="data/labels.parquet", keys=["user_id"],
    label_time="label_time", target=["churned"],
)

result = timefence.build(labels=labels, features=[country, spend], output="train.parquet")
result  # renders in Jupyter
```

Audit an existing dataset without rebuilding:
```python
report = timefence.audit(
    "train.parquet",
    features=[country, spend],
    keys=["user_id"],
    label_time="label_time",
)
report.assert_clean()  # raises if leakage found
```
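Because `assert_clean()` raises on leakage, it drops straight into a test suite. A minimal sketch assuming pytest, and assuming the feature definitions live in `features.py` as in the CLI examples (the test file and function names are invented):

```python
# test_training_data.py — hypothetical test module
import timefence

from features import country, spend  # assumes features are defined in features.py

def test_training_set_has_no_temporal_leakage():
    report = timefence.audit(
        "train.parquet",
        features=[country, spend],
        keys=["user_id"],
        label_time="label_time",
    )
    report.assert_clean()  # the test fails if any feature leaks
```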
Stop leakage before it reaches production:

```yaml
- run: pip install timefence && timefence audit data/train.parquet --features features.py --strict
```

`--strict` exits with code 1 on leakage. Your pipeline fails before a leaky model ever trains.
Built on DuckDB's columnar engine. Median of 3 runs after warmup (Intel i7, 16 GB):
| Scenario | Labels | Features | Build | Audit |
|---|---|---|---|---|
| Small project | 100K | 1 | 0.5s | 0.3s |
| Typical project | 100K | 10 | 1.9s | 1.7s |
| Large project | 1M | 1 | 3.0s | 2.0s |
| Large + many features | 1M | 10 | 12s | 8.5s |
Adding embargo, staleness, and splits costs seconds, not minutes.
Run benchmarks yourself:

```bash
uv run python benchmarks/bench.py --quick
uv run python benchmarks/bench.py --quick --include-pandas
```

How it works:

1. Define — declare sources, features, and labels in Python or `timefence.yaml`
2. Build — Timefence generates SQL (ASOF JOIN or ROW_NUMBER) and runs it in an embedded DuckDB, enforcing `feature_time < label_time - embargo` for every row (see the sketch below)
3. Audit — point at any existing dataset to check for leakage, no rebuild needed
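That Build invariant is easy to state outside the tool. A minimal pandas sketch of the same rule — not Timefence's implementation, which enforces it in generated SQL (frame and column names are illustrative):

```python
import pandas as pd

def leaky_rows(df: pd.DataFrame, embargo: pd.Timedelta) -> pd.DataFrame:
    """Return rows that violate feature_time < label_time - embargo."""
    return df[~(df["feature_time"] < df["label_time"] - embargo)]

train = pd.read_parquet("train.parquet")  # must carry both timestamp columns
bad = leaky_rows(train, pd.Timedelta("1d"))
assert bad.empty, f"{len(bad)} rows use feature data from inside the embargo window"
```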
No server, no JVM, no Spark. Every query is inspectable via `timefence -v build` or `timefence explain`.
| Capability | Details |
|---|---|
| Joins | Point-in-time correct. ASOF JOIN fast path, ROW_NUMBER fallback |
| Guardrails | Embargo, max lookback, max staleness — all configurable |
| Inputs | Parquet, CSV, SQL query, DataFrame |
| Feature modes | Column selection, SQL, Python transform |
| Splitting | Time-based train / validation / test splits |
| Caching | Feature-level cache with content-hash keys |
| Audit | Full rebuild-and-compare or lightweight temporal check |
| Reports | Severity classification. JSON manifest, HTML report, Rich terminal |
| CLI | `quickstart` · `build` · `audit` · `explain` · `diff` · `inspect` · `catalog` · `doctor` |
| Flags | `-v` verbose · `--debug` · `--strict` CI gate · `--json` · `--html` |
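For a sense of how the guardrails might attach to a feature, here is a sketch extrapolated from the build example above. Only `embargo` is confirmed by that example; `max_lookback` and `max_staleness` are hypothetical parameter names, so check the docs for the real spelling:

```python
import timefence

txns = timefence.Source(path="data/txns.parquet", keys=["user_id"], timestamp="created_at")

spend = timefence.Feature(
    source=txns,
    name="spend_30d",
    columns=["amount"],
    embargo="1d",        # confirmed above: features must predate labels by 1 day
    max_lookback="30d",  # hypothetical name: ignore rows older than 30 days
    max_staleness="7d",  # hypothetical name: flag rows whose freshest data is >7 days old
)
```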
| Not This | Why | Use Instead |
|---|---|---|
| Feature store | No server, no online serving | Tecton, Feast |
| Data orchestrator | No scheduling, no DAGs | Airflow, Dagster |
| Data quality framework | Temporal correctness only | Great Expectations |
| ML pipeline framework | Produces training data only | MLflow, Metaflow |
One tool. One job. Temporal correctness for ML training data.
If Timefence helps you, consider giving it a ⭐️ on GitHub — it helps others find it.
Documentation · Contributing · Changelog
MIT License



