A fast toolkit for “gleaning” structured facts from messy files. Turn folders of CSV/JSON/Excel/Parquet into tidy, typed dataframes with validated metadata and simple pipelines you can run in code or from the CLI.
- Smart column normalization – map vendor/legacy headers to your canonical schema.
- Lightweight pipelines – compose steps like load → normalize → validate → enrich → write.
- Schema & units checks – optional soft/hard validation of required fields and units.
- Fast I/O – built on pandas and pyarrow; reads CSV, XLSX, JSON, and Parquet.
- Declarative rules – YAML/JSON rules for synonyms, dtypes, units, computed fields.
- CLI & Python API – run one-off or batch jobs locally or in CI.
- Extensible – drop-in custom steps and validators.
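
To make the normalization and rules ideas concrete, here is a minimal sketch implemented directly with pandas. The `rules` dict, the synonym mappings, and the `normalize` function are illustrative stand-ins for a rules file and the toolkit's real API, which is not shown here.

```python
import pandas as pd

# Illustrative rules (hypothetical): canonical column -> accepted
# synonyms, plus required dtypes. In the toolkit this would live in
# a YAML/JSON rules file.
rules = {
    "synonyms": {
        "part_id": ["PartNo", "part_number", "SKU"],
        "weight_kg": ["Weight", "wt_kg"],
    },
    "dtypes": {"part_id": "string", "weight_kg": "float64"},
}

def normalize(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    # Flatten synonym -> canonical mapping and rename headers.
    mapping = {
        syn: canon
        for canon, syns in rules["synonyms"].items()
        for syn in syns
    }
    df = df.rename(columns=mapping)
    # Hard validation: every declared column must be present.
    missing = set(rules["dtypes"]) - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Coerce to the declared dtypes.
    return df.astype(rules["dtypes"])

raw = pd.DataFrame({"PartNo": ["A1", "B2"], "Weight": ["1.5", "2.0"]})
tidy = normalize(raw, rules)
print(list(tidy.columns))  # -> ['part_id', 'weight_kg']
```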
Clone the repository, then install the package in editable mode from the repository root:

```
pip install -e .
```
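
The load → normalize → validate → enrich → write flow above can be sketched as plain function composition; the `pipeline` helper and the step names below are hypothetical illustrations, not the package's actual API.

```python
from functools import reduce

import pandas as pd

def pipeline(*steps):
    # Compose dataframe -> dataframe steps, applied left to right.
    return lambda df: reduce(lambda acc, step: step(acc), steps, df)

def normalize(df):
    # Illustrative step: map a legacy header to a canonical name.
    return df.rename(columns={"PartNo": "part_id"})

def validate(df):
    # Hard check: a required field must be present.
    if "part_id" not in df.columns:
        raise ValueError("missing required column: part_id")
    return df

run = pipeline(normalize, validate)
result = run(pd.DataFrame({"PartNo": ["A1", "B2"]}))
print(list(result.columns))  # -> ['part_id']
```

Because each step takes and returns a dataframe, custom steps and validators slot in anywhere in the chain.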