ETL, Analytics, Versioning for Unstructured Data
Updated Apr 9, 2025 (Python)
A Python toolbox for gaining geometric insights into high-dimensional data
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Machine learning for dataframes
Tools for test-driven data wrangling and data validation.
Python package to remove common ugliness from CSV-like files
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions: an alternative to map-reduce and join-groupby
Data Cleaning with Python
A framework for data piping in Python
Data wrangling with simplicity, complete audit transparency, and speed
Execute OpenRefine JSON scripts without OpenRefine (or Java)
Omnipy is a high-level Python library for type-driven data wrangling and scalable workflow orchestration (under development)
A Python package for data scientists, analysts, and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.
Library to make the MongoDB aggregation framework and pipelines easy to use in Python.
🚀🤖 Cognito - Simplifies AutoML Data Preprocessing.
Fluent dataset operations, compatible with your favorite libraries
Quick and dirty data mining made easier in Sublime Text
Wrangle messy numerical, image, and text data into consistent well-organized formats
Import, maintain and export tag metadata to/from audio files and a dynamically created SQLite table. Automates incremental tag cleanup, enrichment and standardisation for your digital audio library at scale using pre-scripted SQL queries, achieving quality and consistency throughout your music collection in a manner not possible with a tagger.