DataLint-ml is the core machine learning engine powering the DataLint ecosystem. This repository contains the ML models and algorithms that automatically detect, analyze, and report data quality issues in datasets.
Requirements:
- Python 3.12+
- pip or conda
- uv (package manager, optional but recommended)
- Virtual Environment (recommended)
- GPU (recommended)
See pyproject.toml for a complete list of dependencies.
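
The requirements listed above can be checked before running the setup script. The following is a minimal sketch that uses only the Python standard library; the function name `check_prerequisites` and the result labels are hypothetical and not part of DataLint-ml. Finding `nvidia-smi` on the PATH is only a rough hint that an NVIDIA driver is present and does not confirm that the CUDA toolkit is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version taken from the requirements list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a dict mapping each prerequisite label to True or False.

    `version_info` and `which` are injectable so the check can be tested
    without depending on the machine it runs on (an illustrative design
    choice, not something the repository prescribes).
    """
    return {
        "Python 3.12+": tuple(version_info[:2]) >= MIN_PYTHON,
        # Either installer is enough, so one hit on the PATH satisfies this.
        "pip or conda": any(which(tool) for tool in ("pip", "pip3", "conda")),
        "uv (optional)": which("uv") is not None,
        # Assumption: nvidia-smi on the PATH suggests an NVIDIA driver exists.
        "nvidia-smi (GPU hint, optional)": which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for label, ok in check_prerequisites().items():
        print(f"{label}: {'found' if ok else 'missing'}")
```

Run as a script, it prints one line per prerequisite, which makes it easy to see what is missing before installing dependencies.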
- Run the following command to set up the environment and install dependencies:

      ./scripts/setup-env.sh

- If you are using a GPU, ensure that you have the appropriate CUDA toolkit installed.
To run unit tests and ensure code quality, run the following commands:

    ./scripts/run-pytest.sh

Linting is done using ruff. To check for linting issues, run:

    ./scripts/run-ruff.sh

Code formatting is done using black. To format the code, run:

    ./scripts/run-black.sh

We welcome contributions to enhance the capabilities of DataLint-ml.

Related projects:
- DataLint: the main DataLint platform