Maxime-Cllt/DataLint-ml


🔍 DataLint-ml

Python Version PyTorch Version

📖 Overview

DataLint-ml is the core machine learning engine of the DataLint ecosystem. It contains the ML models and algorithms that automatically detect, analyze, and report data quality issues in datasets, such as unsafe characters or words in CSV files.
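To make the detection task concrete, here is a minimal rule-based sketch of what "unsafe character" detection in a CSV file can look like. It is hypothetical and is not DataLint-ml's deep learning model: the definition of "unsafe" (Unicode control, format such as zero-width, surrogate, or unassigned code points, with tab allowed) and the name `find_unsafe_chars` are assumptions made for illustration. A simple baseline like this is also a common point of comparison for a learned detector.

```python
import csv
import io
import unicodedata

# Hypothetical rule set (an assumption, not the project's definition):
# flag control (Cc), format (Cf, e.g. zero-width space), surrogate (Cs)
# and unassigned (Cn) code points. Tab is allowed; embedded line breaks
# inside quoted cells are Cc and therefore flagged.
UNSAFE_CATEGORIES = {"Cc", "Cf", "Cs", "Cn"}
ALLOWED = {"\t"}


def find_unsafe_chars(csv_text: str) -> list[tuple[int, int, int, str]]:
    """Return (row, column, offset_in_cell, char) for each unsafe character."""
    findings = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_idx, row in enumerate(reader):
        for col_idx, cell in enumerate(row):
            for offset, ch in enumerate(cell):
                if ch in ALLOWED:
                    continue
                if unicodedata.category(ch) in UNSAFE_CATEGORIES:
                    findings.append((row_idx, col_idx, offset, ch))
    return findings


if __name__ == "__main__":
    sample = 'name,comment\nalice,hello\u200bworld\nbob,"ok\x1b"\n'
    for row, col, offset, ch in find_unsafe_chars(sample):
        print(f"row {row}, column {col}, offset {offset}: U+{ord(ch):04X}")
```

Reporting positions as (row, column, offset) keeps each finding traceable to a specific cell, which is what a data quality report needs regardless of whether the detector is rule-based or learned.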

🔧 Prerequisites

  • Python 3.12+
  • pip or conda
  • uv (package manager, optional but recommended)
  • A virtual environment (recommended)
  • A CUDA-capable GPU (recommended)

See pyproject.toml for a complete list of dependencies.
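Before running the setup script, a small preflight check can confirm the prerequisites listed above. This is a hypothetical helper, not a script shipped with the repository; it uses only the standard library, so it checks the Python version, whether `pip`, `conda`, or `uv` are on `PATH`, and whether a venv or conda environment is active. GPU detection is left out because it needs a deep learning framework.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 12)  # from the prerequisites list


def check_prerequisites() -> dict[str, bool]:
    """Report which listed prerequisites appear to be satisfied (hypothetical helper)."""
    return {
        "python>=3.12": sys.version_info >= MIN_PYTHON,
        "pip_or_conda": shutil.which("pip") is not None
        or shutil.which("conda") is not None,
        "uv": shutil.which("uv") is not None,
        # venv/virtualenv changes sys.prefix; conda environments set CONDA_PREFIX.
        "virtual_env": sys.prefix != sys.base_prefix
        or bool(os.environ.get("CONDA_PREFIX")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Since uv is optional, a missing `uv` entry is informational rather than a blocker.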

🚀 Installation

  1. Run the setup script to create the environment and install the dependencies:
./scripts/setup-env.sh
  2. If you are using a GPU, make sure the appropriate CUDA toolkit is installed.
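After setup, you can check whether the GPU is actually usable from Python. The sketch below assumes the project's deep learning stack is PyTorch (suggested by the PyTorch badge; confirm in pyproject.toml). `cuda_status` is a hypothetical helper that relies only on public PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`, `torch.version.cuda`) and still works if PyTorch is not installed.

```python
def cuda_status() -> str:
    """Describe whether PyTorch can see a CUDA device (hypothetical helper)."""
    try:
        import torch  # assumed dependency; see pyproject.toml
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no CUDA device is visible (CPU only)"
    # torch.version.cuda is the CUDA version PyTorch was built against.
    device = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}) on {device}"


if __name__ == "__main__":
    print(cuda_status())
```

If this reports no visible device on a machine with a GPU, the usual causes are a CPU-only PyTorch build or a driver/CUDA version mismatch.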

🧪 Code quality

Unit tests

To run the unit tests, use:

./scripts/run-pytest.sh

Linting

Linting is done with Ruff. To check for linting issues, run:

./scripts/run-ruff.sh

Formatting

Code formatting is done with Black. To format the code, run:

./scripts/run-black.sh
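The three scripts can be chained into a single local quality gate, for example before opening a pull request. This is a hypothetical sketch (`run_quality_checks` is not part of the repository): it assumes the script paths listed in this README, runs each one with `bash` from the repository root in the order given here, and stops at the first non-zero exit code. Because `run-black.sh` formats the code, expect it to modify files.

```python
import subprocess
from collections.abc import Callable
from pathlib import Path

# Script paths as listed in this README, relative to the repository root.
QUALITY_SCRIPTS = (
    "scripts/run-pytest.sh",
    "scripts/run-ruff.sh",
    "scripts/run-black.sh",
)


def run_quality_checks(
    repo_root: Path = Path("."),
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[str, int]:
    """Run each script in order; return exit codes, stopping at the first failure."""
    results: dict[str, int] = {}
    for script in QUALITY_SCRIPTS:
        # Pass the relative path and set cwd, so it resolves against repo_root.
        completed = runner(["bash", script], cwd=repo_root)
        results[script] = completed.returncode
        if completed.returncode != 0:
            break
    return results


if __name__ == "__main__":
    codes = run_quality_checks()
    raise SystemExit(max(codes.values(), default=0))
```

The `runner` parameter defaults to `subprocess.run` but can be swapped for a stub, which keeps the helper itself testable without executing the real scripts.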

🤝 Contributing

We welcome contributions to enhance the capabilities of DataLint-ml.

🔗 Related Projects

About

Deep learning model for detecting unsafe characters and words in CSV files.
