Check the CHANGELOG file to have a global overview of the latest updates / new features ! 😋
Check the provided notebooks to have an overview of the available features !
├── example_data : data used for the demonstrations
├── loggers : custom utilities for the `logging` module
│ ├── __init__.py : defines useful utilities to control `logging`
│ ├── telegram_handler.py : custom logger using the telegram bot api
│ ├── time_logging.py : custom timer features
│ └── tts_handler.py : custom logger using the Text-To-Speech models
├── tests : custom unit-testing for the different modules
│ ├── data : test data files
│ ├── __reproduction : expected output files for reproducibility tests
│ └── test_*.py : test files
├── utils
│ ├── audio : audio utilities
│ │ ├── audio_annotation.py : annotation features for new TTS/STT dataset creation
│ │ ├── audio_io.py : audio loading / writing
│ │ ├── audio_player.py : audio playback functionality
│ │ ├── audio_processing.py : audio normalization / processing
│ │ ├── audio_recorder.py : audio recording functionality
│ │ ├── audio_stream.py : audio streaming support
│ │ ├── mkv_utils.py : processing for .mkv video format
│ │ ├── noisereducev1.py : maintained version of the old `noisereduce` library
│ │ └── stft.py : implementations of various mel-spectrogram methods
│ ├── callbacks : callback management system
│ │ ├── __init__.py
│ │ ├── callback.py : base callback implementation
│ │ ├── displayer.py : display-related callbacks
│ │ ├── file_saver.py : file saving callbacks
│ │ └── function_callback.py : function-based callbacks
│ ├── databases : custom storage features
│ │ ├── vectors : vector storage
│ │ │ ├── faiss_index.py : vector index using `faiss`
│ │ │ ├── keras_index.py : vector index using `keras`
│ │ │ ├── numpy_index.py : vector index using `numpy`
│ │ │ ├── torch_index.py : vector index using `torch`
│ │ │ └── vector_index.py : abstract vector index
│ │ ├── database.py : abstract database
│ │ ├── database_wrapper.py : database wrapping another database
│ │ ├── json_dir.py : database storing each entry in a `.json` file
│ │ ├── json_file.py : basic implementation storing all data in a `dict`
│ │ ├── json.py : optimized `json`-based data storage
│ │ ├── ordered_database_wrapper.py : wrapper that keeps track of insertion order (like an `OrderedDict`)
│ │ └── vector_database.py : wrapper storing both data and vectors
│ ├── datasets : dataset utilities
│ │ ├── audio_datasets : audio dataset implementations
│ │ │ ├── common_voice.py : Mozilla Common Voice dataset
│ │ │ ├── libri_speech.py : LibriSpeech dataset
│ │ │ ├── processing.py : audio dataset processing
│ │ │ ├── siwis.py : SIWIS dataset
│ │ │ └── voxforge.py : VoxForge dataset
│ │ ├── builder.py : dataset building utilities
│ │ ├── loader.py : dataset loading utilities
│ │ └── summary.py : dataset summary tools
│ ├── image : image features
│ │ ├── bounding_box : features for bounding box manipulation
│ │ │ ├── combination.py : combines group of boxes
│ │ │ ├── converter.py : box format conversion
│ │ │ ├── filters.py : box filtering
│ │ │ ├── locality_aware_nms.py : LA-NMS implementation
│ │ │ ├── metrics.py : box metrics (IoU, etc.)
│ │ │ ├── non_max_suppression.py : NMS implementation
│ │ │ ├── processing.py : box processing
│ │ │ └── visualization.py : box extraction / drawing
│ │ ├── video : utilities for video I/O and streaming
│ │ │ ├── ffmpeg_reader.py : video reader using `ffmpeg-python`
│ │ │ ├── http_screen_mirror.py : custom camera reading frames from the `HttpScreenMirror` app
│ │ │ ├── filters.py : box filtering
│ │ │ ├── streaming.py : camera streaming utilities
│ │ │ └── writer.py : video writers (`OpenCV` and `ffmpeg-python` are currently supported)
│ │ ├── image_io.py : image loading / writing
│ │ ├── image_normalization.py : normalization schema
│ │ └── image_processing.py : image processing utilities
│ ├── keras : keras and hardware acceleration utilities
│ │ ├── ops : operation interfaces for different backends
│ │ │ ├── builder.py : operation builder
│ │ │ ├── core.py : core operations
│ │ │ ├── execution_contexts.py : execution context management
│ │ │ ├── image.py : image operations
│ │ │ ├── linalg.py : linear algebra operations
│ │ │ ├── math.py : mathematical operations
│ │ │ ├── nn.py : neural network operations
│ │ │ ├── numpy.py : numpy-compatible operations
│ │ │ └── random.py : random operations
│ │ ├── runtimes : model runtime implementations
│ │ │ ├── onnx_runtime.py : ONNX runtime
│ │ │ ├── runtime.py : base runtime class
│ │ │ ├── saved_model_runtime.py : saved model runtime
│ │ │ ├── tensorrt_llm_runtime.py : TensorRT LLM runtime
│ │ │ └── tensorrt_runtime.py : TensorRT runtime
│ │ ├── compile.py : graph compilation features
│ │ └── gpu.py : GPU utilities
│ ├── text : text-related features
│ │ ├── abreviations
│ │ ├── parsers : document parsers (new implementation)
│ │ │ ├── combination.py : box combination for parsing
│ │ │ ├── docx_parser.py : DOCX document parser
│ │ │ ├── java_parser.py : Java code parser
│ │ │ ├── md_parser.py : Markdown parser
│ │ │ ├── parser.py : base parser implementation
│ │ │ ├── pdf_parser.py : PDF parser
│ │ │ ├── py_parser.py : Python code parser
│ │ │ └── txt_parser.py : text file parser
│ │ ├── cleaners.py : text cleaning methods
│ │ ├── ctc_decoder.py : CTC-decoding
│ │ ├── metrics.py : text evaluation metrics
│ │ ├── numbers.py : numbers cleaning methods
│ │ ├── paragraphs_processing.py : paragraphs processing functions
│ │ ├── sentencepiece_tokenizer.py : sentencepiece tokenizer interface
│ │ ├── text_processing.py : text processing functions
│ │ ├── tokenizer.py : tokenizer implementation
│ │ └── tokens_processing.py : token-level processing
│ ├── threading : threading utilities
│ │ ├── async_result.py : asynchronous result handling
│ │ ├── priority_queue.py : priority queue with order consistency
│ │ ├── process.py : process management
│ │ └── stream.py : data streaming implementation
│ ├── comparison_utils.py : convenient comparison features for various data types
│ ├── distances.py : distance and similarity metrics
│ ├── embeddings.py : embeddings saving / loading
│ ├── file_utils.py : data saving / loading
│ ├── generic_utils.py : generic features
│ ├── plot_utils.py : plotting functions
│ ├── sequence_utils.py : sequence manipulation
│ └── wrappers.py : function wrappers and decorators
├── example_audio.ipynb
├── example_custom_operations.ipynb
├── example_generic.ipynb
├── example_image.ipynb
├── example_text.ipynb
├── LICENSE
├── Makefile
├── README.md
└── requirements.txt

The `loggers` module is independent from the `utils` one, making it easily reusable / extractable.
See the installation guide for a step-by-step installation 😄
Here is a summary of the installation procedure, if you have a working python environment :
- Clone this repository : `git clone https://github.com/yui-mhcp/data_processing.git`
- Go to the root of this repository : `cd data_processing`
- Install requirements : `pip install -r requirements.txt`
- Open an example notebook and follow the instructions !
Important Notes :
- The `utils/{audio / image / text}` modules are not loaded by default, meaning that it is not required to install the requirements for a given submodule if you do not want to use it. In this case, you can simply remove the submodule and run the `pipreqs` command to compute a new `requirements.txt` file !
- The `keras` module is not imported by default, and most of the features are available without ever importing it ! 😄 (a quick check is sketched below)
- The `requirements.txt` file does not include any backend (i.e., `tensorflow`, `torch`, `jax`, etc.), so make sure to manually install one if necessary !
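As a quick illustration of this lazy-loading behavior, the snippet below checks that importing the package does not pull in `keras` or any backend. This is a sketch, assuming it is run from the repository root ; the exact import side effects may differ.

```python
import sys

import utils  # top-level package of this repository

# Neither `keras` nor any backend should be imported as a side effect
for module in ('keras', 'tensorflow', 'torch', 'jax'):
    print('{:<12} imported : {}'.format(module, module in sys.modules))
```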
- Make example for audio processing
- Make example for image processing
- Make example for text processing
- Make example for plot utils
- Make example for embeddings manipulation
- Make the code keras-3 compatible
- Remove `keras` from dependencies (i.e., features that do not require `keras` should work even if `keras` is not installed)
- Enable any backend to be aware of XLA/eager execution (i.e., `executing_eagerly` function)
- Enable `graph_compile` to support compilation on all backends (see the sketch after this list) :
    - `tensorflow` backend (`tf.function`)
    - `torch` backend (`torch.compile`)
    - `jax` backend (`jax.jit`)
    - Auto-detect `static_argnames` for the `jax.jit` compilation
- Allow `tf.function` with `graph_compile` regardless of the `keras` backend
- Add GPU features for all backends :
    - `tensorflow` backend
    - `torch` backend
    - `jax` backend
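For the `graph_compile` item above, here is a minimal sketch of what a per-backend compilation dispatcher can look like. The function name and the fallback behavior are assumptions for illustration, not the actual implementation of `utils/keras/compile.py` :

```python
import keras

def compile_for_backend(fn, static_argnames = None):
    """Hypothetical dispatcher illustrating per-backend graph compilation."""
    backend = keras.backend.backend()
    if backend == 'tensorflow':
        import tensorflow as tf
        return tf.function(fn, reduce_retracing = True)
    elif backend == 'torch':
        import torch
        return torch.compile(fn)
    elif backend == 'jax':
        import jax
        # `static_argnames` should name arguments whose values change the traced graph
        return jax.jit(fn, static_argnames = static_argnames)
    return fn  # unknown backend : fall back to eager execution
```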
- Extract audio from videos
- Enable audio playback without the `IPython.display` autoplay feature
- Implement specific `Mel spectrogram` variants
- Enable the `read_audio` function in `tf.data` pipelines (a hypothetical usage is sketched below)
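The `read_audio` integration could be used along these lines. This is purely illustrative : the import path and the `rate` argument are assumptions, not the documented signature.

```python
import tensorflow as tf

from utils.audio import read_audio  # assumed import path

files   = tf.data.Dataset.list_files('example_data/audio/*.wav')
# `read_audio` is assumed to accept a filename tensor inside a `tf.data` graph
dataset = files.map(lambda filename: read_audio(filename, rate = 22050))
```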
- Add image loading / writing support
- Add video loading / writing support
- Add support for rotated bounding boxes
- Implement a keras 3 Non-Maximum Suppression (NMS) (see the sketch after this list)
- Implement the Locality-Aware NMS (LaNMS)
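To clarify what the NMS item refers to, below is a small numpy sketch of the greedy algorithm. It is deliberately backend-agnostic and is not the repo's keras-3 implementation (see `utils/image/bounding_box/non_max_suppression.py` for that) :

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(x2 - x1, 0.) * np.maximum(y2 - y1, 0.)
    area  = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_threshold = 0.5):
    """Greedy NMS : keep the best-scored box, drop overlapping boxes, repeat."""
    order, keep = np.argsort(scores)[::-1], []
    while len(order):
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return np.array(keep, dtype = 'int32')
```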
- Support text tokenization / encoding in `tf.data` pipelines
- Implement text cleaning :
    - Abbreviation extensions
    - Time / dollar / number extensions
    - Unicode conversion
- Support token-splitting instead of word-splitting in `Tokenizer`
- Support `transformers` tokenizers conversion
- Support `sentencepiece` tokenizers
- Extract text from documents :
    - `.txt`
    - `.md`
    - `.pdf`
    - `.docx`
    - `.html`
    - `.epub`
- Implement token-based logits masking
- Implement batch text encoding
- Add custom tokens to `Tokenizer`
- Add CTC-decoding (a minimal best-path sketch follows this list)
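To make the CTC-decoding item concrete, here is a minimal best-path (greedy) CTC decoder : collapse consecutive repeats, then drop blanks. This is the textbook algorithm, not the implementation of `text/ctc_decoder.py` :

```python
import numpy as np

def ctc_greedy_decode(logits, blank = 0):
    """Best-path CTC decoding : collapse repeated tokens, then remove blanks."""
    best      = np.argmax(logits, axis = -1)  # most likely token per timestep
    collapsed = [t for i, t in enumerate(best) if i == 0 or t != best[i - 1]]
    return [int(t) for t in collapsed if t != blank]

# Example : the token sequence [1, 1, 0, 2, 2] decodes to [1, 2] (0 is the blank)
logits = np.eye(3)[[1, 1, 0, 2, 2]]
print(ctc_greedy_decode(logits))  # [1, 2]
```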
- Make subplots easier to use via `args` and `kwargs`
- Make custom plot functions usable with `plot_multiple`
- Add 3D plot / subplot support
- Implement custom plotting functions (two of these are sketched after this list) :
    - Spectrogram / attention weights
    - Audio waveform
    - Embeddings (d-dimensional vectors projected in 2D space)
    - 3D volumes
    - Classification result
    - Confusion matrix (or any matrix)
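As an illustration of the waveform and spectrogram plots listed above, here is a pure-matplotlib sketch ; it does not use the repo's `plot_utils` API :

```python
import numpy as np
import matplotlib.pyplot as plt

rate  = 22050
audio = np.sin(2. * np.pi * 440. * np.arange(rate) / rate)  # 1 second of a 440 Hz tone

fig, (ax1, ax2) = plt.subplots(2, 1, figsize = (8, 6))
ax1.plot(np.arange(len(audio)) / rate, audio)
ax1.set_title('Audio waveform')

ax2.specgram(audio, Fs = rate)
ax2.set_title('Spectrogram')

plt.tight_layout()
plt.show()
```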
- The text cleaning module (`text.cleaners`) is inspired by the NVIDIA tacotron2 repository. Their implementation of the Short-Time Fourier Transform (STFT) is also available in `audio/stft.py`, adapted to keras 3 (a naive numpy version is sketched below).
- The provided embeddings in `example_data/embeddings/embeddings_256_voxforge.csv` have been generated based on samples of the VoxForge dataset, and embedded with an AudioSiamese model (`audio_siamese_256_mel_lstm`).
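For reference, a naive magnitude STFT fits in a few lines of numpy. This sketch only illustrates the framing / windowing / FFT steps, and is not the tacotron2-derived implementation of `audio/stft.py` :

```python
import numpy as np

def magnitude_stft(audio, n_fft = 1024, hop_length = 256):
    """Naive magnitude STFT : frame the signal, apply a Hann window, FFT each frame."""
    window = np.hanning(n_fft)
    frames = np.stack([
        audio[start : start + n_fft] * window
        for start in range(0, len(audio) - n_fft + 1, hop_length)
    ])
    return np.abs(np.fft.rfft(frames, axis = -1))  # shape (n_frames, n_fft // 2 + 1)
```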
Tutorials :
- The Keras 3 API, which has been (partially) adapted in the `keras_utils/ops` module to enable `numpy` backend and `tf.data` compatibility
- The `tf.function` guide
Contacts :
- Mail : yui-mhcp@tutanota.com
- Discord : yui0732
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.
This license allows you to use, modify, and distribute the code, as long as you include the original copyright and license notice in any copy of the software/source. Additionally, if you modify the code and distribute it, or run it on a server as a service, you must make your modified version available under the same license.
For more information about the AGPL-3.0 license, please visit the official website.
If you find this project useful in your work, please add this citation to give it more visibility ! 😋
@misc{yui-mhcp,
author = {yui},
title = {A Deep Learning projects centralization},
year = {2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/yui-mhcp}}
}