Awesome Machine Learning

This repository is a curated compilation of Data Science and Machine Learning resources: videos, blog posts, textbooks, GitHub repositories, and more. If you find it helpful, please consider giving it a star! Your support helps me improve and maintain this project. Thank you for your time! 🌟🌟🌟🌟🌟

If you want to contribute to this list (please do), send me a pull request or contact me.

High-quality FREE courses on YouTube

NLP and Large Language Models

Visual Data Science & ML

❤️ Exploring transformer models
✅ A visual introduction to machine learning
✅ How GPT3 Works - Visualizations and Animations
✅ The illustrated Transformer
❤️ The Illustrated Retrieval Transformer
📌 Seeing Theory - A visual introduction to probability and statistics, which also includes a textbook of the same name
✅ wevi: Word Embedding Visual Inspector
✅ AI/ML Cheatsheets for Stanford/MIT Courses
✅ CS 229 ― Machine Learning: A set of illustrated Machine Learning cheatsheets covering the content of the CS 229 class at Stanford

Useful Datasets for NLP Research (e.g., text classification and sentiment analysis)

✅B2W Reviews - A very large dataset of customer reviews released by Americanas S.A., available on the Hugging Face Hub
✅IMDb PT-BR - A version of IMDb translated to Brazilian Portuguese
✅Yelp 2018 in CSV
✅Amazon Reviews

Tutorials on Computer Vision & NLP

✅Document AI: LiLT a better language agnostic LayoutLM model

Personal Blogs

✅Distill.pub - A modern medium for presenting research that showcases AI/ML concepts in clear, dynamic and vivid form
✅Christopher Olah's Blog - A machine learning researcher who likes to understand things clearly, and explain them well
✅Jay Alammar - Visualizing machine learning one concept at a time

Blog Posts

✅Multiple Classifier Systems — a brief introduction
✅Automatic Text Summarization with Machine Learning — An overview
✅Automatic Text Summarization Made Simple with Python
✅Autoregressive Models in Deep Learning — a Brief Survey by George Ho
✅A Brief Timeline of NLP from Bag of Words to the Transformer Family
✅The Annotated Transformer
✅HuggingFace - Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models

Top Machine Learning Books 📚

A growing curated list of machine learning books.

GitHub Repositories

Neural Net Drawing Libraries

✅PlotNeuralNet - generates LaTeX code for drawing neural networks for publications and presentations
✅NN-SVG - generates SVGs for neural net architecture schematics

Transformers and NLP

✅State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
📌BERTopic - A topic modeling technique that uses transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics while keeping important words in the topic descriptions
✅LSA-Text-Summarization - This code implements the summarization of text documents using Latent Semantic Analysis
✅extractive-text-summarization - Extractive text summarization based on word frequencies and spaCy
✅SimCSE - Simple Contrastive Learning of Sentence Embeddings
✅Koan - A word2vec negative sampling implementation with correct CBOW update
✅Apache OpenNLP - a machine learning based toolkit for the processing of natural language text
✅sense2vec - Contextually-keyed word vectors
✅Mega - Moving Average Equipped Gated Attention. Mega is a simple, theoretically grounded, single-head gated attention mechanism equipped with (exponential) moving average to incorporate inductive bias of position-aware local dependencies into the position-agnostic attention mechanism.

Graph AI (Hot Topic 🔥)

Graph AI, which uses machine learning methods to learn patterns in graph-structured data, is an active research topic. A graph is a data structure that models a set of objects (nodes) and their relationships (edges). The power of the graph formalism lies both in its focus on relationships between points and in its generality. Research on machine learning for graphs has received growing attention because of the expressive power of graphs. As a non-Euclidean data structure, graphs lend themselves to tasks such as node classification, link prediction, and clustering.
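
To make the node-and-edge vocabulary concrete, here is a minimal sketch in plain Python. It does not use any of the libraries listed below, and the function names `build_graph` and `common_neighbor_scores` are hypothetical, chosen only for illustration. It scores candidate links with the classic common-neighbors heuristic, which is a simple baseline for link prediction and not a Graph Neural Network.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def common_neighbor_scores(adj):
    """Score every unconnected node pair by the number of neighbors they share."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only consider pairs without an existing edge
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


# Toy graph (illustrative data): a triangle a-b-c plus a pendant node d attached to c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
graph = build_graph(edges)
scores = common_neighbor_scores(graph)
print(scores)
```

Pairs with a higher score share more neighbors and are, under this heuristic, more plausible candidates for a missing edge. The libraries below replace this hand-written scoring with learned representations.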

❤️ PyTorch Geometric - A library built upon PyTorch to easily write and train Graph Neural Networks (GNNs)
❤️ DGL library - An easy-to-use, high performance and scalable Python package for deep learning on graphs
❤️ PyGOD - a Python library for graph outlier detection (anomaly detection)
❤️ Graph-MLPMixer - A Generalization of ViT/MLP-Mixer to Graphs
❤️ StellarGraph - A Python library for machine learning on graphs and networks which offers state-of-the-art algorithms for graph machine learning, making it easy to discover patterns and answer questions about graph-structured data
👁️ cuGraph - A collection of packages focused on GPU-accelerated graph analytics
✅ igraph - a fast and open-source C library to manipulate and analyze graphs with interfaces in Python, R and C++
✅Karate Club - an unsupervised machine learning extension library for NetworkX. Karate Club consists of state-of-the-art methods to do unsupervised learning on graph-structured data. According to the authors, it is a Swiss Army knife for small-scale graph mining research.

Machine Learning in Rust 🦀

The Rust ML landscape is still young and is best described as experimental. Nevertheless, Rust's performance, flexibility, and unique approach to abstractions make it a promising language for building Machine Learning backends, an area currently dominated by C/C++.

✅huggingface/tokenizers - The core of tokenizers, written in Rust with a focus on performance and versatility
✅DimaKudosh/word2vec - Rust interface to word2vec
📌Linfa - Provides a comprehensive toolkit to build Machine Learning applications with Rust, in the spirit of Python's scikit-learn
🔥Burn - This library aims to be a comprehensive deep-learning framework in Rust that offers exceptional flexibility for both researchers and practitioners

Machine Learning in C++ 💪

Fast execution is essential in machine learning, which is why C++ is well suited to machine learning and large-scale AI applications. Today, C++ powers the core of many machine learning engines.

❤️ mlpack - an intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages
✅ ensmallen - a high-quality C++ library for non-linear numerical optimization, which provides many types of optimizers that can be used for virtually any numerical optimization task
✅Armadillo - C++ library for linear algebra & scientific computing. Useful for algorithm development directly in C++ or quick conversion of research code into production environments
📌NumCpp - A Templatized Header Only C++ Implementation of the Python NumPy Library
✅DLib - C++ toolkit containing machine learning algorithms used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments
✅Caffe - A deep learning framework developed with cleanliness, readability, and speed in mind
✅DyNet - A dynamic neural network library that works well with networks whose structure changes for every training instance. Written in C++ with bindings in Python

High Performance Dataframes

✅cuDF - A GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It is built on the Apache Arrow columnar memory format and offers a pandas-like API that will be familiar to data engineers.
✅Polars: Blazingly fast DataFrames in Rust - Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as the memory model.

Statistical packages in Python

📌statannotations - Python package to optionally compute statistical tests and add statistical annotations on plots generated with seaborn
❤️statsmodels - Python package that complements scipy for statistical computations, allowing users to explore data, estimate statistical models, perform statistical tests, and compute descriptive statistics

Time series 📈

✅sktime - a library for time series analysis in Python. It provides a unified interface for multiple time series learning tasks
📌Darts - a Python library for user-friendly forecasting and anomaly detection on time series
✅Prophet - A procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects.
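
As a rough sketch of the additive model mentioned in the Prophet entry above, the decomposition is commonly written as follows. The symbol names are an assumption borrowed from the Prophet paper, not from this list.

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $g(t)$ is the non-linear trend, $s(t)$ collects the periodic (yearly, weekly, daily) seasonality, $h(t)$ captures holiday effects, and $\varepsilon_t$ is the error term.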

Causal Inference methods

✅CausalML - a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms.
✅BiomedSciAI/causallib - Enables estimating the causal effect of an intervention on some outcome from real-world non-experimental observational data

Deep learning on Tabular Data

✅ SAINT - Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training.
✅ ARM-Net - Adaptive Relation Modeling Network for Structured Data.
✅ TabTransformer - An implementation in Keras of TabTransformer, an attention network for tabular data.
