My collected resources for data science and machine learning
Cardinality is defined as the number of elements in a set or other grouping.
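A minimal sketch of the definition in Python: building a set drops duplicates, so its length is the cardinality. The sample data below is made up for illustration.

```python
# Hypothetical sample data: a column of category labels with repeats.
colors = ["red", "green", "red", "blue", "green", "red"]

# A set keeps each distinct element once, so its length is the cardinality.
distinct_colors = set(colors)
cardinality = len(distinct_colors)

print(cardinality)  # 3
```

For a pandas column, `Series.nunique()` gives the same count of distinct values (it excludes missing values by default).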
The Pearson correlation coefficient (PCC) is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0 but less than 1 (as 1 would represent an unrealistically perfect correlation).
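A minimal pure-Python sketch of the definition above: covariance divided by the product of the two standard deviations. The function name `pearson_r` and the sample data are illustrative. The last call shows the limitation mentioned above: y = x² is completely determined by x here, yet the coefficient is 0 because the relationship is not linear.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations. The 1/n factors cancel in the
    # ratio, so using 1/(n-1) throughout would give the same coefficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("the coefficient is undefined when either variable is constant")
    return cov / (std_x * std_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(round(pearson_r(xs, [2 * v + 1 for v in xs]), 6))  # 1.0 (perfect positive linear relation)
print(round(pearson_r(xs, [-v for v in xs]), 6))  # -1.0 (perfect negative linear relation)
print(round(pearson_r(xs, [v ** 2 for v in xs]), 6))  # 0.0 (nonlinear relation is invisible to r)
```

On Python 3.10+, the standard library's `statistics.correlation(x, y)` computes the same coefficient.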
The Mechanics of Machine Learning
PyTorch image models, scripts, pretrained weights
EE 274: Data Compression, Theory and Applications
CS 189/289A: Introduction to Machine Learning
CS231n: Convolutional Neural Networks for Visual Recognition.
CS468: Machine Learning for 3D Data (Spring 2017)
Normalization and Standardization in 2 Minutes (Dimitris Effrosynidis)
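A minimal sketch of the two rescaling schemes named in the title above, assuming "normalization" means min-max scaling to [0, 1] and "standardization" means z-scores computed with the population standard deviation (the convention scikit-learn's `StandardScaler` follows). The helper names are hypothetical.

```python
import statistics


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values linearly into [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values: list[float]) -> list[float]:
    """Z-score: (x - mean) / std, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant sequence")
    return [(v - mean) / std for v in values]


data = [2.0, 4.0, 6.0, 8.0]
print(min_max_normalize(data))  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
print(standardize(data))  # mean 0 and standard deviation 1 after rescaling
```

Min-max scaling keeps the shape of the data but is sensitive to outliers (they set the min and max), while z-scores center the data without bounding its range.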
The Hitchhiker’s Guide to Python!
Markdown and Visual Studio Code
Awesome Machine Learning On Source Code
Label Encoder vs. One Hot Encoder in Machine Learning
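The distinction in the title above can be sketched in plain Python. Label encoding maps each category to one integer, which implicitly imposes an order (here Lima < Paris < Tokyo); one-hot encoding avoids that order at the cost of one column per category. The helper names are hypothetical; assigning indices in sorted order mirrors what scikit-learn's `LabelEncoder` does.

```python
def label_encode(labels: list[str]) -> tuple[list[int], list[str]]:
    """Map each category to an integer index (categories sorted alphabetically)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in labels], classes


def one_hot_encode(labels: list[str]) -> tuple[list[list[int]], list[str]]:
    """Map each category to a 0/1 vector with a single 1 at that category's index."""
    codes, classes = label_encode(labels)
    vectors = [[1 if i == code else 0 for i in range(len(classes))] for code in codes]
    return vectors, classes


cities = ["Paris", "Tokyo", "Paris", "Lima"]
codes, classes = label_encode(cities)
print(classes)  # ['Lima', 'Paris', 'Tokyo']
print(codes)  # [1, 2, 1, 0]
vectors, _ = one_hot_encode(cities)
print(vectors)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```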
Quick Introduction to Boosting Algorithms in Machine Learning
A Gentle Introduction to Gradient Boosting
Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python
Parallel Gradient Boosting Decision Trees (original)
Complete Guide to Parameter Tuning in XGBoost with codes in Python
When and why are batches used in machine learning?
A book covering most of the math needed for machine learning
UC Irvine Machine Learning Repository
Data Science in VS Code tutorial
Introduction to Machine Learning
CS231n setup instructions: https://cs231n.github.io/setup-instructions/
Anaconda Installation guide on Windows for Data Science
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. TensorFlow was developed by the Google Brain team for internal Google use in research and production.
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is free and open-source software released under the modified BSD license.
Keras is an open-source software library that provides a Python interface for artificial neural networks. Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML; from version 2.4 through the 2.x series, it acted solely as an interface for the TensorFlow library. Since Keras 3.0, it again supports multiple backends, including JAX, TensorFlow, and PyTorch.
An open source hyperparameter optimization framework to automate hyperparameter search
What sets Ray Tune apart from other hyperparameter optimization libraries? State-of-the-art algorithms: maximize model performance and minimize training costs by using the latest algorithms such as PBT, HyperBand, ASHA, and more.
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages: faster training speed and higher efficiency; lower memory usage; better accuracy; support for parallel, distributed, and GPU learning; and the capability to handle large-scale data.
pip install https://github.com/iCorv/tflite-runtime/raw/master/tflite_runtime-2.4.0-py3-none-any.whl
Compiled TensorFlow Lite runtime repository
Pandas has both isna() and isnull(). I usually use isnull() to detect missing values and have never come across a case where I had to use anything else. So, when should I use isna()?
isnull is an alias for isna. Literally, in the pandas source code:
isnull = isna
Indeed:
>>> pd.isnull
<function isna at 0x7fb4c5cefc80>
So I would recommend using isna.
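A quick check of the answer above, assuming pandas and NumPy are installed: the module-level functions are the same object, and the Series methods `isna()` and `isnull()` return identical results (both treat `NaN` and `None` as missing).

```python
import numpy as np
import pandas as pd

# The module-level functions are literally the same object.
print(pd.isnull is pd.isna)  # True

# The Series/DataFrame methods are separate names but return the same result.
s = pd.Series([1.0, np.nan, None, 3.0])
print(s.isna().tolist())  # [False, True, True, False]
print(s.isna().equals(s.isnull()))  # True
```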
Using Pipenv or another dependency-management tool is recommended to improve your development workflow.
pip freeze > requirements.txt # Python3
Accuracy and precision
Precision and accuracy are two ways that scientists think about error. Accuracy refers to how close a measurement is to the true or accepted value. Precision refers to how close measurements of the same item are to each other. Precision is independent of accuracy.
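A small numeric illustration of the distinction above, in the measurement sense of "precision" (spread of repeated readings), not the classification metric. The true value, the readings, and the two proxies (`bias` as the distance of the mean from the true value, `spread` as the sample standard deviation) are hypothetical choices for illustration.

```python
import statistics

TRUE_VALUE = 10.0  # hypothetical accepted value of the quantity being measured

# Hypothetical repeated measurements of the same item from two instruments.
instrument_a = [10.9, 11.0, 11.1]  # tightly clustered but offset: precise, not accurate
instrument_b = [9.0, 10.0, 11.0]  # centered on the true value but spread out: accurate, not precise


def bias(measurements: list[float]) -> float:
    """Accuracy proxy: distance of the mean measurement from the true value."""
    return abs(statistics.fmean(measurements) - TRUE_VALUE)


def spread(measurements: list[float]) -> float:
    """Precision proxy: sample standard deviation of repeated measurements."""
    return statistics.stdev(measurements)


for name, m in [("A", instrument_a), ("B", instrument_b)]:
    print(f"{name}: bias={bias(m):.2f}, spread={spread(m):.2f}")
# A: bias=1.00, spread=0.10
# B: bias=0.00, spread=1.00
```

Instrument A has the smaller spread but the larger bias, which is why precision and accuracy have to be assessed separately.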