
ml_ds_resources

my collected resources for data science and machine learning

definitions

Cardinality is defined as the number of elements in a set or other grouping.
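As a minimal sketch, the cardinality of a collection in Python can be checked by converting it to a set (the sample values here are illustrative):

```python
# Cardinality: the number of distinct elements in a grouping.
values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
cardinality = len(set(values))  # a set keeps distinct values only
print(cardinality)  # 7
```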

The Pearson correlation coefficient (PCC) is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1 (as 1 would represent an unrealistically perfect correlation).
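A quick sketch of the age-and-height example using NumPy's `corrcoef`; the data points are made up for illustration:

```python
import numpy as np

age = np.array([13, 14, 15, 16, 17])          # years
height = np.array([150, 158, 163, 169, 172])  # cm

# corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson coefficient r.
r = np.corrcoef(age, height)[0, 1]
print(round(r, 3))  # close to (but less than) 1
```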

Books

The Mechanics of Machine Learning

Models

PyTorch image models, scripts, pretrained weights

Courses

EE 274: Data Compression, Theory and Applications

CS 189/289A: Introduction to Machine Learning

CS231n: Convolutional Neural Networks for Visual Recognition

Machine Learning for 3D Data (CS468), Spring 2017

ML recipes

Huggingface Course

Various Tools

Deepnote

Codepen

Normalization and Standardization in 2 Minutes, by Dimitris Effrosynidis
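A minimal sketch of the two transforms in NumPy: min-max normalization rescales values into [0, 1], while standardization (z-score) produces zero mean and unit standard deviation. The sample data is illustrative:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Min-max normalization: rescale values into the [0, 1] range.
normalized = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit standard deviation.
standardized = (x - x.mean()) / x.std()

print(normalized)                               # values in [0, 1]
print(standardized.mean(), standardized.std())  # ~0.0 and 1.0
```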

Python

EPAM Python

The Hitchhiker’s Guide to Python!

Markdown

Markdown and Visual Studio Code

Data Science Interview

Data Science Interviews

Blogs

Rishabh Shukla

Glossaries

Machine Learning Glossary

Data Science

Awesome Machine Learning On Source Code

Label Encoding

Label Encoder vs. One Hot Encoder in Machine Learning
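A minimal sketch of the two encodings using pandas (the column and category names are illustrative): label encoding maps each category to an integer code, which implies an ordering, while one-hot encoding creates one binary column per category with no implied order.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Label encoding: one integer code per category
# (codes follow the sorted category order: blue=0, green=1, red=2).
df["color_label"] = df["color"].astype("category").cat.codes

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

print(df)
print(one_hot)
```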

Boosting

Quick Introduction to Boosting Algorithms in Machine Learning

A Gentle Introduction to Gradient Boosting

Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python
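The core idea of gradient boosting (for squared error) can be sketched from scratch: start from a constant prediction and repeatedly fit a weak learner, here a decision stump, to the current residuals, adding each fitted learner with a small learning rate. This is an illustrative toy, not how libraries like GBM or XGBoost are actually implemented:

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold split on x that best fits the residual."""
    best = None
    for t in np.unique(x)[:-1]:
        pred = np.where(x <= t, residual[x <= t].mean(), residual[x > t].mean())
        err = ((residual - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, residual[x <= t].mean(), residual[x > t].mean())
    return best[1:]  # (threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    pred = np.full_like(y, y.mean(), dtype=float)  # start from the mean
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of squared error
        t, lv, rv = fit_stump(x, residual)
        pred += lr * np.where(x <= t, lv, rv)
    return pred

x = np.arange(10, dtype=float)
y = np.sin(x)
pred = gradient_boost(x, y)
print(((y - pred) ** 2).mean())  # far below the variance of y
```

Each round shrinks the residuals a little; the learning rate controls how aggressively each stump's correction is applied.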

XGBoost

Original paper: Parallel Gradient Boosting Decision Trees

XGBoost Documentation

Complete Guide to Parameter Tuning in XGBoost with codes in Python

WHEN and WHY are batches used in machine learning?
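In short: batches trade off gradient noise against memory use and throughput; each epoch the dataset is shuffled and split into fixed-size chunks. A minimal sketch of a mini-batch iterator in NumPy (the function name and data are illustrative):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X, y) mini-batches; the last one may be smaller."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
rng = np.random.default_rng(0)
for xb, yb in iterate_minibatches(X, y, batch_size=4, rng=rng):
    print(xb.shape)  # (4, 2), (4, 2), then (2, 2)
```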

Maths

A book covering most of the math needed for machine learning

Data Websites

UC Irvine Machine Learning Repository

Tutorials

Data Science in VS Code tutorial

Introduction to Machine Learning

Technical things

https://cs231n.github.io/setup-instructions/

Anaconda Installation guide on Windows for Data Science


Software

TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. TensorFlow was developed by the Google Brain team for internal Google use in research and production.

PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is free and open-source software released under the modified BSD license.

Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML.

An open source hyperparameter optimization framework to automate hyperparameter search

State-of-the-art algorithms set Ray Tune apart from other hyperparameter optimization libraries: it maximizes model performance and minimizes training costs by using the latest algorithms such as PBT, HyperBand, ASHA, and more.

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages: faster training speed and higher efficiency, lower memory usage, better accuracy, support for parallel, distributed, and GPU learning, and the capability to handle large-scale data.

Installing tflite-runtime

pip install https://github.com/iCorv/tflite-runtime/raw/master/tflite_runtime-2.4.0-py3-none-any.whl

Compiled TensorFlow Lite runtime repository

Some questions

Pandas has both isna() and isnull(). I usually use isnull() to detect missing values and have never met a case where I had to use anything else. So, when should I use isna()?

Answer

isnull is an alias for isna. Literally, in the pandas source code:

isnull = isna

Indeed:

>>> pd.isnull
<function isna at 0x7fb4c5cefc80>

So I would recommend using isna. source
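A quick check that the two methods behave identically on a toy Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s.isna().tolist())    # [False, True, False]
print(s.isnull().tolist())  # identical: isnull is an alias for isna
```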


Using Pipenv or similar tools is recommended to improve your development flow.

pip freeze > requirements.txt  # Python3

source

Accuracy and precision

Precision and accuracy are two ways that scientists think about error. Accuracy refers to how close a measurement is to the true or accepted value. Precision refers to how close measurements of the same item are to each other. Precision is independent of accuracy.
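The distinction can be sketched numerically: accuracy error is how far the mean measurement sits from the true value (bias), while precision is the spread of the measurements (standard deviation). All numbers here are made up for illustration:

```python
import numpy as np

true_value = 10.0
# Precise but inaccurate: tightly clustered, yet offset from the truth.
a = np.array([12.01, 12.02, 11.99, 12.00])
# Accurate but imprecise: centered on the truth, yet widely scattered.
b = np.array([8.5, 11.5, 9.0, 11.0])

for name, m in [("precise/inaccurate", a), ("accurate/imprecise", b)]:
    bias = abs(m.mean() - true_value)  # accuracy error
    spread = m.std()                   # precision (smaller = more precise)
    print(name, round(bias, 3), round(spread, 3))
```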

