forked from trekhleb/homemade-machine-learning
Showing 4 changed files with 10,379 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# Datasets | ||
|
||
This is a list of datasets that are used for Jupyter Notebook demos in this repository. | ||
|
||
#### MNIST (Handwritten Digits) | ||
|
||
> [mnist-demo.csv](mnist-demo.csv) | ||
_Source: [Kaggle](https://www.kaggle.com/oddrationale/mnist-in-csv/home)_ | ||
|
||
A sample of the original MNIST dataset in CSV format. Instead of the full dataset with 60000 training examples, this sample contains just 10000 examples. | |
|
||
Each row in the dataset consists of 785 values: the first value is the label (a number from 0 to 9) and the remaining 784 values are the pixel values of a 28x28 image (each a number from 0 to 255). | |
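The row layout described above can be sketched in plain Python. This is a minimal illustration, not code from the repository: the helper name `split_mnist_row` is hypothetical, the row is synthetic, and the row-major pixel ordering (28 consecutive values per image row) is an assumption about how the 784 values are arranged.

```python
def split_mnist_row(row):
    """Split one 785-value row into (label, image).

    `image` is a list of 28 lists with 28 integer pixel values each,
    assuming the 784 pixel values are stored row by row.
    """
    if len(row) != 785:
        raise ValueError(f"expected 785 values, got {len(row)}")

    label = int(row[0])
    if not 0 <= label <= 9:
        raise ValueError(f"label out of range: {label}")

    pixels = [int(value) for value in row[1:]]
    if any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("pixel value outside 0..255")

    # Cut the flat list of 784 pixels into 28 rows of 28 pixels.
    image = [pixels[r * 28:(r + 1) * 28] for r in range(28)]
    return label, image


# Synthetic row for illustration only (not real MNIST data):
# label "7" followed by 784 made-up pixel values.
row = ["7"] + [str(i % 256) for i in range(784)]
label, image = split_mnist_row(row)
```

With a real file, each `row` would typically come from `csv.reader` after skipping the header line, if the file has one.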
|
||
#### World Happiness Report 2017 | ||
|
||
> [world-happiness-report-2017.csv](world-happiness-report-2017.csv) | ||
_Source: [Kaggle](https://www.kaggle.com/unsdsn/world-happiness#2017.csv)_ | ||
|
||
Happiness rank and scores by country, 2017. | ||
|
||
#### Iris Flowers | ||
|
||
> [iris.csv](iris.csv) | ||
_Source: [ics.uci.edu](http://archive.ics.uci.edu/ml/datasets/Iris)_ | ||
|
||
The Iris data set consists of several samples from each of three species of Iris (`Iris setosa`, `Iris virginica` and `Iris versicolor`). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. | |
|
||
#### NYC Taxi | ||
|
||
> [nyc_taxi.csv](nyc_taxi.csv) | ||
_Source: [Kaggle](https://www.kaggle.com/boltzmannbrain/nab)_ | ||
|
||
Number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snow storm. The raw data is from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). The data file included here aggregates the total number of taxi passengers into 30-minute buckets. | |
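The 30-minute aggregation mentioned above could look roughly like the following sketch. It is not the script used to produce `nyc_taxi.csv`: the input format (pairs of pickup time and passenger count), the function names, and the sample trips are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def bucket_start(ts, minutes=30):
    """Floor a datetime to the start of its bucket.

    Assumes `minutes` divides 60 evenly (for example 15, 30 or 60).
    """
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def aggregate_passengers(trips, minutes=30):
    """Sum passenger counts per bucket.

    `trips` is an iterable of (pickup_time, passenger_count) pairs,
    a hypothetical simplification of the raw trip records.
    """
    totals = defaultdict(int)
    for pickup_time, passengers in trips:
        totals[bucket_start(pickup_time, minutes)] += passengers
    return dict(sorted(totals.items()))


# Made-up trips for illustration only.
trips = [
    (datetime(2014, 7, 1, 0, 5), 2),
    (datetime(2014, 7, 1, 0, 29, 59), 1),
    (datetime(2014, 7, 1, 0, 30), 3),
    (datetime(2014, 7, 1, 1, 10), 4),
]
buckets = aggregate_passengers(trips)
```

Here the first two trips fall into the 00:00 bucket, the third into 00:30, and the fourth into 01:00.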
|
||
#### Microchips Tests (Artificial) | ||
|
||
> [microchips-tests.csv](microchips-tests.csv) | ||
_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_ | ||
|
||
Artificial dataset in which `param_1` and `param_2` produce a non-linear decision boundary. | |
|
||
#### Non-Linear Y(X) Dependency (Artificial) | ||
|
||
> [non-linear-regression-x-y.csv](non-linear-regression-x-y.csv) | ||
_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_ | ||
|
||
Artificial dataset that contains a non-linear y(x) dependency. |