Add datasets description.
trekhleb committed Dec 22, 2018
1 parent f8af24c commit eb16b6a
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions data/README.md
@@ -2,7 +2,7 @@

This is a list of datasets that are used for Jupyter Notebook demos in this repository.

-#### MNIST (Handwritten Digits)
+### MNIST (Handwritten Digits)

> [mnist-demo.csv](mnist-demo.csv)
@@ -12,39 +12,39 @@ A sample of original MNIST dataset in a CSV format. Instead of using full datase

Each row in the dataset consists of 785 values: the first value is the label (a number from 0 to 9) and the remaining 784 values are the pixel values of a 28x28-pixel image (each a number from 0 to 255).
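The row layout described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the helper name `parse_mnist_row` is hypothetical, the sample row is synthetic, and it assumes the pixels are stored row by row (the README does not state the ordering or whether the CSV has a header line).

```python
def parse_mnist_row(row):
    """Split one 785-value row into (label, 28x28 image).

    Hypothetical helper; assumes pixels are stored in row-major order.
    """
    if len(row) != 785:
        raise ValueError(f"expected 785 values, got {len(row)}")
    label = int(row[0])                      # first value: digit 0..9
    pixels = [int(v) for v in row[1:]]       # remaining 784 values: 0..255
    # Regroup the flat pixel list into 28 rows of 28 pixels each.
    image = [pixels[r * 28:(r + 1) * 28] for r in range(28)]
    return label, image


# Synthetic row for illustration only (not taken from mnist-demo.csv).
sample_row = ["7"] + ["0"] * 783 + ["255"]
label, image = parse_mnist_row(sample_row)
```

When reading the actual file with `csv.reader`, a header line (if present) would need to be skipped before calling the helper.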

-#### World Happiness Report 2017
+### World Happiness Report 2017

> [world-happiness-report-2017.csv](world-happiness-report-2017.csv)
_Source: [Kaggle](https://www.kaggle.com/unsdsn/world-happiness#2017.csv)_

Happiness rank and scores by country, 2017.

-#### Iris Flowers
+### Iris Flowers

> [iris.csv](iris.csv)
_Source: [ics.uci.edu](http://archive.ics.uci.edu/ml/datasets/Iris)_

The Iris data set consists of several samples from each of three species of Iris (`Iris setosa`, `Iris virginica` and `Iris versicolor`). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

-#### NYC Taxi
+### NYC Taxi

> [nyc_taxi.csv](nyc_taxi.csv)
_Source: [Kaggle](https://www.kaggle.com/boltzmannbrain/nab)_

Number of NYC taxi passengers, with five anomalies occurring during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm. The raw data is from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). The data file included here aggregates the total number of taxi passengers into 30-minute buckets.
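To make the 30-minute bucketing concrete, here is a minimal sketch of that kind of aggregation. It is not the procedure used to produce `nyc_taxi.csv`; the function names and the sample trips are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime


def bucket_start(ts):
    """Round a timestamp down to the start of its 30-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)


def aggregate(trips):
    """Sum passenger counts per 30-minute bucket.

    `trips` is an iterable of (timestamp, passenger_count) pairs.
    """
    totals = Counter()
    for ts, passengers in trips:
        totals[bucket_start(ts)] += passengers
    return dict(sorted(totals.items()))


# Made-up trips for illustration only.
trips = [
    (datetime(2014, 11, 1, 0, 5), 2),
    (datetime(2014, 11, 1, 0, 29), 1),
    (datetime(2014, 11, 1, 0, 30), 3),
]
totals = aggregate(trips)
```

The first two trips fall into the bucket starting at 00:00 and the third into the bucket starting at 00:30.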

-#### Microchips Tests (Artificial)
+### Microchips Tests (Artificial)

> [microchips-tests.csv](microchips-tests.csv)
_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_

Artificial dataset in which `param_1` and `param_2` produce a non-linear decision boundary.

-#### Non-Linear Y(X) Dependency (Artificial)
+### Non-Linear Y(X) Dependency (Artificial)

> [non-linear-regression-x-y.csv](non-linear-regression-x-y.csv)