Add datasets description.
trekhleb committed Dec 22, 2018
1 parent d5a0679 commit f8af24c
Showing 4 changed files with 10,379 additions and 1 deletion.
4 changes: 4 additions & 0 deletions README.md
@@ -119,3 +119,7 @@ After this Jupyter Notebook will be accessible by `http://localhost:8888`.
Each algorithm section contains demo links to [Jupyter NBViewer](http://nbviewer.jupyter.org/). This is a fast online previewer for Jupyter notebooks where you can see demo code, charts and data right in your browser without installing anything locally. If you want to _change_ the code and _experiment_ with a demo notebook, you need to launch the notebook in [Binder](https://mybinder.org/). You can do so by clicking the _"Execute on Binder"_ link in the top right corner of the NBViewer.

![](./images/binder-button-place.png)

## Datasets

The datasets used for the Jupyter Notebook demos are listed in the [data folder](data).
53 changes: 53 additions & 0 deletions data/README.md
@@ -0,0 +1,53 @@
# Datasets

This is a list of datasets that are used for Jupyter Notebook demos in this repository.

#### MNIST (Handwritten Digits)

> [mnist-demo.csv](mnist-demo.csv)
_Source: [Kaggle](https://www.kaggle.com/oddrationale/mnist-in-csv/home)_

A sample of the original MNIST dataset in CSV format. Instead of the full dataset with 60000 training examples, this sample contains just 10000 examples.

Each row in the dataset consists of 785 values: the first value is the label (a number from 0 to 9) and the remaining 784 values are the pixel values of a 28x28 image (each a number from 0 to 255).
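
The row layout described above can be sketched as follows. This is only an illustration under stated assumptions: the row is synthetic rather than read from `mnist-demo.csv`, the helper name `parse_row` is hypothetical, and the pixels are assumed to be stored row by row. If the file starts with a header row, it would need to be skipped first.

```python
import csv
import io

# Synthetic stand-in for one row of the CSV: a label followed by 784 pixel
# values. These numbers are made up and do not come from mnist-demo.csv.
label = 7
pixels = [(i * 3) % 256 for i in range(784)]
sample_csv = ",".join(str(v) for v in [label] + pixels)


def parse_row(row):
    """Split a 785-value row into (label, 28x28 grid of pixel values)."""
    values = [int(v) for v in row]
    if len(values) != 785:
        raise ValueError(f"expected 785 values, got {len(values)}")
    row_label, flat = values[0], values[1:]
    # Assumption: pixels are stored row by row, 28 rows of 28 pixels each.
    image = [flat[r * 28:(r + 1) * 28] for r in range(28)]
    return row_label, image


reader = csv.reader(io.StringIO(sample_csv))
parsed_label, image = parse_row(next(reader))
print(parsed_label, len(image), len(image[0]))
```

To apply the same idea to the real file, the `io.StringIO` object would be replaced by an opened file handle.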

#### World Happiness Report 2017

> [world-happiness-report-2017.csv](world-happiness-report-2017.csv)
_Source: [Kaggle](https://www.kaggle.com/unsdsn/world-happiness#2017.csv)_

Happiness rank and scores by country, 2017.

#### Iris Flowers

> [iris.csv](iris.csv)
_Source: [ics.uci.edu](http://archive.ics.uci.edu/ml/datasets/Iris)_

The Iris data set consists of several samples from each of three species of Iris (`Iris setosa`, `Iris virginica` and `Iris versicolor`). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

#### NYC Taxi

> [nyc_taxi.csv](nyc_taxi.csv)
_Source: [Kaggle](https://www.kaggle.com/boltzmannbrain/nab)_

Number of NYC taxi passengers. The five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm. The raw data is from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). The data file included here aggregates the total number of taxi passengers into 30-minute buckets.
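
The 30-minute bucketing can be sketched roughly as below. This is not the procedure used to build `nyc_taxi.csv`. The trip records, the timestamp format, and the helper names `bucket_start` and `aggregate` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up trip records (pickup time, passenger count), for illustration only.
trips = [
    ("2014-07-01 00:05:00", 2),
    ("2014-07-01 00:29:59", 1),
    ("2014-07-01 00:30:00", 3),
    ("2014-07-01 01:10:00", 4),
]


def bucket_start(ts):
    """Round a timestamp down to the start of its 30-minute bucket."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)


def aggregate(records):
    """Sum passenger counts per 30-minute bucket, ordered by bucket start."""
    totals = defaultdict(int)
    for stamp, passengers in records:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        totals[bucket_start(ts)] += passengers
    return dict(sorted(totals.items()))


buckets = aggregate(trips)
for start, total in buckets.items():
    print(start.isoformat(sep=" "), total)
```

Each timestamp is rounded down, so a trip at 00:29:59 falls into the 00:00 bucket while one at 00:30:00 starts the next bucket.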

#### Microchips Tests (Artificial)

> [microchips-tests.csv](microchips-tests.csv)
_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_

Artificial dataset in which `param_1` and `param_2` produce a non-linear decision boundary.

#### Non-Linear Y(X) Dependency (Artificial)

> [non-linear-regression-x-y.csv](non-linear-regression-x-y.csv)
_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_

Artificial dataset that contains a non-linear y(x) dependency.