Add datasets description.

Sonatrix · Dec 22, 2018 · f8af24c
1 parent d5a0679

Showing 4 changed files with 10,379 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -119,3 +119,7 @@ After this Jupyter Notebook will be accessible by `http://localhost:8888`.
 Each algorithm section contains demo links to [Jupyter NBViewer](http://nbviewer.jupyter.org/), a fast online previewer for Jupyter notebooks where you can see demo code, charts and data right in your browser without installing anything locally. If you want to _change_ the code and _experiment_ with a demo notebook, launch the notebook in [Binder](https://mybinder.org/) by clicking the _"Execute on Binder"_ link in the top right corner of the NBViewer.
 
 ![](./images/binder-button-place.png)
+
+## Datasets
+
+The datasets used in the Jupyter Notebook demos are listed in the [data folder](data).
diff --git a/data/README.md b/data/README.md
@@ -0,0 +1,53 @@
+# Datasets
+
+This is a list of datasets that are used for Jupyter Notebook demos in this repository.
+
+#### MNIST (Handwritten Digits)
+
+> [mnist-demo.csv](mnist-demo.csv)
+
+_Source: [Kaggle](https://www.kaggle.com/oddrationale/mnist-in-csv/home)_
+
+A sample of the original MNIST dataset in CSV format. Instead of the full dataset with 60000 training examples, this sample consists of just 10000 examples.
+
+Each row consists of 785 values: the first is the label (a digit from 0 to 9) and the remaining 784 are the pixel values of a 28x28 image (each a number from 0 to 255).
+
+#### World Happiness Report 2017
+
+> [world-happiness-report-2017.csv](world-happiness-report-2017.csv)
+
+_Source: [Kaggle](https://www.kaggle.com/unsdsn/world-happiness#2017.csv)_
+
+Happiness rank and scores by country, 2017.
+
+#### Iris Flowers
+
+> [iris.csv](iris.csv)
+
+_Source: [ics.uci.edu](http://archive.ics.uci.edu/ml/datasets/Iris)_
+
+The Iris data set consists of several samples from each of three species of Iris (`Iris setosa`, `Iris virginica` and `Iris versicolor`). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
+
+#### NYC Taxi
+
+> [nyc_taxi.csv](nyc_taxi.csv)
+
+_Source: [Kaggle](https://www.kaggle.com/boltzmannbrain/nab)_
+
+Number of NYC taxi passengers, with five anomalies that occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snow storm. The raw data is from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). The data file included here aggregates the total number of taxi passengers into 30-minute buckets.
+
+#### Microchips Tests (Artificial)
+
+> [microchips-tests.csv](microchips-tests.csv)
+
+_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_
+
+Artificial dataset in which `param_1` and `param_2` produce a non-linear decision boundary.
+
+#### Non-Linear Y(X) Dependency (Artificial)
+
+> [non-linear-regression-x-y.csv](non-linear-regression-x-y.csv)
+
+_Source: [Machine Learning at Coursera](https://www.coursera.org/learn/machine-learning)_
+
+Artificial dataset that contains a non-linear y(x) dependency.
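
The MNIST row layout described in `data/README.md` above (one label followed by 784 pixel values) can be illustrated with a minimal sketch. This is not code from the repository: the helper name `parse_mnist_row` is hypothetical, the input row is synthetic, and the sketch assumes the pixels are stored row by row and that any header line in `mnist-demo.csv` has already been skipped by the caller, neither of which is stated in the README.

```python
import csv
import io

IMAGE_SIDE = 28                    # 28x28 pixel image, per the README
PIXELS = IMAGE_SIDE * IMAGE_SIDE   # 784 pixel values per row


def parse_mnist_row(row):
    """Split one CSV row into (label, image), where image is a 28x28 list of lists.

    Hypothetical helper; assumes row-major pixel order.
    """
    if len(row) != PIXELS + 1:
        raise ValueError(f"expected {PIXELS + 1} values, got {len(row)}")
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    if not 0 <= label <= 9:
        raise ValueError(f"label out of range: {label}")
    if any(p < 0 or p > 255 for p in pixels):
        raise ValueError("pixel value outside 0..255")
    # Cut the flat list of 784 values into 28 rows of 28 pixels each.
    image = [
        pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]
        for r in range(IMAGE_SIDE)
    ]
    return label, image


# Synthetic one-row "file" standing in for mnist-demo.csv (made-up data).
fake_row = ["7"] + [str(i % 256) for i in range(PIXELS)]
fake_csv = io.StringIO(",".join(fake_row) + "\n")

label, image = parse_mnist_row(next(csv.reader(fake_csv)))
print(label, len(image), len(image[0]))
```

To read the real file, the same helper could be applied to each row yielded by `csv.reader` over an open file handle, after checking whether the first line is a header.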