- Python 2.7
- Download pip, a Python package manager (if it's not already installed):
$ sudo easy_install pip
- Install IPython using pip:
$ sudo pip install "ipython[notebook]"
The H2O Python module uses the requests and tabulate modules, both of which are available on PyPI, the Python Package Index.
$ sudo pip install requests
$ sudo pip install tabulate
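After the installs above, a quick check that both modules can be imported may save a confusing failure later in the notebook. The following is a minimal sketch; the helper name `missing_modules` is hypothetical and not part of H2O.

```python
import importlib

# Modules the H2O Python module depends on, per the instructions above.
REQUIRED_MODULES = ["requests", "tabulate"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If a module is reported as missing, rerun the corresponding `pip install` command.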
To use H2O in Python, follow the instructions on the Install in Python tab after selecting the H2O version on the H2O Downloads page.
Launch H2O outside of the IPython notebook. You can do this from the top directory of your H2O build download. The version of H2O that is running must match the version of the H2O Python module, or Python will not be able to connect to H2O. To access the H2O Web UI, go to http://localhost:54321 in your web browser.
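The version requirement can be checked before opening the notebook. The sketch below is an illustration only: it assumes the running instance reports its version in a `version` field at the `/3/Cloud` REST endpoint and that the installed module exposes `h2o.__version__`; verify both against your H2O release. The function names are hypothetical.

```python
import json

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2.7
    from urllib2 import urlopen


def running_h2o_version(host="localhost", port=54321, timeout=5):
    """Return the version reported by a running H2O instance, or None if unreachable."""
    # Assumption: the instance serves plain HTTP and exposes /3/Cloud.
    url = "http://{0}:{1}/3/Cloud".format(host, port)
    try:
        response = urlopen(url, timeout=timeout)
        payload = json.loads(response.read().decode("utf-8"))
    except (IOError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("version")


def versions_match(module_version, cluster_version):
    """True only if both versions are known and identical."""
    if module_version is None or cluster_version is None:
        return False
    return module_version.strip() == cluster_version.strip()


if __name__ == "__main__":
    try:
        import h2o
        module_version = h2o.__version__  # assumed attribute
    except ImportError:
        module_version = None
    cluster_version = running_h2o_version()
    print("Python module version: {0}".format(module_version))
    print("Running H2O version:   {0}".format(cluster_version))
    print("Versions match: {0}".format(versions_match(module_version, cluster_version)))
```

A `None` for the running version usually means H2O has not been launched yet or is listening on a different port.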
Open the prostate_gbm.ipynb file. The notebook contains a demo that starts H2O, imports a prostate dataset into H2O, builds a GBM model, and predicts on the training set with the newly built model. Use Shift+Return to execute each cell and proceed to the next cell in the notebook.
$ ipython notebook prostate_gbm.ipynb
All demos are available here:
To set up your Python environment to run these examples, download and install H2O for Python using the instructions above.
- Predict Airline Delays - Uses historical airline flight data to build multiple classification models that label any flight as either delayed or not delayed.
- Chicago Crime Rate - Uses weather and city statistics to compare arrest rates with the total crimes for each category.
- NYC Citibike Demand with Weather - Takes monthly bike ride data (~10 million rows) for the past two years to predict bike demand at each bike share station. Weather data is also incorporated to better predict bike usage.
- NYC Citibike Demand with Weather - smaller dataset - Takes monthly bike ride data (~1 million rows) for the past two years to predict bike demand at each bike share station. Weather data is also incorporated to better predict bike usage.
- Confusion Matrix & ROC - Creates a GBM and GLM model using the airlines dataset, including confusion matrices, ROCs, and scoring histories.
- Imputation - Imputes (substitutes values for) missing data in the airlines dataset.
- Not Equal Factor - Slices the airlines dataset using != factor_level.
- Airline Confusion Matrices - Uses the airlines dataset to generate confusion matrices for algorithm performance analysis.
- Deep Learning for Prostate Cancer Analysis - Uses the prostate dataset to build a Deep Learning model.
- Airlines Prep - Conditions the airlines dataset by filtering out NAs where the departure delay in the input dataset is unknown. Any delay longer than minutesOfDelayWeTolerate is treated as delayed.
- GBM model using prostate dataset - Creates a GBM model using the prostate dataset.
- Balance Classes - Imports the airlines dataset, parses it, displays a summary, and runs GLM with a binomial link function.
- Clustering with KMeans - Demonstrates k-means clustering and different diagnostics for selecting the number of clusters. A link to the data is provided in the notebook.
- EEG Eye State - Uses EEG data collected from an Emotiv Neuroheadset and classifies eye state (open vs closed) with a GBM.
- Tree fetch demo - Trains a basic GBM model on the airlines dataset and fetches the tree behind the model, then explains how to explore the fetched tree.
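The labeling rule described for the Airlines Prep demo can be illustrated in plain Python, independent of H2O. This is a sketch of the rule only, not the notebook's code: the 15-minute threshold and the function name `label_delays` are assumptions chosen for illustration, and unknown delays are represented as `None`.

```python
# Assumed threshold for illustration; the notebook defines its own
# minutesOfDelayWeTolerate value.
MINUTES_OF_DELAY_WE_TOLERATE = 15


def label_delays(dep_delays, minutes_of_delay_we_tolerate=MINUTES_OF_DELAY_WE_TOLERATE):
    """Drop unknown delays (None) and label each remaining flight.

    A flight is labeled True (delayed) only if its departure delay is
    strictly longer than the tolerated number of minutes.
    """
    labels = []
    for delay in dep_delays:
        if delay is None:
            continue  # filter out rows with an unknown departure delay
        labels.append(delay > minutes_of_delay_we_tolerate)
    return labels


if __name__ == "__main__":
    sample = [None, 5, 15, 16, -3]
    print(label_delays(sample))
```

A delay exactly equal to the threshold is not counted as delayed here, matching the "longer than" wording of the description.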
- AirlinesTest and AirlinesTrain - Used in Confusion Matrix & ROC, Airline Confusion Matrices, and Balance Classes
- Allyears2k_headers - Used in Predict Airline Delays, Imputation, Not Equal Factor, and Airlines Prep
- chicagoAllWeather, chicagoCensus, and chicagoCrimes10k - Used in Chicago Crime Rate
- Used in NYC Citibike Demand with Weather
- 2013-10 - 193MB - Used in NYC Citibike Demand with Weather - smaller dataset
- NYC Weather Data - Used in NYC Citibike Demand with Weather and NYC Citibike Demand with Weather - smaller dataset