CentML/DeepView.Predict

Habitat

A Runtime-Based Computational Performance Predictor for Deep Neural Network Training

Habitat is a tool that predicts a deep neural network's training iteration execution time on a given GPU. It currently supports PyTorch. To learn more about how Habitat works, please see our research paper.

Installation

To run Habitat, you need:

Currently, we have predictors for the following Nvidia GPUs:

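| GPU    | Generation | Memory | Memory Type | SMs |
|--------|------------|--------|-------------|-----|
| P4000  | Pascal     | 8 GB   | GDDR5       | 14  |
| P100   | Pascal     | 16 GB  | HBM2        | 56  |
| V100   | Volta      | 16 GB  | HBM2        | 80  |
| 2070   | Turing     | 8 GB   | GDDR6       | 36  |
| 2080Ti | Turing     | 11 GB  | GDDR6       | 68  |
| T4     | Turing     | 16 GB  | GDDR6       | 40  |
| 3090   | Ampere     | 24 GB  | GDDR6X      | 82  |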

NOTE: Installing with pip, as shown in the commands here, is not implemented yet.

python3 -m pip install habitat
python3 -c "import habitat"

Building from source

Prerequisites:

  • A system equipped with an Nvidia GPU with properly configured CUDA
  • CUDA Toolkit
  • cmake v3.17+
    • Note that Habitat does not build properly with cmake v3.24.0 and v3.24.1 due to a bug in cmake. This bug is fixed by this change, which was merged in v3.24.2.
  • Git Large File Storage (Git LFS), which is used to store the pre-trained Habitat models
git clone https://github.com/CentML/habitat.git && cd habitat
git submodule init && git submodule update
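
The cmake requirement above (v3.17 or newer, excluding v3.24.0 and v3.24.1) can be checked before starting a build. The following is a minimal sketch, not part of Habitat: the function names `parse_cmake_version` and `cmake_version_ok` are hypothetical, and it assumes the output of `cmake --version` contains a `major.minor.patch` version number.

```python
import re

# Versions this README reports as unable to build Habitat.
KNOWN_BAD = {(3, 24, 0), (3, 24, 1)}
# Minimum version this README asks for.
MINIMUM = (3, 17, 0)


def parse_cmake_version(output):
    """Extract (major, minor, patch) from the text printed by `cmake --version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.groups())


def cmake_version_ok(output):
    """Return True if the reported version is at least 3.17 and not a known-bad release."""
    version = parse_cmake_version(output)
    return version >= MINIMUM and version not in KNOWN_BAD


# Example use against the installed cmake (left commented out so the
# sketch does not depend on cmake being present):
# import subprocess
# out = subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout
# print(cmake_version_ok(out))
```

The check only reads the version string; it does not verify that the CUDA Toolkit or Git LFS are installed.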

Note: Habitat needs access to your GPU's performance counters, which requires special permissions if you are running with a recent driver (418.43 or later). If you encounter a CUPTI_ERROR_INSUFFICIENT_PRIVILEGES error when running Habitat, please follow the instructions here and in issue #5.

Building with Docker

Habitat has been tested to work on the latest version of NVIDIA NGC PyTorch containers.

  1. To build Habitat with Docker, first run the NGC container.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.08-py3
  2. Inside the container, clone the repository then build and install the Habitat Python package:
git clone --recursive https://github.com/centml/habitat
./habitat/analyzer/install-dev.sh

Building without Docker

  1. Install CUPTI

CUPTI is a profiling interface required by Habitat. Select the correct version of CUDA here and follow the instructions to add NVIDIA's repository. Then, install CUPTI with:

sudo apt-get install cuda-cupti-11-x

where 11-x represents the version of CUDA you have installed.

  2. Install CMake 3.17+.

Follow these steps to download and install a precompiled version of CMake:

wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-x86_64.sh
chmod +x cmake-3.24.2-linux-x86_64.sh
mkdir /opt/cmake
sh cmake-3.24.2-linux-x86_64.sh --prefix=/opt/cmake --skip-license
ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake

You can verify the version of CMake you installed with the following:

cmake --version
  3. Build and install the Habitat Python package:
git clone https://github.com/centml/habitat
./habitat/analyzer/install-dev.sh
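
For the CUPTI step above, the `11-x` placeholder in the package name has to be filled in with your CUDA version. The sketch below is a hypothetical helper (`cupti_package_name` is not part of Habitat) that assumes the package name follows the `cuda-cupti-<major>-<minor>` pattern implied by the command above. Confirm the exact name against NVIDIA's repository before installing.

```python
def cupti_package_name(cuda_version):
    """Map a CUDA version such as "11.7" to a package name such as "cuda-cupti-11-7".

    Assumes the cuda-cupti-<major>-<minor> naming pattern; any patch
    component (for example the ".1" in "11.7.1") is ignored.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts[:2]):
        raise ValueError(f"expected a version like '11.7', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"cuda-cupti-{major}-{minor}"


# Example: build the install command for a given CUDA version.
print("sudo apt-get install " + cupti_package_name("11.7"))
```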

Usage example

You can verify your Habitat installation by running the simple usage example:

# example.py
import habitat
import torch
import torchvision.models as models

# Define model and sample inputs
model = models.resnet50().cuda()
image = torch.rand(8, 3, 224, 224).cuda()

# Measure a single inference on the source GPU
# (the device passed to the tracker is the GPU this script runs on)
tracker = habitat.OperationTracker(device=habitat.Device.RTX2080Ti)
with tracker.track():
    out = model(image)

trace = tracker.get_tracked_trace()
print("Run time on source:", trace.run_time_ms)

# Perform prediction to a single target device
pred = trace.to_device(habitat.Device.V100)
print("Predicted time on V100:", pred.run_time_ms)

Run the example with:

python3 example.py

See experiments/run_experiment.py for other examples of Habitat usage.
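
The example above predicts for a single target device. To compare several targets, the same `to_device` call can be repeated in a loop. The following is a minimal sketch: `predict_run_times` and `relative_speedups` are hypothetical helpers, not part of Habitat, and they rely only on the `to_device(...)` method and `run_time_ms` attribute used in the example above.

```python
def predict_run_times(trace, targets):
    """Return {label: predicted run time in ms} for each target device.

    `trace` is any object with a `to_device(device)` method whose result
    has a `run_time_ms` attribute, as in the usage example above.
    `targets` maps a display label to a device identifier.
    """
    predictions = {}
    for label, device in targets.items():
        predictions[label] = trace.to_device(device).run_time_ms
    return predictions


def relative_speedups(source_ms, predictions):
    """Return {label: source time / predicted time}; values above 1 mean the target is faster."""
    return {label: source_ms / ms for label, ms in predictions.items()}


# Usage with Habitat (requires a GPU and a working installation), continuing
# from the `trace` object in the example above:
# targets = {"V100": habitat.Device.V100}
# predictions = predict_run_times(trace, targets)
# print(relative_speedups(trace.run_time_ms, predictions))
```

Keeping the helpers independent of the `habitat` import means they can be exercised with a stand-in trace object on a machine without a GPU.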

Development Environment Setup

Release Process

Release History

See Releases

License

The code in this repository is licensed under the Apache 2.0 license (see LICENSE and NOTICE), with the exception of the files mentioned below.

This software contains source code provided by NVIDIA Corporation. These files are:

  • The code under cpp/external/cupti_profilerhost_util/ (CUPTI sample code)
  • cpp/src/cuda/cuda_occupancy.h

The code mentioned above is licensed under the NVIDIA Software Development Kit End User License Agreement.

We include the implementations of several deep neural networks under experiments/ for our evaluation. These implementations are copyrighted by their original authors and carry their original licenses. Please see the corresponding README files and license files inside the subdirectories for more information.

Research Paper

Habitat began as a research project in the EcoSystem Group at the University of Toronto. The accompanying research paper appeared in the proceedings of USENIX ATC'21. If you are interested, you can read a preprint of the paper here.

If you use Habitat in your research, please consider citing our paper:

@inproceedings{habitat-yu21,
  author = {Yu, Geoffrey X. and Gao, Yubo and Golikov, Pavel and Pekhimenko,
    Gennady},
  title = {{Habitat: A Runtime-Based Computational Performance Predictor for
    Deep Neural Network Training}},
  booktitle = {{Proceedings of the 2021 USENIX Annual Technical Conference
    (USENIX ATC'21)}},
  year = {2021},
}

Contributing

Check out CONTRIBUTING.md for more information on how to help with Habitat.
