Cloud, DevOps, Infra

System / Infra
Compute & Storage
Grid computing / Super computing
Cloud services
Tools
CPU
FPGA
GPU
TPU
IPU
Performance
Contributing

System / Infra

serveo.net - Serveo is an SSH server just for remote port forwarding: when a user connects to Serveo, they get a public URL that anybody can use to reach their localhost server. See the link for other SSH-based and related alternatives. These are useful for serving resources across devices, e.g. accessing a GPU or other hardware accelerator on one machine from another device. | How to forward my local port to public using Serveo? | Serveo on GitHub
Inlets by Alex Ellis | Get started | Video
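Serveo's own instructions use OpenSSH's standard remote-forwarding flag, in the form `ssh -R 80:localhost:3000 serveo.net`. Below is a minimal sketch that assembles that command from Python. It assumes a local server listening on port 3000, and the `serveo_command` helper is hypothetical: it only wraps the `ssh` CLI.

```python
import shlex
import subprocess


def serveo_command(local_port: int, remote_port: int = 80,
                   host: str = "serveo.net") -> list[str]:
    """Build an `ssh -R` argv asking `host` to forward its `remote_port`
    back to localhost:`local_port` on this machine (hypothetical helper)."""
    if not (0 < local_port < 65536 and 0 <= remote_port < 65536):
        raise ValueError("port out of range")
    # -R <remote_port>:localhost:<local_port> is standard OpenSSH remote forwarding.
    return ["ssh", "-R", f"{remote_port}:localhost:{local_port}", host]


if __name__ == "__main__":
    cmd = serveo_command(3000)
    print(shlex.join(cmd))
    # Uncomment to actually open the tunnel (blocks until the SSH session ends):
    # subprocess.run(cmd, check=True)
```

You can point `host` at an SSH server you control instead. That gives you the same kind of tunnel without a third-party service, provided the server permits remote forwarding. The `GatewayPorts` setting in `sshd_config` controls whether the forwarded port is reachable from outside that server.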

Compute & Storage

Cray Computers | Artificial Intelligence | Accel AI | Cryo-EM | Autonomous Vehicles | Geospatial AI
Graphcore's IPU
Lambda Labs
NGD Systems: Technology | Solutions - High Compute Storage, Scalable Computational Storage [deadlink] | NGD Systems: Ensuring AI Advancement with Intelligent Storage

Grid computing / Super computing

Grid Engine: Wikipedia | Univa website | Datasheet
BOINC - High-Throughput Computing with BOINC | Tech Docs | Download BOINC | GitHub
Cray Computers - Supercomputing as a Service

Cloud services

vast.ai - GPU Sharing Economy. One simple interface to find the best cloud GPU rentals; claims to reduce cloud compute costs by 3x to 5x.
paperspace - The first cloud built for the future. Powering next-generation applications and cloud ML/AI pipelines. Paperspace is built to scale with your team, with a pay-as-you-go option for individuals.
valohai | docs | blogs | GitHub | Videos | Showcase | Slack | @valohaiai - Valohai is a machine learning platform. It runs your experiments in the cloud, tracks your experiment history and streamlines data science workflows. Deep learning management platform: machine orchestration, version control and pipeline management for deep learning.
Lambda Cloud GPU Instances - GPU Instances for Deep Learning & Machine Learning
NavOps - Cloud Migration for HPC | Datasheet
Verne Global: HPC Cloud | NVIDIA DGX-Ready
Weights and Biases | Learn more about WandB

Tools

snakemake - The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Slides | PyPI
plz - Plz (pronounced "please") runs your jobs storing code, input, outputs and results so that they can be queried programmatically.
valohai | docs | blogs | GitHub | Videos | Showcase | Slack - Valohai is a machine learning platform. It runs your experiments in the cloud, tracks your experiment history and streamlines data science workflows. Deep learning management platform: machine orchestration, version control and pipeline management for deep learning.
Seldon - Model deployment platform for Kubernetes clusters. | docs | GitHub | use-cases | blogs | videos
kedro | docs | Kedro-Viz | kedro-examples - Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned.
Lambda Stack - One-line installation of TensorFlow, Keras, Caffe, Caffe2, CUDA, cuDNN, and NVIDIA drivers for Ubuntu 16.04 and 18.04.
Apache Airflow - Airflow is a platform to programmatically author, schedule and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.
Nextflow - Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.
StackHPC suites of repositories: AI, ML, DL, Cloud, HPC | StackHPC
cortex - Machine learning deployment platform: Deploy machine learning models to production
See also: Data > Programs and Tools
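Airflow, as described above, models a workflow as a DAG of tasks and runs each task only after its upstream dependencies have finished. Snakemake works similarly: it derives a DAG of jobs from its rules. The sketch below illustrates the idea with Python's standard-library `graphlib` (Python 3.9+). It is not the API of Airflow or of any other tool listed here, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(deps: dict[str, set[str]], actions: dict) -> list[str]:
    """Run the callables in `actions` in an order that respects `deps`
    (task -> set of upstream tasks) and return the execution order.

    prepare() raises graphlib.CycleError if the graph is not acyclic,
    mirroring the DAG requirement of workflow engines.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Every task in this batch has all its upstream tasks done, so a real
        # scheduler could hand the whole batch to parallel workers.
        ready = sorted(sorter.get_ready())  # sorted only for a deterministic order
        for task in ready:
            actions.get(task, lambda: None)()  # tasks without an action are no-ops
            order.append(task)
            sorter.done(task)
    return order


if __name__ == "__main__":
    # Hypothetical ML pipeline: download -> {clean, fetch_labels} -> train -> evaluate
    deps = {
        "clean": {"download"},
        "fetch_labels": {"download"},
        "train": {"clean", "fetch_labels"},
        "evaluate": {"train"},
    }
    print(run_pipeline(deps, {}))
```

Keeping the dependency graph separate from the task bodies lets the engine work out what can run concurrently. It also lets the engine detect cycles before anything executes.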

CPU

Probing the CPU (Linux/macOS)
- libcpuid
- Zero-overhead performance capturing (Linux only; macOS has no /proc): use /proc/interrupts and /proc/softirqs
- Non-zero overhead, less accurate: use the PMU (capture on- and off-core events)
Probing the CPU (Windows)
- PerfView - general-purpose profiling on Windows
- PerfView for .NET - excellent overview by Sasha Goldshtein
Intel
- Intel® Developer Zone
- Intel® AI Developer Home Page
- Intel® AI Developer Webinar Series | All webinars listing
- The PlaidML Tensor Compiler - webinar
- nGraph - Unlocking next-generation performance with deep learning compilers: webinar | slides | homepage | github
- Also see Intel in Courses

Thanks to the great minds on the Mechanical Sympathy mailing list for their responses to my queries on CPU probing.
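The /proc approach amounts to taking two snapshots of the kernel's cumulative per-CPU counters and subtracting them. Below is a minimal sketch. It assumes the usual Linux layout of /proc/softirqs and /proc/interrupts: a header row of CPU names, followed by rows of the form `LABEL: count count ...`, sometimes with a trailing description. The function names are hypothetical.

```python
import os
import time


def parse_proc_counters(text: str) -> dict[str, list[int]]:
    """Parse /proc/softirqs- or /proc/interrupts-style text into
    {row_label: [count for each CPU]}.

    The first line holds the CPU column names; each later line looks like
    "LABEL: <one count per CPU> [optional description]".
    """
    lines = text.strip("\n").splitlines()
    n_cpus = len(lines[0].split())
    counters = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a counter row
        counts = []
        # Some rows (e.g. ERR, MIS in /proc/interrupts) have a single column,
        # and others end with a text description, so keep only leading integers.
        for field in rest.split()[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        counters[label.strip()] = counts
    return counters


def read_counters(path: str = "/proc/softirqs") -> dict[str, list[int]]:
    """Snapshot the cumulative kernel counters (Linux only)."""
    with open(path) as f:
        return parse_proc_counters(f.read())


def delta(before: dict[str, list[int]],
          after: dict[str, list[int]]) -> dict[str, list[int]]:
    """Per-CPU increments between two snapshots (rows present in both)."""
    return {name: [a - b for a, b in zip(after[name], before[name])]
            for name in after if name in before}


if __name__ == "__main__" and os.path.exists("/proc/softirqs"):
    first = read_counters()
    time.sleep(1.0)
    second = read_counters()
    for name, per_cpu in delta(first, second).items():
        print(f"{name:>10}: {sum(per_cpu)} events in ~1 s, per CPU {per_cpu}")
```

Reading these files costs only a small read syscall and involves no counter programming. That is why this approach contrasts with PMU-based sampling. The trade-off is that you only get the aggregate counts the kernel already maintains.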

FPGA

Using FPGAs for Datacenter Acceleration | Windows AI | Intel® Distribution of OpenVINO™ Toolkit: Develop Multiplatform Computer Vision Solutions
Also see FPGA in Courses

GPU

Know your GPU
GPU Server 1 of 2 | GPU Server 2 of 2 | Applications of GPU servers - check out the manufacturers
Embedded Vision Solutions for NVIDIA Jetson Series | Embedded Vision Family Brochure
AVerMedia Box PC & Carrier (works with NVIDIA Jetson): 1 | 2

TPU

How to harness the Powers of the Cloud TPU
How-tos
All tutorials
Command-line interface
- https://cloud.google.com/sdk/gcloud/reference/compute/tpus/
- https://cloud.google.com/tpu/docs/custom-setup
Cloud TPU tools
Performance Guide
TPU Estimator API
Using bfloat16
Advanced Guide to Inception V3 on Cloud TPU
Examples
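Cloud TPUs use the bfloat16 format. It keeps float32's sign bit and 8-bit exponent, so it has the same dynamic range, but only 7 explicit mantissa bits. Below is a minimal pure-Python sketch of the float32 to bfloat16 conversion with round-to-nearest-even. It is for illustration only: TPUs and ML frameworks perform this conversion in hardware or in their own libraries, and the helper names are hypothetical.

```python
import math
import struct


def float_to_bf16_bits(x: float) -> int:
    """Convert a float32-representable value to bfloat16 and return its 16 raw bits.

    bfloat16 is the upper half of the float32 bit pattern; the dropped lower
    half is rounded to nearest, with ties going to the even result.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if math.isnan(x):
        # A NaN whose payload sits only in the low bits would round to
        # infinity, so set the quiet bit to keep the result a NaN.
        return (bits >> 16) | 0x0040
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits back to a float by appending 16 zero mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


if __name__ == "__main__":
    for value in (1.0, math.pi, 1.01171875):
        b = float_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_float(b)!r}")
```

For example, `math.pi` (float32 bits `0x40490FDB`) becomes `0x4049`, which widens back to `3.140625`. Only about three significant decimal digits survive, which is the precision traded for float32's range.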

IPU

Graphcore | Videos: Simon Knowles - More complex models and more powerful machines | Graphcore tech concept | A new kind of hardware designed for machine intelligence - Graphcore presentations | Video: Scaling Throughput Processors for Machine Intelligence

Performance

MLPerf - Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.
MLPerf introduces machine learning inference benchmark suite...
One Deep Learning Benchmark to Rule Them All
mlbench: Distributed Machine Learning Benchmark - A public and reproducible collection of reference implementations and benchmark suite for distributed machine learning algorithms, frameworks and systems.
EEMBC MLMark Benchmark - The EEMBC MLMark benchmark is a machine-learning (ML) benchmark designed to measure the performance and accuracy of embedded inference.
DeepOBS: A Deep Learning Optimizer Benchmark Suite
PMLB - a large benchmark suite for machine learning evaluation and comparison
Deep Learning Benchmarking Suite | HPE Deep Learning Cookbook
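The suites above standardize workloads, datasets and run rules. The timing loop underneath a single measurement, however, usually follows the same pattern: discard some warm-up runs, then summarize repeated timed runs with a robust statistic. Below is a minimal sketch of that pattern. It is not the harness of MLPerf or any other listed suite, and `benchmark` and its parameters are hypothetical.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], batch_size: int = 1,
              warmup: int = 5, runs: int = 20) -> dict[str, float]:
    """Time `fn` after warm-up calls; report latency and throughput.

    Warm-up absorbs one-off costs (JIT compilation, cache and allocator
    warm-up) that would otherwise skew the first measurements. The median
    is less sensitive to outliers than the mean.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # for asynchronous accelerators, fn must wait for completion
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)
    return {
        "median_latency_ms": median_s * 1e3,
        "min_latency_ms": min(timings) * 1e3,
        "throughput_items_per_s": batch_size / median_s,
    }


if __name__ == "__main__":
    data = list(range(100_000))
    print(benchmark(lambda: sorted(data, reverse=True), batch_size=len(data)))
```

On GPUs and TPUs, kernel launches return before the work finishes. A timed function that does not synchronize therefore measures dispatch time rather than compute time.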

Contributing

Contributions are very welcome; please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, and read about our licensing policy.

Back to main page (table of contents)
