This repo defines two AWS Lambda functions and contains the Jupyter notebook used to train the model. The first Lambda generates prediction tokens from an ML inference model trained on MIDI files hosted by the companion GitHub repo, /midi-archive. The second Lambda takes those tokens, converts them to MIDI, and saves the file to an S3 bucket. These two Lambdas are chained together in prod via an AWS Step Function.
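As a rough sketch of what that second Lambda might look like (the bucket name, object key, and tokenizer path below are placeholders, and the calls assume a miditok 2.x-style API rather than the repo's actual handler):

```python
# save_midi_lambda.py -- illustrative sketch, not the repo's actual handler
import boto3
from miditok import REMI  # assuming a REMI-style miditok tokenizer

s3 = boto3.client("s3")
MIDI_BUCKET = "midi-archive-output"  # hypothetical bucket name

def handler(event, context):
    # Tokens produced by the inference Lambda, passed along by the Step Function
    tokens = event["tokens"]

    # Load the tokenizer config saved during training (miditok 2.x-style API)
    tokenizer = REMI(params="tokenizer.json")
    midi = tokenizer.tokens_to_midi([tokens])

    # Write the MIDI to /tmp (the only writable path in Lambda), then upload to S3
    midi.dump("/tmp/generated.mid")
    s3.upload_file("/tmp/generated.mid", MIDI_BUCKET, "generated/latest.mid")

    return {"bucket": MIDI_BUCKET, "key": "generated/latest.mid"}
```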
The ML model is a simple decoder-only transformer, built with PyTorch and based on Andrej Karpathy's Neural Net lecture series.
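In the spirit of that lecture series, the core of the model is just token and position embeddings feeding a stack of causal self-attention blocks. A minimal sketch, with placeholder hyperparameters rather than the ones actually used in training:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One decoder block: causal self-attention followed by an MLP."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        # Causal mask: each position may only attend to earlier tokens
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class TinyMidiGPT(nn.Module):
    """Decoder-only transformer over MIDI tokens (placeholder hyperparameters)."""
    def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=4, block_size=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.head(self.ln_f(x))  # logits over the next token
```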
Created while I was at Recurse Center in the fall of 2023!
I built this project as an exploration of low-cost deployment of ML models. I decided to use Lambda (as opposed to Azure, SageMaker, EC2, etc.) because of its resource limitations and the ease of scaling down uptime, and I thought it would be interesting to use those limitations to explore tradeoffs in training the model. For example, the more sophisticated I make the model (more parameters), the longer it takes to generate a sequence of tokens. My current plan is to have the Lambda run only once a day (staying within AWS's free tier), and since Lambda functions are limited to 15 minutes of execution time, a more sophisticated model may end up generating less than a minute of music. At the moment, I'm still balancing these tradeoffs.
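As a back-of-envelope illustration of that tradeoff (every number below is an assumption for illustration, not a measurement from this model):

```python
# Rough budget math: how much music fits in one Lambda invocation?
LAMBDA_LIMIT_S = 15 * 60          # hard cap on Lambda execution time
TOKENS_PER_SEC = 5                # assumed inference speed on a small Lambda
TOKENS_PER_SEC_OF_MUSIC = 40      # assumed token density of the MIDI encoding

max_tokens = LAMBDA_LIMIT_S * TOKENS_PER_SEC
seconds_of_music = max_tokens / TOKENS_PER_SEC_OF_MUSIC
print(f"{max_tokens} tokens ~ {seconds_of_music:.0f}s of music per run")
```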
For more sophisticated approaches to MIDI machine learning, check out this article on using GPT-2 with MIDI, or this Hugging Face model.
Also, see MIDI Archive for more notes on the underlying training set used for this model. While the current state of the art is focused on text, there has already been considerable work done on MIDI, including OpenAI's MuseNet and Magenta's Music Transformer. Both of these models use a MIDI training set called Maestro that often comes up as a canonical MIDI archive. However, developing the training set is a critical step in the design of ML systems, and I wanted to take the opportunity to explore the design decisions that go into developing an ML model alongside an archive.
flowchart TD
A(Scrape MIDI files from early web) --> Z(Present model output and interactive MIDI archive on Eleventy static site)
A --> B(Process MIDI into tokens for training neural net)
B --> C(Fetch tokens and tokenizer)
C --> D(Train neural net model)
D --> E(Export model to ONNX)
E --> F(Deploy to AWS Lambda)
F --> |Daily Cloudwatch Trigger|G(Generate MIDI sequence and save to S3)
G --> Z
Notebook for training and exporting the model is in the repo at /training-notebook, and on Colab.
The MIDI Archive repo is responsible for furnishing the training set in the form of .midi files and tokenized .json files.
- In this repo, run the notebook to generate the tokenizer config and tokenized dataset
- If local compute resources aren't available, run training on Colab using the pre-tokenized dataset
- After training, export the trained model to ONNX (see the export sketch after this list)
- Use the deploy script in /midi-save to push the model up to an AWS Lambda Layer (see the layer-publish sketch at the end of this section)
- Update the layer version in the Lambda function
- The updated model is now deployed!
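The export step above might look roughly like this; the model here is a stand-in module and the file names are placeholders (the real export cell lives in the training notebook):

```python
import torch
import torch.nn as nn

# Stand-in for the trained model; in the notebook this would be the real transformer
model = nn.Sequential(nn.Embedding(512, 64), nn.Linear(64, 512))
model.eval()

block_size = 256
dummy_input = torch.zeros((1, block_size), dtype=torch.long)

torch.onnx.export(
    model,
    dummy_input,
    "midi_model.onnx",
    input_names=["tokens"],
    output_names=["logits"],
    # Allow variable sequence length at generation time
    dynamic_axes={"tokens": {1: "seq"}, "logits": {1: "seq"}},
    opset_version=17,
)
```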
- Used `docker run --rm -v $(pwd):/package highpoints/aws-lambda-layer-zip-builder miditok` to generate a .zip with the miditok dependency (https://github.com/HighPoint/aws-lambda-onnx-model)
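Whether it's the model layer pushed by the deploy script or the miditok dependency zip built above, publishing a layer version and pointing the function at it can be scripted with boto3 along these lines; the layer, function, and zip names are hypothetical, and the deploy script in /midi-save is the source of truth:

```python
import boto3

lam = boto3.client("lambda")

# Hypothetical names -- substitute the real layer/function names from the repo
LAYER_NAME = "miditok-deps"
FUNCTION_NAME = "midi-save"

# Publish a new layer version from the zip built by the docker command above
with open("miditok.zip", "rb") as f:
    layer = lam.publish_layer_version(
        LayerName=LAYER_NAME,
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.11"],
    )

# Point the Lambda function at the freshly published layer version
# (note: this replaces the function's full layer list)
lam.update_function_configuration(
    FunctionName=FUNCTION_NAME,
    Layers=[layer["LayerVersionArn"]],
)
print("Deployed layer version", layer["Version"])
```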