
Yeezy Taught Me Text Generation



Web application for training a next-character prediction model using a Long Short-Term Memory (LSTM) network and time series prediction. Train the model to generate random text based on patterns in a given text corpus. As Kanye West said:

Lack of visual empathy, equates the meaning of L-O-V-E.


Table_of_Contents


Motivation

  • LSTM is commonly used in industry by companies like Google, Apple, Microsoft, and Amazon for:

    • Time series prediction
    • Speech recognition
    • Music/rhythm learning
    • Handwriting recognition
    • Sign language translation
  • Bloomberg Business Week: "LSTM is arguably the most commercial AI achievement, used for everything from predicting diseases to composing music."


Yeezy_Taught_Me_Application

Web application for artificial intelligence model training and text generation:

Picture of program

Image. Screenshot of the web demo at https://lucylow.github.io/Yeezy-Taught-Me/


Theory: Artificial_Neural_Network

RNNs, LSTMs, and their derivatives mainly use sequential processing over time:

  • Recurrent Neural Network [RNN]:

    • Used for classifying, processing, and making predictions on time series data, sequences, or anything with a temporal dimension.
    • The decision a recurrent net reached at time step t - 1 affects the decision it will reach one moment later at time step t.
    • RNNs are computationally intensive, so training on a GPU is recommended.
    • Unlike vanilla neural networks, which accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes), RNNs allow us to operate over sequences of vectors: sequences in the input, the output, or, in the most general case, both.
  • Long Short Term Memory [LSTM]:

    • Special kind of RNN capable of learning long-term dependencies; it works better in practice than a vanilla RNN due to its more powerful update equation and backpropagation dynamics.
    • LSTMs address the vanishing gradient problem that affects vanilla RNNs, since their units allow gradients to flow relatively unchanged.
      • Vanishing Gradient Problem: long-term information has to sequentially travel through all cells before reaching the present processing cell. This means it can easily be corrupted by being multiplied many times by small numbers (< 1).
    • Neural network operates on different scales of time at once and information can be stored in, written to, or read from a cell.
    • Gates are analog, implemented with element-wise multiplication by sigmoids, which are all in the range 0 to 1. Refer to the diagram under "LSTM Model".

RNN and LSTM models

Image. Explanations of how the RNN and LSTM models work.


Theory: LSTM_Model

Written down as a set of equations, LSTMs look pretty intimidating.

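For reference, here is a minimal sketch of the gate equations in the standard LSTM formulation (the exact notation and variant used by any particular implementation may differ slightly):

```latex
% Standard LSTM cell, one time step t:
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          % forget gate
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          % input gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          % output gate
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   % candidate cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    % cell state update
h_t = o_t \odot \tanh(c_t)                         % hidden state (output)
```

The sigmoid gates (values between 0 and 1) control how much of the old cell state is kept, how much new information is written, and how much of the cell state is exposed as output.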

LSTM Unit Map:

  • Cell (value over time interval)
  • Input gate
  • Output gate
  • Forget gate

LSTM cells from https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/recurrent_neural_networks.html

Image. LSTM cells where information can be stored in, written to, or read.


Theory: Text_Generation_Model

The LSTM model operates at the character level. It takes a tensor of shape [numExamples, sampleLen, charSetSize] as its input. The input text data is read from the "./data" file.

The input is a one-hot encoding of sequences of sampleLen characters. The characters belong to a set of charSetSize unique characters. Given this input, the model outputs a tensor of shape [numExamples, charSetSize], which represents the model's predicted probabilities for the character that follows the input sequence.
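As an illustration of this architecture, here is a minimal sketch of such a model in TensorFlow.js (assuming the project uses tfjs, since training runs in the browser; the layer size and optimizer settings here are illustrative, not the repository's exact configuration):

```ts
import * as tf from '@tensorflow/tfjs';

// Minimal character-level next-char prediction model.
// Input:  one-hot sequences of shape [sampleLen, charSetSize]
// Output: probability distribution over the charSetSize possible next characters
function createModel(sampleLen: number, charSetSize: number): tf.LayersModel {
  const model = tf.sequential();
  // LSTM layer reads the character sequence; 128 units is an illustrative size.
  model.add(tf.layers.lstm({units: 128, inputShape: [sampleLen, charSetSize]}));
  // Softmax over the character set gives P(next char | input sequence).
  model.add(tf.layers.dense({units: charSetSize, activation: 'softmax'}));
  model.compile({
    optimizer: tf.train.rmsprop(0.05),
    loss: 'categoricalCrossentropy',
  });
  return model;
}
```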

This process is repeated to generate a character sequence of a given length, hence the "text generation" part of the project. The randomness (diversity) is controlled by a temperature parameter, as sketched below.
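A sketch of temperature-based sampling, again assuming TensorFlow.js (function and variable names are illustrative):

```ts
import * as tf from '@tensorflow/tfjs';

// Sample the index of the next character from the model's predicted
// probabilities. Lower temperature -> more conservative, repetitive text;
// higher temperature -> more random-looking text.
function sampleNextCharIndex(probs: tf.Tensor1D, temperature: number): number {
  return tf.tidy(() => {
    // Rescale log-probabilities by the temperature, then draw one sample.
    const logits = tf.log(probs).div(tf.scalar(Math.max(temperature, 1e-6)));
    const sample = tf.multinomial(logits as tf.Tensor1D, 1);
    return sample.dataSync()[0];
  });
}
```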

At least 20 epochs (20 full passes over the training set) are required before the generated text starts sounding coherent.
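Continuing the TensorFlow.js sketch above, training might look roughly like the following (xs and ys are the one-hot input and target tensors described earlier; the batch size and validation split are illustrative):

```ts
// Roughly 20+ full passes over the training data are needed before the
// generated text starts to look coherent.
await model.fit(xs, ys, {
  epochs: 20,
  batchSize: 128,
  validationSplit: 0.05,
});
```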


Technical: Input_Data_for_Text_Generation

Potential text datasets to test model: https://cs.stanford.edu/people/karpathy/char-rnn/

If Yeezy Taught Me is run on new data, make sure the corpus has at least ~100k characters. The ideal situation is ~1M characters.


Technical: Text_Parameters

  • Name of the text dataset for the input file
  • Path to the trained next-char prediction model saved on disk
  • Length of the text to generate
  • Temperature value to use for text generation. Higher values produce more random-looking results
  • Whether to use a CUDA GPU for training
  • Step length: how many characters to skip from one example to the next
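Purely as an illustration (the actual option names in this repository may differ), the parameters above could be grouped into a configuration object such as:

```ts
// Hypothetical shape of the training/generation parameters listed above.
interface TextGenConfig {
  dataset: string;      // name of the text dataset used as the input file
  modelPath: string;    // path to the trained next-char prediction model on disk
  genLength: number;    // length of the text to generate
  temperature: number;  // higher values give more random-looking results
  useGpu: boolean;      // whether to train on a CUDA GPU
  stepLength: number;   // characters to skip from one example to the next
}
```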

Usage

The web demo supports model training and text generation. Model training and inference are done in the browser, and model save/load operations are done with calls to the browser's IndexedDB database.
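For example, saving and reloading a trained model with TensorFlow.js's IndexedDB backend looks roughly like this (the model key 'yeezy-taught-me' is illustrative, not necessarily the one the repository uses):

```ts
import * as tf from '@tensorflow/tfjs';

// Persist a trained model in the browser's IndexedDB, then load it back later.
async function saveModel(model: tf.LayersModel): Promise<void> {
  await model.save('indexeddb://yeezy-taught-me');
}

async function loadModel(): Promise<tf.LayersModel> {
  return tf.loadLayersModel('indexeddb://yeezy-taught-me');
}
```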

To launch the demo, do:

yarn && yarn watch

If you try this script on new data, make sure your corpus has at least ~100k characters. ~1M is better.


Conclusion

Yeezy taught me well.


References