Using natural language processing and deep learning techniques, I built a model that predicts the next word of a given sentence. The text of Franz Kafka's Metamorphosis, obtained from Project Gutenberg, is used to train a deep neural network based on recurrent (LSTM) layers. The goal of this project is to get a hands-on approach to NLP through Deep Learning.
In the nxtWordPrediction file I build the model, but first I preprocess the data with the following steps (a sketch of these steps follows the list):
- Reading the text with UTF-8 encoding
- Building a vocabulary of unique (non-repeating) words
- Tokenizing the text for use in the model
- Building sequential inputs for the model
- Encoding the outputs categorically (one-hot)
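A minimal sketch of this preprocessing, assuming Keras' `Tokenizer` and a fixed context length; the file name and `SEQ_LEN` are illustrative assumptions, not values taken from the repo:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

SEQ_LEN = 5  # illustrative context length; the actual value may differ

# Read the book text with UTF-8 encoding
with open("metamorphosis.txt", encoding="utf-8") as f:
    text = f.read()

# Fit a tokenizer on the text (one index per unique word)
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
vocab_size = len(tokenizer.word_index) + 1

# Turn the whole text into a single sequence of word indices
tokens = tokenizer.texts_to_sequences([text])[0]

# Sliding-window sequences: SEQ_LEN input words -> 1 target word
sequences = np.array([tokens[i:i + SEQ_LEN + 1]
                      for i in range(len(tokens) - SEQ_LEN)])
X, y = sequences[:, :-1], sequences[:, -1]

# Categorical (one-hot) outputs for the softmax layer
y = to_categorical(y, num_classes=vocab_size)
```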
The model architecture is the following (a Keras sketch appears after the list):
- Input Embedding layer
- Two LSTM layers
- A fully connected layer, and
- An output layer
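A sketch of that architecture in Keras is shown below; the layer sizes are illustrative assumptions and may differ from the exact values that give the parameter count mentioned next:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

# Illustrative hyperparameters; the repo's actual sizes may differ
VOCAB_SIZE = 3000   # unique words found by the tokenizer
SEQ_LEN = 5         # input context length
EMBEDDING_DIM = 100
LSTM_UNITS = 256

model = Sequential([
    Input(shape=(SEQ_LEN,)),
    Embedding(VOCAB_SIZE, EMBEDDING_DIM),     # input embedding layer
    LSTM(LSTM_UNITS, return_sequences=True),  # first LSTM layer
    LSTM(LSTM_UNITS),                         # second LSTM layer
    Dense(LSTM_UNITS, activation="relu"),     # fully connected layer
    Dense(VOCAB_SIZE, activation="softmax"),  # output layer: one probability per word
])

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```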
The model has approximately 15.7M trainable parameters, and you can download it here
For making predictions with the model, I created a notebook with the following features (a sketch follows the list):
- Loading the saved model and tokenizer files
- Using the tokenizer to encode the input sentence on which the prediction is made
- Predicting the next word of the input sentence with the saved model
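A sketch of that prediction flow, assuming the model was saved in HDF5 format and the tokenizer was pickled; the file names and `SEQ_LEN` are assumptions for illustration:

```python
import pickle
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

SEQ_LEN = 5  # must match the sequence length used during training

# Load the saved model and tokenizer (file names are illustrative)
model = load_model("nextword_model.h5")
with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

def predict_next_word(sentence: str) -> str:
    """Tokenize the input sentence and return the most likely next word."""
    seq = tokenizer.texts_to_sequences([sentence])[0]
    seq = pad_sequences([seq], maxlen=SEQ_LEN)  # keep only the last SEQ_LEN tokens
    probs = model.predict(seq, verbose=0)[0]    # probability over the vocabulary
    next_index = int(np.argmax(probs))
    return tokenizer.index_word.get(next_index, "<unk>")

print(predict_next_word("One morning when Gregor Samsa woke"))
```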
Made by @yepedraza