Chatbot using PyTorch with NLTK (Natural Language Toolkit)
Reads training data from a JSON file and compares each user input against the trained questions in each category to determine which category it most closely matches. It then replies with a random answer chosen from that category's preset responses.
Currently set up as a chatbot for an online shop that sells coffee and tea, trained on a minimal dataset.
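The reply step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `INTENTS` structure, its keys (`tag`, `responses`), the sample responses, and the `pick_response` helper are all hypothetical, and the real JSON file may be organised differently.

```python
import random

# Hypothetical data; the real JSON file's keys and contents may differ.
INTENTS = {
    "intents": [
        {"tag": "greeting", "responses": ["Hello!", "Hi there, how can I help?"]},
        {"tag": "items", "responses": ["We sell coffee and tea.", "We have coffee and tea."]},
    ]
}


def pick_response(predicted_tag, intents=INTENTS):
    """Return a random preset response for the predicted category, or None."""
    for intent in intents["intents"]:
        if intent["tag"] == predicted_tag:
            return random.choice(intent["responses"])
    return None  # no category with that tag


print(pick_response("greeting"))
```

In the full chatbot the tag would come from the trained model's prediction rather than being passed in by hand.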
python chat.py
Type quit to exit
Create a new environment in Anaconda
conda create --name pytorch python=3.8
In the Anaconda console
conda activate pytorch
pip install nltk
Download the NLTK tokenizer package: uncomment the comment in nltk_utils.py, then run
python nltk_utils.py
python train.py
The trained data is saved in data.pth
Our NLP Preprocessing Pipeline
"Is anyone there?"
-> tokenize
["Is","anyone","there","?"]
-> lower + stem
["is","anyon","there","?"]
-> exclude punctuation characters
["is","anyon","there"]
-> bag of words
[0,0,0,1,0,1,0,1]
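The final step of the pipeline above can be sketched in plain Python. The `bag_of_words` function and the eight-word `vocabulary` here are assumptions made for illustration; the vocabulary was picked so that the result matches the vector shown above, and the project's real vocabulary is built from its training data.

```python
def bag_of_words(stemmed_tokens, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in the sentence."""
    present = set(stemmed_tokens)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical vocabulary of stemmed words (sorted alphabetically).
vocabulary = ["a", "and", "anybodi", "anyon", "hello", "is", "thank", "there"]
tokens = ["is", "anyon", "there"]  # after tokenizing, stemming, dropping "?"

print(bag_of_words(tokens, vocabulary))  # [0, 0, 0, 1, 0, 1, 0, 1]
```

The vector always has the same length as the vocabulary, which is what lets it serve as a fixed-size input to the network.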
- Theory + NLP concepts (stemming, tokenization, bag of words)
- Create training data
- PyTorch model and training
- Save/load model and implement the chat
Feed Forward Neural Network
- Two layers
- Takes the bag-of-words vector as input and passes it through a layer whose input size is the number of patterns, then a hidden layer, then an output layer whose size must equal the number of classes. Softmax is then applied to give a probability for each class
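A network of this shape might look like the sketch below. It reads "two layers" as two hidden layers; the class name `NeuralNet`, the hidden size of 8, the three classes, and the ReLU activations are assumptions for illustration, not details confirmed by this README.

```python
import torch
import torch.nn as nn


class NeuralNet(nn.Module):
    """Feed-forward classifier: bag-of-words vector in, one score per class out."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.l1(x))
        out = self.relu(self.l2(out))
        return self.l3(out)  # raw scores (logits); softmax is applied by the caller


# Assumed sizes: 8-word vocabulary, 8 hidden units, 3 categories.
model = NeuralNet(input_size=8, hidden_size=8, num_classes=3)
bag = torch.tensor([[0, 0, 0, 1, 0, 1, 0, 1]], dtype=torch.float32)

probs = torch.softmax(model(bag), dim=1)  # one probability per class
print(probs.shape)
```

Softmax is kept outside the model here because `nn.CrossEntropyLoss`, a common choice for training this kind of classifier, expects raw scores and applies the softmax step itself.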
Bag of words
Tokenization: Splitting a string into meaningful units (e.g. words, punctuation characters, numbers)
"what would you do with $1,000,000?"
-> ["what","would","you","do","with","$","1,000,000","?"]
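To make the idea concrete, here is a small standard-library sketch of tokenization. It is a regex stand-in rather than the NLTK tokenizer the project uses (`nltk.word_tokenize`, which needs the tokenizer package downloaded first); the `TOKEN_PATTERN` expression and the `tokenize` helper are illustrative assumptions and handle far fewer cases than NLTK does.

```python
import re

# Numbers with thousands separators first, then words, then single punctuation marks.
TOKEN_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?|\w+|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("what would you do with $1,000,000?"))
# ['what', 'would', 'you', 'do', 'with', '$', '1,000,000', '?']
```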
Stemming: Generating the root form of words. A crude heuristic that chops off the ends of words
"organize","organizes","organizing"
-> ["organ","organ","organ"]
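The stemming example can be reproduced with NLTK's Porter stemmer, which needs no extra data download. This sketch assumes the Porter stemmer is the one in use; the README does not say which stemmer the project relies on, and the `stem` wrapper is a hypothetical helper.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def stem(word):
    """Lowercase a word and reduce it to its stem."""
    return stemmer.stem(word.lower())


print([stem(w) for w in ["organize", "organizes", "organizing"]])
# ['organ', 'organ', 'organ']
```

Because all three forms collapse to the same stem, they end up in the same slot of the bag-of-words vector.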