Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
Updated Aug 13, 2023 - Python
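A minimal sketch of what "subword" input for a word-level tagger can look like, assuming character n-grams with boundary markers. This is one common choice and is not confirmed by the repository description. The function names `char_ngrams`, `build_subword_vocab`, and `encode_word`, the n-gram range, and the sample words are all hypothetical. The BiLSTM itself is left out; the resulting id lists are the kind of input an embedding layer in front of such a model might consume.

```python
# Hypothetical helpers: character n-gram subwords for word-level tagging.
# The n-gram range (2 to 3) and the "<", ">" boundary markers are assumptions.

def char_ngrams(word, n_min=2, n_max=3):
    """Return the character n-grams of `word`, padded with boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def build_subword_vocab(words, n_min=2, n_max=3):
    """Map every n-gram seen in `words` to an integer id (0 = pad, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for word in words:
        for gram in char_ngrams(word, n_min, n_max):
            vocab.setdefault(gram, len(vocab))
    return vocab


def encode_word(word, vocab, n_min=2, n_max=3):
    """Turn a word into a list of subword ids, using <unk> for unseen n-grams."""
    unk = vocab["<unk>"]
    return [vocab.get(g, unk) for g in char_ngrams(word, n_min, n_max)]


# Illustrative romanized Bangla and English tokens (made up for this sketch).
train_words = ["ami", "good", "bhalo", "movie"]
vocab = build_subword_vocab(train_words)
ids = encode_word("ami", vocab)
```

Because the ids come from character n-grams rather than whole words, a misspelled or unseen token still shares some subwords with known tokens, which is the usual motivation for subword features on noisy social media text.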
In this project, I worked with a small corpus of simple sentences. I split the text into n-grams with the NLTK library and applied word-level and character-level one-hot encoding. I also tokenized the sentences with the Keras Tokenizer and built word embeddings with the Embedding layer. For sentiment analysis
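The n-gram and one-hot steps described above can be sketched in plain Python. This is a dependency-free illustration, not the project's code: the project uses NLTK and Keras, while the helpers `word_ngrams`, `build_index`, and `one_hot` and the sample sentence here are hypothetical stand-ins.

```python
# Hypothetical, dependency-free sketch of n-grams and one-hot encoding.

def word_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(items):
    """Assign each distinct item an integer id in order of first appearance."""
    index = {}
    for item in items:
        index.setdefault(item, len(index))
    return index


def one_hot(item, index):
    """Return a vector of zeros with a single 1 at the item's id."""
    vec = [0] * len(index)
    vec[index[item]] = 1
    return vec


sentence = "the cat sat on the mat"  # made-up example sentence
tokens = sentence.split()

bigrams = word_ngrams(tokens, 2)

# Word-level encoding: one vector per token, sized by the word vocabulary.
word_index = build_index(tokens)
word_vectors = [one_hot(t, word_index) for t in tokens]

# Character-level encoding: one vector per character, sized by the alphabet.
char_index = build_index(sentence.replace(" ", ""))
char_vectors = [one_hot(c, char_index) for c in "cat"]
```

One-hot vectors grow with the vocabulary size and carry no notion of similarity between words, which is the usual reason to move on to a learned embedding layer.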