Commit 4545fbf: improve text similarity and readme

huseinzol05 committed Jun 22, 2019 (1 parent: e964b2f)

Showing 12 changed files with 4,694 additions and 1,979 deletions.

48 changes: 43 additions & 5 deletions README.md
I will attach GitHub repositories for models that I did not implement from scratch.

### [Text classification](text-classification)

Trained on [English sentiment dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/text-classification/data)

1. Basic cell RNN
2. Bidirectional RNN
3. LSTM cell RNN

### [Chatbot](chatbot)

Trained on the [Cornell Movie-Dialogs Corpus](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/chatbot/dataset.tar.gz)

1. Seq2Seq-manual
2. Seq2Seq-API Greedy
3. Bidirectional Seq2Seq-manual
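
The "Greedy" decoders above share one idea: at each step, feed the decoder its own most likely token until it emits end-of-sentence. A minimal sketch, where the toy `step` lookup table is a hypothetical stand-in for a trained RNN decoder scoring the whole vocabulary:

```python
# Greedy seq2seq decoding sketch. `step` maps the previous token to the
# argmax next token; real models compute this from decoder logits.
SOS, EOS = "<s>", "</s>"

def greedy_decode(step, max_len=10):
    """Repeatedly feed the decoder its own most likely token."""
    token, output = SOS, []
    for _ in range(max_len):
        token = step(token)          # pick most likely token given previous
        if token == EOS:
            break
        output.append(token)
    return output

# Toy "model": a deterministic next-token table (illustrative only).
TABLE = {"<s>": "hello", "hello": "there", "there": "</s>"}
print(greedy_decode(TABLE.get))  # ['hello', 'there']
```

Beam-search variants keep several candidate prefixes per step instead of only the single argmax.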

</details>

### [Neural Machine Translation](neural-machine-translation)

Trained on [500 English-Vietnamese sentence pairs](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/neural-machine-translation/vietnam-train)

1. Seq2Seq-manual
2. Seq2Seq-API Greedy

### [Embedded](embedded)

Trained on [English sentiment dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/text-classification/data)

1. Word Vector using CBOW sample softmax
2. Word Vector using CBOW noise contrastive estimation
3. Word Vector using skipgram sample softmax
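
Both CBOW and skip-gram start from the same preprocessing step: sliding a context window over the corpus to produce (center, context) training pairs. A small sketch of skip-gram pair generation; the window size and toy corpus are illustrative assumptions, not values from the notebooks:

```python
# Skip-gram training-pair generation: each word is paired with every
# neighbour inside a fixed-size context window.
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

corpus = "the quick brown fox".split()
print(skipgram_pairs(corpus, window=1))
```

Sampled softmax and noise contrastive estimation then train the embedding on these pairs without normalising over the full vocabulary.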

### [POS-Tagging](pos-tagging)

Trained on [CoNLL POS](https://cogcomp.org/page/resource_view/81)

1. Bidirectional RNN + CRF, test accuracy 92%
2. Bidirectional RNN + Luong Attention + CRF, test accuracy 91%
3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 91%

### [Entity-Tagging](entity-tagging)

Trained on [CoNLL NER](https://cogcomp.org/page/resource_view/81)

1. Bidirectional RNN + CRF, test accuracy 96%
2. Bidirectional RNN + Luong Attention + CRF, test accuracy 93%
3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 95%

### [Dependency-Parser](dependency-parser)

Trained on [CoNLL English Dependency](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/dependency-parser/dev.conll.txt)

1. Bidirectional RNN + Bahdanau Attention + CRF
2. Bidirectional RNN + Luong Attention + CRF
3. Residual Network + Bahdanau Attention + CRF

### [Question-Answers](question-answer)

Trained on [bAbI Dataset](https://research.fb.com/downloads/babi/)

1. End-to-End Memory Network + Basic cell
2. End-to-End Memory Network + GRU cell
3. End-to-End Memory Network + LSTM cell
4. Dynamic Memory
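
The core of an End-to-End Memory Network is an attention "hop": the question embedding attends over sentence memories with a softmax, and the retrieved weighted sum is added back to the query. A numpy sketch with toy 3-d vectors, not embeddings from the notebooks:

```python
# One memory-network hop: match query against memories, softmax the match
# scores, and add the attention-weighted memory back into the query.
import numpy as np

def hop(query, memories):
    scores = memories @ query                  # match score per memory
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                         # softmax over memories
    return query + attn @ memories             # query enriched with evidence

memories = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.0, 0.0])
out = hop(query, memories)
print(out.round(3))
```

Stacking several such hops lets the model chain evidence across sentences before predicting the answer.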

### [Stemming](stemming)

Trained on [English Lemmatization](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/stemming/lemmatization-en.txt)

1. LSTM + Seq2Seq + Beam
2. GRU + Seq2Seq + Beam
3. LSTM + BiRNN + Seq2Seq + Beam
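
Every decoder above uses beam search: keep the `beam_width` highest log-probability prefixes at each step rather than committing to one. A sketch with a hypothetical toy scorer; the real models score characters with an LSTM/GRU decoder:

```python
# Beam search over a next-character scorer. `score_fn(prefix)` returns a
# dict of candidate characters -> probabilities (a stand-in for softmax output).
import math

def beam_search(score_fn, beam_width=2, max_len=3):
    """Expand every beam, keep the `beam_width` best prefixes by log-prob."""
    beams = [("", 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            for ch, p in score_fn(prefix).items():
                candidates.append((prefix + ch, logp + math.log(p)))
        beams = sorted(candidates, key=lambda b: -b[1])[:beam_width]
    return beams[0][0]

# Toy scorer that always prefers the next character of "run".
def toy_scores(prefix):
    best = "run"[len(prefix)] if len(prefix) < 3 else "n"
    return {best: 0.9, "x": 0.1}

print(beam_search(toy_scores))  # run
```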

### [Abstractive Summarization](abstractive-summarization)

Trained on [India news](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/abstractive-summarization/dataset)

1. LSTM Seq2Seq using topic modelling
2. LSTM Seq2Seq + Luong Attention using topic modelling
3. LSTM Seq2Seq + Beam Decoder using topic modelling

### [Extractive Summarization](extractive-summarization)

Trained on [random books](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/extractive-summarization/books)

1. Skip-thought Vector
2. Residual Network using Atrous CNN
3. Residual Network using Atrous CNN + Bahdanau Attention
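
Whatever encoder produces the sentence vectors (e.g. the skip-thought model above), extractive summarisation ultimately ranks sentences and keeps the top ones. A minimal numpy sketch of one common ranking, scoring each sentence by cosine similarity to the document centroid; the vectors are toy values:

```python
# Rank sentence vectors by cosine similarity to their centroid and return
# the indices of the k most "central" sentences.
import numpy as np

def pick_central(vectors, k=1):
    centroid = vectors.mean(axis=0)
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid)
    scores = vectors @ centroid / norms        # cosine similarity per sentence
    return np.argsort(-scores)[:k]             # indices of top-k sentences

vecs = np.array([[1.0, 0.0], [0.9, 0.4], [0.0, 1.0]])  # toy sentence vectors
print(pick_central(vecs))
```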

### [Speech to Text](speech-to-text)

Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487)

1. Tacotron, https://github.com/Kyubyong/tacotron_asr
2. Bidirectional RNN + Greedy CTC
3. Bidirectional RNN + Beam CTC
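
Greedy CTC decoding, as in the "Greedy CTC" models above, takes the argmax label per audio frame, collapses consecutive repeats, then drops blanks. A pure-Python sketch; the frame labels below are made up:

```python
# CTC greedy collapse: keep the first label of each run, skip the blank.
BLANK = 0

def ctc_collapse(frame_labels):
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:   # new non-blank run starts
            out.append(lab)
        prev = lab
    return out

# Frames spelling "hello" (1=h, 2=e, 3=l, 4=o, 0=blank); the blank between
# the two l-runs is what lets CTC emit a doubled letter.
print(ctc_collapse([1, 1, 0, 2, 0, 3, 3, 0, 3, 4]))  # [1, 2, 3, 3, 4]
```

Beam CTC keeps multiple label hypotheses per frame instead of the single argmax path.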

### [Text to Speech](text-to-speech)

Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487)

1. Tacotron, https://github.com/Kyubyong/tacotron
2. Fairseq + Dilated CNN vocoder
3. Seq2Seq + Bahdanau Attention

### [Old-to-Young Vocoder](vocoder)

Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487)

1. Dilated CNN

### [Generator](generator)

Trained on [Shakespeare dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/generator/shakespeare.txt)

1. Character-wise RNN + LSTM
2. Character-wise RNN + Beam search
3. Character-wise RNN + LSTM + Embedding
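
Character-wise generation samples each next character from the model's softmax, optionally sharpened by a temperature. A numpy sketch with made-up logits; the real models produce logits from an LSTM:

```python
# Temperature sampling: divide logits by the temperature before softmax.
# Low temperature concentrates probability mass on the argmax character.
import numpy as np

def sample_char(logits, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    probs = np.exp(logits / temperature)
    probs /= probs.sum()                       # softmax with temperature
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 0.5, 0.1])            # toy vocabulary of 3 characters
print(sample_char(logits, temperature=0.1))   # almost always the argmax, index 0
```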

### [Topic Generator](topic-generator)

Trained on [Malaysia news](https://github.com/huseinzol05/Malaya-Dataset/raw/master/news/news.zip)

1. TAT-LSTM
2. TAV-LSTM
3. MTA-LSTM

### [Language-detection](language-detection)

Trained on [Tatoeba dataset](http://downloads.tatoeba.org/exports/sentences.tar.bz2)

1. fastText-style char n-grams
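
Char n-gram features are what make this kind of detector robust to unseen words. A toy sketch of the representation, hashing each padded trigram into a fixed-size count vector; the bucket count is an arbitrary choice here, not a value from the notebook:

```python
# Character n-gram featurisation with the hashing trick.
def char_ngrams(text, n=3):
    """All character n-grams of `text`, padded with boundary markers."""
    padded = f"<{text}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def featurize(text, buckets=64, n=3):
    vec = [0] * buckets
    for gram in char_ngrams(text, n):
        vec[hash(gram) % buckets] += 1   # hashing keeps the vector fixed-size
    return vec

print(char_ngrams("niao"))  # ['<ni', 'nia', 'iao', 'ao>']
```

A linear classifier over such vectors is enough to separate most languages.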

### [Text Similarity](text-similarity)

Trained on [First Quora Dataset Release: Question Pairs](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs)

1. BiRNN + Contrastive loss, test accuracy 76.50%
2. Dilated CNN + Contrastive loss, test accuracy 72.98%
3. Transformer + Contrastive loss, test accuracy 73.48%
4. Dilated CNN + Cross entropy, test accuracy 72.27%
5. Transformer + Cross entropy, test accuracy 71.1%
6. Transfer learning BERT base + Cross entropy, test accuracy 90%
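
The contrastive loss used by models 1-3 above pulls similar pairs together and pushes dissimilar pairs at least a margin apart. A numpy sketch with toy vectors, not values from the notebooks:

```python
# Contrastive loss on a pair of encoded sentences.
import numpy as np

def contrastive_loss(x1, x2, similar, margin=1.0):
    d = np.linalg.norm(x1 - x2)
    if similar:
        return 0.5 * d ** 2                    # pull similar pairs together
    return 0.5 * max(0.0, margin - d) ** 2     # push dissimilar pairs apart

a, b = np.array([0.0, 0.0]), np.array([0.6, 0.8])   # distance 1.0
print(round(contrastive_loss(a, b, similar=True), 3))   # penalised: too far apart
print(round(contrastive_loss(a, b, similar=False), 3))  # no penalty: already at margin
```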

### [Text Augmentation](text-augmentation)
