TensorFlow implementation of the ESIM model presented in the paper "Enhanced LSTM for Natural Language Inference".
mask: The attention-weight computation and the pooling layer should both mask out padding tokens, so that padded positions receive no attention weight and do not affect the pooled vectors.
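To illustrate the masking idea, here is a minimal NumPy sketch (not this repository's TensorFlow code). The function names `masked_softmax` and `masked_mean_max_pool`, the tensor shapes, and the convention that the mask is 1 for real tokens and 0 for padding are all assumptions made for the example.

```python
import numpy as np

def masked_softmax(scores, mask, axis=-1):
    """Softmax over `axis` that assigns zero weight to padded positions.

    scores: float array, e.g. (batch, len_a, len_b) attention scores.
    mask:   array broadcastable to `scores`; 1 = real token, 0 = padding (assumed convention).
    """
    scores = np.asarray(scores, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    # Replace padded scores with a large negative value so exp() is ~0 there.
    masked = np.where(mask > 0, scores, -1e9)
    # Subtract the max for numerical stability.
    masked = masked - masked.max(axis=axis, keepdims=True)
    # Multiply by the mask so padded positions are exactly 0.
    weights = np.exp(masked) * mask
    # Guard against division by zero when a row is entirely padding.
    return weights / np.maximum(weights.sum(axis=axis, keepdims=True), 1e-13)

def masked_mean_max_pool(x, mask):
    """Mean and max pooling over the time axis, ignoring padded steps.

    x:    (batch, length, dim) sequence representations.
    mask: (batch, length); 1 = real token, 0 = padding (assumed convention).
    Assumes every sequence has at least one real token.
    """
    x = np.asarray(x, dtype=np.float64)
    m = np.asarray(mask, dtype=np.float64)[..., None]  # (batch, length, 1)
    # Mean: sum only real tokens, divide by the number of real tokens.
    mean = (x * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1.0)
    # Max: padded steps are set to -inf so they can never win.
    mx = np.where(m > 0, x, -np.inf).max(axis=1)
    return mean, mx
```

For attention scores of shape (batch, len_a, len_b), the weights over the second sentence would use its mask reshaped to (batch, 1, len_b), for example `masked_softmax(scores, mask_b[:, None, :])`, where `mask_b` is a hypothetical (batch, len_b) padding mask.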
num_classes: For simplicity, the number of label classes is set to 2. Of course, you can change it to fit your own data.
Please let me know if you encounter any problems.