Translate between languages such as Japanese, Korean, Chinese, English, French, German, and Vietnamese, taking input from plain text and websites, using Python and C.
Load dataset:
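def read_words(inputfile):
    """Yield whitespace-separated words from a large file, one chunk at a time."""
    # encoding='utf-8' is an assumption; change it if the dataset uses another encoding
    with open(inputfile, 'r', encoding='utf-8') as f:
        while True:
            buf = f.read(10240)
            if not buf:
                break
            # Extend the chunk until it ends on whitespace, so no word is cut in half
            while not buf[-1].isspace():
                ch = f.read(1)
                if not ch:
                    break
                buf += ch
            # split() with no arguments splits on any run of whitespace
            for word in buf.split():
                yield word
    yield ''  # always emitted last, so even an empty file yields one item


def process(word):
    # Placeholder, not part of the original: replace with the real per-word handling
    print(word)


if __name__ == "__main__":
    for word in read_words('./very_large_file.txt'):
        process(word)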
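The word stream produced by a reader like the one above can feed a frequency count, which is a common first step before training a translation model. The following is only a minimal sketch: `build_vocab` and `max_size` are hypothetical names introduced here, not part of this repository. It also assumes the words are already separated by whitespace, which does not hold for unsegmented Japanese or Chinese text.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_vocab(words: Iterable[str], max_size: int = 50000) -> List[Tuple[str, int]]:
    """Count word frequencies from a stream and keep the most frequent entries.

    Hypothetical helper for illustration; max_size is an assumed cutoff.
    """
    counts = Counter()
    for word in words:
        if word:  # skip the empty string a word generator may emit at the end
            counts[word] += 1
    return counts.most_common(max_size)


if __name__ == "__main__":
    # Any iterable of words works here, including a generator over a large file
    sample = "the cat sat on the mat".split() + [""]
    for word, freq in build_vocab(sample, max_size=3):
        print(word, freq)
```

Because the function accepts any iterable, it can consume a generator lazily and never needs the whole file in memory; only the counts are kept.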
Flightstar's dataset: https://www.kaggle.com/flightstar/datasets
Japanese to English Machine Translation using Preordering and Compositional Distributed Semantics, http://www.aclweb.org/anthology/W14-7008
Japanese-to-English Machine Translation Using Recurrent Neural Networks, https://cs224d.stanford.edu/reports/GreensteinEric.pdf
Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy, https://arxiv.org/pdf/1704.04520.pdf
Neural Machine Translation (seq2seq), https://github.com/tensorflow/nmt
Effective Domain Mixing for Neural Machine Translation, https://www-cs.stanford.edu/~rpryzant/data/papers/translation_2017.pdf
Translating Phrases in Neural Machine Translation, http://aclweb.org/anthology/D17-1149
Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation, http://www.aclweb.org/anthology/W16-4605
Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary, http://xingshi.me/data/pdf/ACL2017short.pdf
Effective Approaches to Attention-based Neural Machine Translation, https://arxiv.org/pdf/1508.04025.pdf