This is a reading list for Bilingual Lexicon Induction (BLI), also known as Word Translation, Bilingual Lexicon Extraction, or Bilingual Dictionary Induction; the task is closely related to Cross-Lingual Word Embeddings (CLWEs). The list mainly covers publications from 2018 to 2023 and is updated frequently. Pull requests and discussions are welcome!
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation (NAACL 2015)
Chao Xing, Dong Wang, Chao Liu, Yiye Lin
[Paper]
Comments: Beginners may also refer to the Procrustes-analysis article on Wikipedia and our sample code ./SampleCode.py.
Word Translation Without Parallel Data (ICLR 2018)
Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
[Paper]
[Code]
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion (EMNLP 2018)
Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, Edouard Grave
[Paper]
[Code]
A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings (ACL 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper]
[Code]
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations (AAAI 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper]
[Code]
Comments: VecMap supports the unsupervised (ACL 2018 paper) as well as the semi-supervised and supervised (AAAI 2018 paper) BLI settings.
Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022)
Yaoyiran Li, Fangyu Liu, Nigel Collier, Anna Korhonen, Ivan Vulić
[Paper]
[Code]
Comments: A new (2022) state-of-the-art method for semi-supervised and supervised BLI!
On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023)
Yaoyiran Li, Anna Korhonen, Ivan Vulić
[Paper]
[Code]
Comments: Prompts multilingual LLMs for BLI and achieves new (2023) state-of-the-art performance on many language pairs!
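The idea of prompting an LLM for BLI can be illustrated with a few-shot template: seed-dictionary pairs become in-context examples, and the model completes the translation of a new source word. A minimal sketch of prompt construction only (the template, language names, and function are hypothetical, not the paper's exact format):

```python
def build_bli_prompt(seed_pairs, query_word, src="German", tgt="English"):
    """Format seed translation pairs as in-context examples for an LLM."""
    lines = [f"Translate the following {src} words into {tgt}."]
    for s, t in seed_pairs:
        lines.append(f"{src}: {s} -> {tgt}: {t}")
    # The model is expected to complete the final line with the translation.
    lines.append(f"{src}: {query_word} -> {tgt}:")
    return "\n".join(lines)

prompt = build_bli_prompt([("Hund", "dog"), ("Katze", "cat")], "Haus")
print(prompt)
```

The resulting string would then be sent to a multilingual LLM, whose completion is taken as the predicted translation.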
Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces (ACL 2019)
Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig
[Paper]
[Code]
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach (TACL 2019)
Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra
[Paper]
[Code]
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space (EMNLP 2020)
Tasnim Mohiuddin, M Saiful Bari, Shafiq Joty
[Paper]
[Code]
Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces (ACL 2020)
Goran Glavaš, Ivan Vulić
[Paper]
[Code]
Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction (Findings of ACL 2021)
Jinpeng Zhang, Baijun Ji, Nini Xiao, Xiangyu Duan, Min Zhang, Yangbin Shi, Weihua Luo
[Paper]
[Code]
It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT (BlackboxNLP Workshop 2020)
Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
[Paper]
[Code]
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization (ACL 2019)
Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
[Paper]
[Code]
Normalization of Language Embeddings for Cross-Lingual Alignment (ICLR 2022)
Prince Osei Aboagye, Yan Zheng, Junpeng Wang, Michael Yeh, Wei Zhang, Liang Wang, Hao Yang, Jeff M. Phillips
[Paper]
[Code]
Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework (ICLR 2020)
Zirui Wang+, Jiateng Xie+, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime Carbonell (+: equal contribution)
[Paper]
[Code]
Filtered Inner Product Projection for Crosslingual Embedding Alignment (ICLR 2021)
Vin Sachidananda, Ziyi Yang, Chenguang Zhu
[Paper]
[Code]
Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction (ACL 2020)
Mladen Karan, Ivan Vulić, Anna Korhonen, Goran Glavaš
[Paper]
[Code]
Visual Grounding in Video for Unsupervised Word Translation (CVPR 2020)
Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, Joao Carreira, Phil Blunsom, Andrew Zisserman
[Paper]
[Code]
A Relaxed Matching Procedure for Unsupervised BLI (ACL 2020)
Xu Zhao, Zihao Wang, Yong Zhang, Hao Wu
[Paper]
[Code]
A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction (ACL 2020)
Shuo Ren, Shujie Liu, Ming Zhou, Shuai Ma
[Paper]
Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing (NAACL 2019)
Tal Schuster, Ori Ram, Regina Barzilay, Amir Globerson
[Paper]
[Code]
Multilingual Alignment of Contextual Word Representations (ICLR 2020)
Steven Cao, Nikita Kitaev, Dan Klein
[Paper]
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment (ACL 2021)
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang
[Paper]
[Code]
Bilingual Lexicon Induction through Unsupervised Machine Translation (ACL 2019)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper]
[Code]
Unsupervised Alignment of Embeddings with Wasserstein Procrustes (AISTATS 2019)
Edouard Grave, Armand Joulin, Quentin Berthet
[Paper]
[Code]
Gromov-Wasserstein Alignment of Word Embedding Spaces (EMNLP 2018)
David Alvarez-Melis, Tommi Jaakkola
[Paper]
[Code]
Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation (NAACL 2021)
Xutan Peng, Chenghua Lin, Mark Stevenson
[Paper]
[Code]
A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction (COLING 2020)
Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Shujian Huang, Tong Xiao, Jingbo Zhu
[Paper]
Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions (Preprint 2021)
Haoran Xu, Philipp Koehn
[Paper]
[Code]
Learning a Reversible Embedding Mapping using Bi-Directional Manifold Alignment (Findings of ACL 2021)
Ashwinkumar Ganesan, Francis Ferraro, Tim Oates
[Paper]
[Code]
Interactive Refinement of Cross-Lingual Word Embeddings (EMNLP 2020)
Michelle Yuan+, Mozhi Zhang+, Benjamin Van Durme, Leah Findlater, Jordan Boyd-Graber (+: equal contribution)
[Paper]
[Code]
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022)
Yaoyiran Li, Fangyu Liu, Ivan Vulić+, Anna Korhonen+ (+: equal contribution)
[Paper]
[Code]
IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces (EMNLP 2022)
Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn
[Paper]
[Code]
Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction (TASLP 2023)
Hailong Cao, Liguo Li, Conghui Zhu, Muyun Yang, Tiejun Zhao
[Paper]
[Code]
CD-BLI: Confidence-Based Dual Refinement for Unsupervised Bilingual Lexicon Induction (NLPCC 2023)
Shenglong Yu, Wenya Guo, Ying Zhang, Xiaojie Yuan
[Paper]
RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction (EMNLP 2022)
Zhoujin Tian, Chaozhuo Li, Shuo Ren, Zhiqiang Zuo, Zengxuan Wen, Xinyue Hu, Xiao Han, Haizhen Huang, Denvy Deng, Qi Zhang, Xing Xie
[Paper]
[Code]
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting (IJCNLP-AACL 2023)
Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi
[Paper]
[Code]
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions (ACL 2019)
Goran Glavaš, Robert Litschko, Sebastian Ruder, Ivan Vulić
[Paper]
[Code]
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings? (EMNLP 2019)
Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen
[Paper]
[Code]
On the Limitations of Unsupervised Bilingual Dictionary Induction (ACL 2018)
Anders Søgaard, Sebastian Ruder, Ivan Vulić
[Paper]
Are All Good Word Vector Spaces Isomorphic? (EMNLP 2020)
Ivan Vulić, Sebastian Ruder, Anders Søgaard
[Paper]
[Code]
A Survey of Cross-Lingual Word Embedding Models (JAIR 2019)
Sebastian Ruder, Ivan Vulić, Anders Søgaard
[Paper]
Should All Cross-Lingual Embeddings Speak English? (ACL 2020)
Antonios Anastasopoulos, Graham Neubig
[Paper]
[Code]
Understanding Linearity of Cross-Lingual Word Embedding Mappings (TMLR 2022)
Xutan Peng, Mark Stevenson, Chenghua Lin, Chen Li
[Paper]
[Code]