Bilingual Lexicon Induction Reading List

This is a reading list for Bilingual Lexicon Induction (BLI), also known as Word Translation, Bilingual Lexicon Extraction, and Bilingual Dictionary Induction, among other names. BLI is closely related to the topic of Cross-Lingual Word Embeddings (CLWEs). The list mainly covers publications from 2018 to 2023 and is updated frequently. Pull requests and discussions are welcome!

Classical Methods (Frequently Used As Baselines)

1. Procrustes

Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation (NAACL 2015)
Chao Xing, Dong Wang, Chao Liu, Yiye Lin
[Paper]

Comments: Beginners may also refer to the Procrustes article on Wikipedia and to our sample code in ./SampleCode.py.
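For readers who want a quick feel for the idea, here is a minimal sketch of orthogonal Procrustes mapping followed by nearest-neighbour word retrieval. It is not the repository's sample code and not the exact procedure of the paper above. The function names (`procrustes`, `normalize_rows`, `translate`) and the synthetic toy data are assumptions made purely for illustration.

```python
import numpy as np


def procrustes(X, Y):
    """Return the orthogonal matrix W that minimizes ||X W - Y||_F.

    X and Y hold the embeddings of seed translation pairs, one pair per row.
    """
    # Closed-form solution: if X^T Y = U S V^T, then W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def normalize_rows(M):
    """Scale every row to unit L2 norm."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def translate(src_vecs, tgt_vecs, W):
    """Map source vectors with W and return the index of the most
    cosine-similar target vector for each source word."""
    sims = normalize_rows(src_vecs @ W) @ normalize_rows(tgt_vecs).T
    return sims.argmax(axis=1)


# Toy data (hypothetical): the "target language" is an exact rotation Q of
# the "source language", so a perfect orthogonal mapping exists.
rng = np.random.default_rng(0)
X = normalize_rows(rng.standard_normal((50, 8)))
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
Y = X @ Q

# Learn W from the first 20 word pairs (the seed dictionary),
# then retrieve translations for all 50 source words.
W = procrustes(X[:20], Y[:20])
predictions = translate(X, Y, W)
accuracy = float((predictions == np.arange(50)).mean())
print(f"P@1 on toy data: {accuracy:.2f}")
```

Real embedding spaces are only approximately isometric, so accuracy on actual language pairs is far from perfect. Plain nearest-neighbour retrieval is used here only for brevity.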

2. MUSE

Word Translation Without Parallel Data (ICLR 2018)
Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
[Paper] [Code]

3. RCSLS

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion (EMNLP 2018)
Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, Edouard Grave
[Paper] [Code]

4. VecMap

A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings (ACL 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper] [Code]

Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations (AAAI 2018)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper] [Code]

Comments: VecMap supports unsupervised (its ACL 2018 paper), semi-supervised and supervised (its AAAI 2018 paper) BLI settings.

Recent Progress in BLI: Methodologies

1. ContrastiveBLI

Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022)
Yaoyiran Li, Fangyu Liu, Nigel Collier, Anna Korhonen, Ivan Vulić
[Paper] [Code]

Comments: New (2022) state-of-the-art method for semi-supervised and supervised BLI!

2. Prompt4BLI

On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023)
Yaoyiran Li, Anna Korhonen, Ivan Vulić
[Paper] [Code]

Comments: Prompt multilingual LLMs for BLI. Achieves new (2023) state-of-the-art BLI performance on many language pairs!

3. BLISS

Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces (ACL 2019)
Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig
[Paper] [Code]

4. GeoMM

Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach (TACL 2019)
Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra
[Paper] [Code]

5. LNMap

LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space (EMNLP 2020)
Tasnim Mohiuddin, M Saiful Bari, Shafiq Joty
[Paper] [Code]

6. InstaMap

Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces (ACL 2020)
Goran Glavaš, Ivan Vulić
[Paper] [Code]

7. CSCBLI

Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction (Findings of ACL 2021)
Jinpeng Zhang, Baijun Ji, Nini Xiao, Xiangyu Duan, Min Zhang, Yangbin Shi, Weihua Luo
[Paper] [Code]

8. mBERT

It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT (BlackboxNLP Workshop 2020)
Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
[Paper] [Code]

9. IterNorm

Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization (ACL 2019)
Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
[Paper] [Code]

10. SpecNorm

Normalization of Language Embeddings for Cross-Lingual Alignment (ICLR 2022)
Prince Osei Aboagye, Yan Zheng, Junpeng Wang, Michael Yeh, Wei Zhang, Liang Wang, Hao Yang, Jeff M. Phillips
[Paper] [Code]

11. JointAlign

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework (ICLR 2020)
Zirui Wang⁺, Jiateng Xie⁺, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime Carbonell (+: equal contribution)
[Paper] [Code]

12. FIPP

Filtered Inner Product Projection for Crosslingual Embedding Alignment (ICLR 2021)
Vin Sachidananda, Ziyi Yang, Chenguang Zhu
[Paper] [Code]

13. ClassyMap

Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction (ACL 2020)
Mladen Karan, Ivan Vulić, Anna Korhonen, Goran Glavaš
[Paper] [Code]

14. MUVE

Visual Grounding in Video for Unsupervised Word Translation (CVPR 2020)
Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, Joao Carreira, Phil Blunsom, Andrew Zisserman
[Paper] [Code]

15. Bidirectional-RMP

A Relaxed Matching Procedure for Unsupervised BLI (ACL 2020)
Xu Zhao, Zihao Wang, Yong Zhang, Hao Wu
[Paper] [Code]

16. (Methodology)

A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction (ACL 2020)
Shuo Ren, Shujie Liu, Ming Zhou, Shuai Ma
[Paper]

17. CrossLingualELMo

Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing (NAACL 2019)
Tal Schuster, Ori Ram, Regina Barzilay, Amir Globerson
[Paper] [Code]

18. (Methodology)

Multilingual Alignment of Contextual Word Representations (ICLR 2020)
Steven Cao, Nikita Kitaev, Dan Klein
[Paper]

19. Bitext-Lexind

Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment (ACL 2021)
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang
[Paper] [Code]

20. Monoses

Bilingual Lexicon Induction through Unsupervised Machine Translation (ACL 2019)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
[Paper] [Code]

21. (Methodology)

Unsupervised Alignment of Embeddings with Wasserstein Procrustes (AISTATS 2019)
Edouard Grave, Armand Joulin, Quentin Berthet
[Paper] [Code]

22. OTAlign

Gromov-Wasserstein Alignment of Word Embedding Spaces (EMNLP 2018)
David Alvarez-Melis, Tommi Jaakkola
[Paper] [Code]

23. L1-Refinement

Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation (NAACL 2021)
Xutan Peng, Chenghua Lin, Mark Stevenson
[Paper] [Code]

24. (Methodology)

A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction (COLING 2020)
Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Shujian Huang, Tong Xiao, Jingbo Zhu
[Paper]

25. ContextualMapping

Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions (Preprint 2021)
Haoran Xu, Philipp Koehn
[Paper] [Code]

26. BDMA

Learning a Reversible Embedding Mapping using Bi-Directional Manifold Alignment (Findings of ACL 2021)
Ashwinkumar Ganesan, Francis Ferraro, Tim Oates
[Paper] [Code]

27. CLIME

Interactive Refinement of Cross-Lingual Word Embeddings (EMNLP 2020)
Michelle Yuan⁺, Mozhi Zhang⁺, Benjamin Van Durme, Leah Findlater, Jordan Boyd-Graber (+: equal contribution)
[Paper] [Code]

28. BLICEr

Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022)
Yaoyiran Li, Fangyu Liu, Ivan Vulić⁺, Anna Korhonen⁺ (+: equal contribution)
[Paper] [Code]

29. IsoVec

IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces (EMNLP 2022)
Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn
[Paper] [Code]

30. Dual-BLI

Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction (TASLP 2023)
Hailong Cao, Liguo Li, Conghui Zhu, Muyun Yang, Tiejun Zhao
[Paper] [Code]

31. CD-BLI

CD-BLI: Confidence-Based Dual Refinement for Unsupervised Bilingual Lexicon Induction (NLPCC 2023)
Shenglong Yu, Wenya Guo, Ying Zhang, Xiaojie Yuan
[Paper]

32. RAPO

RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction (EMNLP 2022)
Zhoujin Tian, Chaozhuo Li, Shuo Ren, Zhiqiang Zuo, Zengxuan Wen, Xinyue Hu, Xiao Han, Haizhen Huang, Denvy Deng, Qi Zhang, Xing Xie
[Paper] [Code]

33. ProMap

ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting (IJCNLP-AACL 2023)
Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi
[Paper] [Code]

Recent Progress in BLI: Datasets, Benchmarks, Analyses & Surveys

1. XLING (Dataset)

How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions (ACL 2019)
Goran Glavaš, Robert Litschko, Sebastian Ruder, Ivan Vulić
[Paper] [Code]

2. Panlex-BLI (Dataset)

Do We Really Need Fully Unsupervised Cross-Lingual Embeddings? (EMNLP 2019)
Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen
[Paper] [Code]

3. (Analysis)

On the Limitations of Unsupervised Bilingual Dictionary Induction (ACL 2018)
Anders Søgaard, Sebastian Ruder, Ivan Vulić
[Paper]

4. ISO-Study

Are All Good Word Vector Spaces Isomorphic? (EMNLP 2020)
Ivan Vulić, Sebastian Ruder, Anders Søgaard
[Paper] [Code]

5. (Survey)

A Survey of Cross-Lingual Word Embedding Models (JAIR 2019)
Sebastian Ruder, Ivan Vulić, Anders Søgaard
[Paper]

6. Embeddings (Dataset)

Should All Cross-Lingual Embeddings Speak English? (ACL 2020)
Antonios Anastasopoulos, Graham Neubig
[Paper] [Code]

7. xANLG (Analysis)

Understanding Linearity of Cross-Lingual Word Embedding Mappings (TMLR 2022)
Xutan Peng, Mark Stevenson, Chenghua Lin, Chen Li
[Paper] [Code]

Pull Requests Are Welcome:

No. (Method Name)

(Paper Title) (Venue Year)
(Authors)
[Paper] [Code]

Comments:
