Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only

This repository contains the code for our paper "Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only" (UnsupCLIR), accepted at SIGIR'18. We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) that requires no bilingual data at all. In experiments on the standard CLEF CLIR collections, the framework outperforms baselines that use cross-lingual embeddings relying on word- and document-level alignments.

Preprint: https://arxiv.org/abs/1805.00879

Getting Started

To get started, follow these steps:

Expected directory structure:

└── UnsupCLIR [HOME]
    ├── Data
    │   └── CLEF
    │       ├── DocumentData
    │       │   ├── dutch
    │       │   │   ├── algemeen_dagblad
    │       │   │   └── nrc_handelsblad
    │       │   ├── finnish
    │       │   │   └── aamu
    │       │   └── italian
    │       │       ├── la_stampa
    │       │       ├── sda_italian_94
    │       │       └── sda_italian_95
    │       ├── RelAssess
    │       │   ├── 2001
    │       │   ├── 2002
    │       │   └── 2003
    │       └── Topics
    │           ├── 2001
    │           ├── 2002
    │           └── 2003
    ├── Embeddings
    │   ├── Conneau
    │   │   ├── enfi
    │   │   ├── enit
    │   │   └── ennl
    │   ├── Smith
    │   │   ├── enfi
    │   │   ├── enit
    │   │   └── ennl
    │   └── Vulic
    │       ├── enfi
    │       ├── enit
    │       └── ennl
    └── Results
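
The layout above can be created or checked with a few lines of Python. The following is only a minimal sketch: the helper names `expected_dirs`, `create_layout`, and `missing_dirs` are hypothetical and not part of this repository, and the sketch assumes the project root is the `UnsupCLIR` folder shown at the top of the tree. It creates empty folders only; the CLEF data and the embeddings still have to be placed in them separately.

```python
from pathlib import Path

# Folder names copied from the directory tree above.
COLLECTIONS = {
    "dutch": ["algemeen_dagblad", "nrc_handelsblad"],
    "finnish": ["aamu"],
    "italian": ["la_stampa", "sda_italian_94", "sda_italian_95"],
}
YEARS = ["2001", "2002", "2003"]
EMBEDDING_SOURCES = ["Conneau", "Smith", "Vulic"]
LANGUAGE_PAIRS = ["enfi", "enit", "ennl"]


def expected_dirs():
    """Return all leaf directories of the tree, relative to the project root."""
    clef = Path("Data") / "CLEF"
    dirs = []
    for language, collections in COLLECTIONS.items():
        for collection in collections:
            dirs.append(clef / "DocumentData" / language / collection)
    for section in ("RelAssess", "Topics"):
        for year in YEARS:
            dirs.append(clef / section / year)
    for source in EMBEDDING_SOURCES:
        for pair in LANGUAGE_PAIRS:
            dirs.append(Path("Embeddings") / source / pair)
    dirs.append(Path("Results"))
    return dirs


def create_layout(home):
    """Create every expected directory under `home`; existing ones are kept."""
    home = Path(home)
    for rel in expected_dirs():
        (home / rel).mkdir(parents=True, exist_ok=True)
    return home


def missing_dirs(home):
    """List the expected directories that do not exist under `home`."""
    home = Path(home)
    return [rel for rel in expected_dirs() if not (home / rel).is_dir()]
```

Under these assumptions, calling `create_layout("UnsupCLIR")` would build the empty folder skeleton, and `missing_dirs("UnsupCLIR")` would report any folder that is still absent before an experiment is run.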

Reference

Reference to cite when you use UnsupCLIR in a research paper:

@inproceedings{LGPV18,
  title={Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only},
  author={Litschko, Robert and Glava\v{s}, Goran and Ponzetto, Simone Paolo and Vuli\'c, Ivan},
  booktitle={Proceedings of SIGIR},
  year={2018},
}

Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
