- University of Amsterdam
- Amsterdam
- https://davidstap.github.io
- @davidstap
Stars
Solve puzzles. Improve your PyTorch.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
MTEB: Massive Text Embedding Benchmark
Minimum Bayes Risk Decoding for Hugging Face Transformers
DSPy: The framework for programming—not prompting—language models
PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
Code for the paper "Nearest Neighbor Knowledge Distillation for Neural Machine Translation" by Zhixian Yang, Renliang Sun, and Xiaojun Wan. The paper was accepted at the NAACL 2022 Main Conference.
Gale-Church sentence aligner with options for variable parameters
A template repo for Python packages
Well-documented, unit-tested, type-checked, and formatted implementation of a vanilla transformer, for educational purposes.
A tool that locates, downloads, and extracts machine translation corpora
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
A Zotero plugin for syncing items and notes into Notion
Create a Notion collection, synced with Zotero.
A Python library for working with and comparing language codes.
Style guides for Google-originated open-source projects
Useful localization tools with Python API for building localization & translation systems
NLQuAD: A Non-Factoid Long Question Answering Data Set. To be published at EACL 2021.
Python port of Moses tokenizer, truecaser and normalizer
PRML algorithms implemented in Python
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch