Stars
A quick guide to trending instruction fine-tuning datasets
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
Experiments with generating open-source language model assistants
Aligning pretrained language models with instruction data generated by themselves.
COYO-700M: Large-scale Image-Text Pair Dataset
A framework for few-shot evaluation of language models.
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
Must-read papers on prompt-based tuning for pre-trained language models.
Easy Language Model Pretraining leveraging Hugging Face's Transformers and Datasets
Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.
An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
This repository is a sub-branch of AI Knowledge Tree, focusing mainly on Natural Language Processing.
A collection of papers on neural-symbolic AI (mainly focused on NLP applications)
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
Translating natural language questions to a structured query language
Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al.
Aspect-Based Sentiment Analysis, PyTorch implementations.
Implementation of additional Masked Language Model (MLM) pretraining for BERT
Unofficial implementation of ConveRT model from PolyAI with no pre-trained encoder
A TensorFlow implementation of a GAN (specifically InfoGAN) applied to one-dimensional (1D) time series data.