[Figure] Parameter counts of several recently released pretrained language models (source: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter")
Modulabs 풀잎스쿨 (study group), 11th cohort: deep learning sub-study "Let's learn it deeper, from the bottom up"
Weekly sessions (2020/05/28 – 2020/07/10)
- Understanding LSTM Networks (blog post overview)
- The Unreasonable Effectiveness of Recurrent Neural Networks (blog post overview)
Modulabs 풀잎스쿨 (study group), 11.5th cohort: beyondBERT
Weekly sessions (2020/06/20 – 2020/08/29)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
- How multilingual is Multilingual BERT?
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Data Augmentation using Pre-trained Transformer Models
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models
- Unsupervised Data Augmentation for Consistency Training -> Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
- You Impress Me: Dialogue Generation via Mutual Persona Perception
- Recipes for building an open-domain chatbot
- ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues
- A Simple Language Model for Task-Oriented Dialogue
- ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation
- FastBERT: a Self-distilling BERT with Adaptive Inference Time
- PoWER-BERT: Accelerating BERT inference for Classification Tasks
- TinyBERT: Distilling BERT for Natural Language Understanding
- GPT3: Language Models are Few-Shot Learners
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
NLP paper reading and model implementation study
Weekly sessions (2020/07/20 – ongoing)
* Transformer (a minimal attention sketch follows this list)
- Implementation language:
- Implementation reference:
- code:
* ELMo
- Implementation language:
- Implementation reference:
- code:
* GPT
- Implementation language:
- Implementation reference:
- code:
* BERT
- Implementation language:
- Implementation reference:
- code:
* GPT2
- Implementation language:
- Implementation reference:
- code:
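The implementation language, reference, and code fields above were left blank in the log. To give the Transformer entry something concrete to point at, here is a minimal scaled dot-product attention sketch. PyTorch is an assumption on my part, and every name in the snippet is illustrative rather than taken from the study's actual code.

```python
import math
import torch
import torch.nn as nn

class ScaledDotProductAttention(nn.Module):
    """Minimal attention sketch: softmax(Q K^T / sqrt(d_k)) V."""

    def forward(self, q, k, v, mask=None):
        d_k = q.size(-1)
        # similarity scores between every query and key position
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            # positions where mask == 0 get -inf so softmax gives them weight 0
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        return torch.matmul(attn, v), attn

# toy usage: batch of 2 sequences, length 5, model dim 8 (shapes are illustrative)
q = k = v = torch.randn(2, 5, 8)
out, attn = ScaledDotProductAttention()(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 5, 8]) torch.Size([2, 5, 5])
```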
- Study plan
- Transformer: architecture
- Transformer: label smoothing / beam search (label-smoothing sketch after this list)
- Transformer: training / multi-GPU / experiments
- ELMo paper review
- ELMo char-CNN layer (char-CNN sketch after this list)
- model implementation (12 sessions)
- BERT
- BERT
- GPT2 paper discussion(1) (~2.2 Input Representation)
- GPT2 paper discussion(2) (3. Experiments~)
- model implementation (7 sessions)
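For the "Transformer: label smoothing / beam search" session in the plan above, a minimal label-smoothed cross-entropy sketch, again assuming PyTorch; the smoothing value, tensor shapes, and function name are illustrative, not the study's settings.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, target, smoothing=0.1):
    """Cross-entropy against a smoothed target distribution.

    logits: (N, vocab) raw scores, target: (N,) gold class indices.
    The gold class keeps (1 - smoothing) probability mass; the rest is
    spread uniformly over the remaining classes.
    """
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    true_dist = torch.full_like(log_probs, smoothing / (n_classes - 1))
    true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))

# toy usage: 4 tokens over a 10-word vocabulary
logits = torch.randn(4, 10)
target = torch.tensor([1, 3, 5, 7])
print(label_smoothing_loss(logits, target).item())
```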
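For the "ELMo char-CNN layer" session, a sketch of a character-CNN token encoder in the spirit of ELMo's input layer, also assuming PyTorch; the character vocabulary size, filter widths, and dimensions are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CharCNNEncoder(nn.Module):
    """Embed characters, run 1-D convolutions of several widths,
    and max-pool over the character axis to get one vector per token."""

    def __init__(self, n_chars=262, char_dim=16, filters=((1, 32), (2, 32), (3, 64))):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_out, kernel_size=width) for width, n_out in filters]
        )

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_chars) integer character ids
        b, s, c = char_ids.shape
        x = self.char_emb(char_ids).view(b * s, c, -1).transpose(1, 2)  # (b*s, char_dim, c)
        pooled = [conv(x).max(dim=-1).values for conv in self.convs]     # one vector per filter width
        out = torch.cat(pooled, dim=-1)                                  # (b*s, total filters)
        return out.view(b, s, -1)

# toy usage: 2 sentences, 5 tokens each, up to 10 characters per token
ids = torch.randint(0, 262, (2, 5, 10))
print(CharCNNEncoder()(ids).shape)  # torch.Size([2, 5, 128])
```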