Open
Description
Dear Prof. Chen Liang,
We're working on a project involving language model compression, and your work on TED has been insightful. We are now trying to compare TED with KD and LWD techniques and to evaluate TED on GLUE tasks.
We are having difficulty reproducing the GLUE benchmark results reported in your paper. If possible, could you share the code for the baseline KD and LWD frameworks and for the TED evaluation on GLUE?
If sharing the code is not feasible, could you please provide the hyperparameters used in your experiments? This would greatly assist our research.
Best,
Chengfei Liu