RonalddMatias (Contributor)

Description

This pull request adds support for two open-source models available on Hugging Face: DeepSeek R1 Distill Llama 8B and DeepSeek Coder 6.7B Instruct. Together they broaden the set of models available for NLP and code generation tasks.

Main Changes

  • Added DeepSeek R1 Distill Llama 8B and DeepSeek Coder 6.7B Instruct to the list of supported models.
  • Updated the configuration files with the model-specific parameters these models require.

Benchmarks Executed

DeepSeek R1 Distill Llama 8B was evaluated on NLP benchmarks including ENEM Challenge and TweetSent, and DeepSeek Coder 6.7B Instruct was evaluated on the HumanEval and APPS code generation benchmarks. Both models performed competitively in their respective domains.

Adding these models gives users more choice among state-of-the-art open models for NLP and code generation tasks.
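For readers unfamiliar with how a Hugging Face model gets registered, here is a minimal sketch of what a deployment entry for each model might contain, written as plain Python dictionaries. It assumes a layout modeled on HELM's `model_deployments.yaml` (fields `name`, `model_name`, `tokenizer_name`, `client_spec`). The `make_deployment` helper, the deployment names, and the client class path are hypothetical illustrations, not the entries this PR actually added.

```python
from typing import Any


def make_deployment(hf_repo: str, short_name: str) -> dict[str, Any]:
    """Build one hypothetical deployment entry for a Hugging Face model.

    The field names mirror a HELM-style model_deployments.yaml layout as an
    assumption; they are not copied from this PR.
    """
    return {
        "name": f"huggingface/{short_name}",
        "model_name": hf_repo,
        # Assumption: reuse the tokenizer that ships with the checkpoint.
        "tokenizer_name": hf_repo,
        "client_spec": {
            # Assumption: client class used for locally run Hugging Face models.
            "class_name": "helm.clients.huggingface_client.HuggingFaceClient",
            "args": {"pretrained_model_name_or_path": hf_repo},
        },
    }


# Hugging Face repository IDs for the two models; short names are hypothetical.
DEPLOYMENTS = [
    make_deployment("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "deepseek-r1-distill-llama-8b"),
    make_deployment("deepseek-ai/deepseek-coder-6.7b-instruct", "deepseek-coder-6.7b-instruct"),
]

# Print each deployment name alongside the repository it would load.
for entry in DEPLOYMENTS:
    print(entry["name"], "->", entry["client_spec"]["args"]["pretrained_model_name_or_path"])
```

Running it prints one line per deployment, mapping each hypothetical deployment name to the Hugging Face repository it loads. Model-specific values such as maximum sequence length are deliberately left out, since the description does not state them.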

yifanmai (Collaborator) left a comment

Looks good, thanks!

yifanmai merged commit 4ce5078 into stanford-crfm:main on Feb 11, 2025.
8 checks passed