
perplexity : faster Winogrande via batching #5024

Merged 3 commits into master on Jan 19, 2024
Conversation

ggerganov (Owner)
Similar to #5017 but for the Winogrande evaluation. Here the performance gain is bigger since the tasks are smaller and we can fit more in a batch.

I'm not 100% sure about the correctness of this implementation - the final value is different from master:

# master
Final Winogrande score(1267 tasks): 67.6401 +/- 1.3149

# PR
Final Winogrande score(1267 tasks): 69.1397 +/- 1.2982

It's possible that this fixes an existing issue, but it is also possible that it introduces a bug. The softmax optimization is not applied here yet; it can be applied in a follow-up PR.

@ggerganov ggerganov requested a review from ikawrakow January 18, 2024 19:39
examples/perplexity/perplexity.cpp
batch.logits + i,
0, 0, 0, // unused
};

Contributor


The llama_kv_cache_seq_rm call is no longer needed here?

Owner Author


It's not needed because we clear the entire KV cache before each batch:

llama_kv_cache_clear(ctx);

In the old implementation, it was reusing tokens from a previous batch, so the llama_kv_cache_seq_rm was used to evict the unused ones (i.e. the second sentence).

@ikawrakow ikawrakow added the performance Speed related topics label Jan 19, 2024
@ggerganov
Owner Author

The score with Mistral 7B has changed in the "positive" direction as well, closer to the HF leaderboard results:

# master
Final Winogrande score(1267 tasks): 73.6385 +/- 1.2383

# PR
Final Winogrande score(1267 tasks): 74.5856 +/- 1.2236

@ggerganov ggerganov merged commit 8b20858 into master Jan 19, 2024
33 of 46 checks passed
@ggerganov ggerganov deleted the gg/winogrande-batched branch January 19, 2024 08:45
@ikawrakow
Contributor

> The score with Mistral 7B has changed in the "positive" direction as well, closer to the HF leaderboard results:

Yes, I saw this change, but could not understand the reason.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* perplexity : faster Winogrande via batching

ggml-ci

* perplexity : remove unused function

* perplexity : only tokenize selected tasks for Winogrande
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* perplexity : faster Winogrande via batching

ggml-ci

* perplexity : remove unused function

* perplexity : only tokenize selected tasks for Winogrande