update Bark FA2 docs #27400
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for adding!
Overall looks good but the comparison values should be updated to make this more useful for readers.
docs/source/en/model_doc/bark.md (outdated)
Flash Attention 2 is also consistently faster than Better Transformer, and its performance improves even more as batch sizes increase.

To put this into perspective, you can generate 17 times more text and still be 2s faster than the unoptimized version. At batch size 8, Flash Attention 2 is also 10% faster than Better Transformer, and at batch size 16, 25%.
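For readers who want to try this, enabling Flash Attention 2 on Bark follows the usual `from_pretrained` pattern. A minimal sketch, assuming a recent transformers version that accepts the `attn_implementation` argument, a CUDA GPU, and the `flash-attn` package installed:

```python
import torch
from transformers import AutoProcessor, BarkModel

# Load Bark in half precision; Flash Attention 2 only runs on fp16/bf16 CUDA tensors.
model = BarkModel.from_pretrained(
    "suno/bark-small",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

processor = AutoProcessor.from_pretrained("suno/bark-small")
inputs = processor("Hello, this is a test.", return_tensors="pt").to("cuda")
audio = model.generate(**inputs)
```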
This isn't really a comparison: to be useful, the specific tests, hardware, and runtime numbers should be shown. "17 times more text" is undefined - do you mean tokens?
You're right of course, I'll be more specific
It's difficult to be specific without taking too much time, since the tokenizer by default pads the input to 256 tokens, but I'll be more precise in my terms nonetheless!
Updated! What do you think of this?
Better - but it still needs some more information and structure.
For example, see the graphs we have here for Mistral. At the moment you're writing a sentence that gives one data point on the graph without the context of the test being run, and so it can't be used to inform any decisions.
> It's difficult to be specific without taking too much time, since the tokenizer by default pads the input to 256 tokens, but I'll be more precise in my terms nonetheless!
Could you explain this further? What do you mean by too much time? As in to get the numbers? I don't understand the relation to the max length either. Couldn't you run on a dummy model of small context length and benchmark on that? The generations don't have to be good (they can be nonsense) this is just about providing experimental values.
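For reference, a rough benchmark along these lines could simply time `generate` under each attention implementation. A minimal sketch, assuming a recent transformers version that accepts the `attn_implementation` argument; the checkpoint, batch size, and run count are illustrative, not the setup actually used for this PR:

```python
import time

import torch
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")


def mean_generation_time(attn_implementation, batch_size=8, n_runs=3):
    model = BarkModel.from_pretrained(
        "suno/bark-small",
        torch_dtype=torch.float16,
        attn_implementation=attn_implementation,
    ).to("cuda")
    # The generations can be nonsense; only the wall-clock time matters here.
    inputs = processor(["Hello, my dog is cute."] * batch_size, return_tensors="pt").to("cuda")
    model.generate(**inputs)  # warmup run, excluded from timing
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        torch.manual_seed(0)  # Bark sampling is stochastic; pin the seed for comparable lengths
        model.generate(**inputs)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs


for impl in ("eager", "flash_attention_2"):
    print(impl, mean_generation_time(impl))
```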
I basically wanted to avoid making the readme more packed than it already is, but a graph will do the trick!
> Could you explain this further? What do you mean by too much time? As in to get the numbers? I don't understand the relation to the max length either. Couldn't you run on a dummy model of small context length and benchmark on that? The generations don't have to be good (they can be nonsense) this is just about providing experimental values.
I just meant that I've already run the benchmarks, and that counting tokens (i.e. getting the average context length) wasn't straightforward due to how the tokenizer works.
Just a few minor suggestions: I would re-order the structure a bit, and maybe explain a bit more where the 17x number comes from.
I think for a high-level comparison these numbers are enough; the docs should serve as an indication of the performance gain we should expect to get, but they're not academic benchmarks, so they don't have to be fully watertight.
docs/source/en/model_doc/bark.md (outdated)
<img src="https://huggingface.co/datasets/ylacombe/benchmark-comparison/resolve/main/Bark%20Optimization%20Benchmark.png">
</div>

To put this into perspective, on an NVIDIA A100, you can get 17 times the [throughput](https://huggingface.co/blog/optimizing-bark#throughput) and still be 2 seconds faster than the unoptimized, non-batch version.
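To spell out what that sentence compares - batched, FA2-optimized generation versus unbatched, unoptimized generation - here is a sketch; the checkpoint and the batch size of 16 are made up for the example:

```python
import torch
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
prompts = ["Hello, world!"] * 16  # illustrative batch size

# Unoptimized, non-batch baseline: fp32 weights, default attention,
# one generate call per prompt.
baseline = BarkModel.from_pretrained("suno/bark-small").to("cuda")
for prompt in prompts:
    inputs = processor(prompt, return_tensors="pt").to("cuda")
    baseline.generate(**inputs)

# Optimized, batched version: fp16 + Flash Attention 2, one call for all
# prompts. The single call is slower than any one prompt above, but
# per-prompt time (and hence throughput) improves dramatically.
model = BarkModel.from_pretrained(
    "suno/bark-small",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
inputs = processor(prompts, return_tensors="pt").to("cuda")
audio_batch = model.generate(**inputs)
```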
I don't really follow where this 17x number comes from? Are we comparing FA2 batched vs un-optimised non-batched?
Exactly, I'll make it clearer
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
The graph is great 📈 - thanks for adding!
Thanks! Merging!
* update Bark FA2 docs
* update benchmark section
* Update bark.md
* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* rephrase

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
What does this PR do?
Following @ArthurZucker's review in #27634, I've added a section on FA2 to the Bark readme, and a mention of Bark FA2 support to the general section on FA2!
Note that this comment #27364 (comment) is already addressed, since `self.dropout` is already a float!
cc @amyeroberts and @ArthurZucker