-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rolling batch for huggingface handler #857
Conversation
@@ -22,6 +22,7 @@ | |||
from djl_python.inputs import Input | |||
from djl_python.outputs import Output | |||
from djl_python.streaming_utils import StreamingUtils | |||
from djl_python.rolling_batch import RollingBatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not using HuggingFace handler, since you already introducing many other options here. Just create a new rolling entrypoint. @frankfliu thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is all traditional pieces stay here, where rolling batch comes to a new chapter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huggingface.py
should be the entrypoint, but we need consider refactor huggingface.py
to be more modularized and handle more complicated cases.
@@ -22,6 +22,7 @@ | |||
from djl_python.inputs import Input | |||
from djl_python.outputs import Output | |||
from djl_python.streaming_utils import StreamingUtils | |||
from djl_python.rolling_batch import RollingBatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huggingface.py
should be the entrypoint, but we need consider refactor huggingface.py
to be more modularized and handle more complicated cases.
self.sampling = False | ||
self.topp = 0.92 | ||
self.temperature = 1 | ||
self.eos_token_id = kwargs.get('eos_token_id', 50256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not hard code eos_token_id, it should based on eos_token char
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is configurable by the concumer of the scheduler. Currently the seqBatchScheduler part doesn't have any tokenizer. So the consumer of the scheduler tokenize the eos_token and set it in this searchConfig.
self.topp = 0.92 | ||
self.temperature = 1 | ||
self.eos_token_id = kwargs.get('eos_token_id', 50256) | ||
self.pad_token_id = kwargs.get('pad_token_id', 50256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why padding id the same as eos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just default value, which can be configurable. Also, the padding is an only placeholder; it won't affect the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we should do is provide a default eos and padding char, and based on tokenizer to figure out pad_token_id and eos_token_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will note this down,and do these in the next PRs frank
engines/python/setup/djl_python/rolling_batch/scheduler_rolling_batch.py
Outdated
Show resolved
Hide resolved
* Rolling batch for huggingface handler
* Rolling batch for huggingface handler
Description
Brief description of what this PR is about