
Add support for stopwords in huggingface handler #1118

Merged
lanking520 merged 1 commit into deepjavalibrary:master on Oct 4, 2023

Conversation

ydm-amazon (Contributor)

Description

This adds support for stopwords (also known as stop sequences) to the HuggingFace handler.
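
For background, stop sequences are commonly wired into Hugging Face generation through a stopping criterion that inspects the decoded output after each generated token. The sketch below is illustrative only and is not the handler's implementation; StopOnSequence is a hypothetical name:

# Illustrative sketch (not the handler code): stop generation once the decoded
# text ends with a given stop sequence, using transformers' StoppingCriteria.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteria, StoppingCriteriaList)


class StopOnSequence(StoppingCriteria):  # hypothetical helper name

    def __init__(self, stop_sequence: str, tokenizer):
        self.stop_sequence = stop_sequence
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # Decode what has been generated so far and check for the stop sequence.
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return text.endswith(self.stop_sequence)


tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
inputs = tokenizer("<Assistant>: Hi! What can I do for you? <User>:",
                   return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    stopping_criteria=StoppingCriteriaList([StopOnSequence("<User>", tokenizer)]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))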

ydm-amazon requested review from zachgk, frankfliu and a team as code owners on September 27, 2023, 20:02
lanking520 (Contributor) commented Sep 29, 2023

import re


class StopSequenceCriteria:

    def __init__(self, stop_sequence: str):
        # Escape the stop sequence so regex metacharacters are treated literally,
        # then match any output that ends with it.
        stop_sequence = re.escape(stop_sequence)
        self.regex = re.compile(f".*{stop_sequence}$")

    def __call__(self, output: str) -> bool:
        # True once the generated text ends with the stop sequence.
        return bool(self.regex.findall(output))

This is the reference implementation in LMI_Dist.
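
For illustration only (hypothetical usage, not code from LMI_Dist), the criteria returns True once the accumulated output ends with the stop sequence:

criteria = StopSequenceCriteria("<User>")
criteria("<Assistant>: I have a problem with the apple. <User>")  # True: output ends with the stop sequence
criteria("<Assistant>: I have a problem with the apple.")         # False: stop sequence not reached yet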

lanking520 (Contributor)

@KexinFeng, could you review?

KexinFeng (Contributor)

Is there any unittest code that tests this feature?

KexinFeng (Contributor)

Need to rebase onto master to get the patch that helps the unit tests pass.

ydm-amazon (Contributor, Author) commented Oct 3, 2023

Need to rebase onto master to get the patch that helps the unit tests pass.

Done

Is there any unittest code that tests this feature?

No, but I am starting to work on it; I have added a subtask to Asana. For now, here is an example of how I am testing (a rough unit-test sketch follows the examples below):

serving.properties:

engine=MPI
option.entryPoint=huggingface.py
option.model_id=bigscience/bloom-560m
option.tensor_parallel_degree=1
option.task=text-generation
option.paged_attention=true
option.dtype=fp16
option.stop_sequence=["<User>", "See you later"]

Model server:

docker run -it --runtime=nvidia --gpus "device=0" --shm-size 4g -v /home/ubuntu/rds:/opt/ml/model  -v /home/ubuntu/model_server_logs:/opt/djl/logs  -p 8080:8080  deepjavalibrary/djl-serving:0.23.0-deepspeed

Example request:

curl http://127.0.0.1:8080/invocations -X POST -d '{"inputs":"<Assistant>: Hi! What can I do for you? <User>: Why apple is red?","parameters":{"max_new_tokens":50}}' -H "Content-type: application/json"

Result:

[
  {
    "generated_text":"<Assistant>: Hi! What can I do for you? <User>: Why apple is red? <Assistant>: I have a problem with the apple. <User>:"
  }
]

Another example request:

curl http://127.0.0.1:8080/invocations -X POST -d '{"inputs":"When User says See you tomorrow, Assistant replies See you later. Assistant: Hi! What can I do for you? User: See you tomorrow","parameters":{"max_new_tokens":50}}' -H "Content-type: application/json"

Result:

[
  {
    "generated_text":"When User says See you tomorrow, Assistant replies See you later. Assistant: Hi! What can I do for you? User: See you tomorrow. User: See you later"
  }
]
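
A minimal unittest sketch for the stop-sequence check, assuming a StopSequenceCriteria class like the reference implementation quoted above; the eventual tests in the repository may differ:

import re
import unittest


class StopSequenceCriteria:
    # Mirrors the reference implementation quoted earlier in this thread.

    def __init__(self, stop_sequence: str):
        stop_sequence = re.escape(stop_sequence)
        self.regex = re.compile(f".*{stop_sequence}$")

    def __call__(self, output: str) -> bool:
        return bool(self.regex.findall(output))


class TestStopSequenceCriteria(unittest.TestCase):

    def test_stops_at_user_tag(self):
        criteria = StopSequenceCriteria("<User>")
        self.assertTrue(criteria("<Assistant>: I have a problem with the apple. <User>"))
        self.assertFalse(criteria("<Assistant>: I have a problem with the apple."))

    def test_stops_at_phrase(self):
        criteria = StopSequenceCriteria("See you later")
        self.assertTrue(criteria("User: See you tomorrow. User: See you later"))
        self.assertFalse(criteria("User: See you tomorrow."))


if __name__ == "__main__":
    unittest.main()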

lanking520 merged commit 7c9ea81 into deepjavalibrary:master on Oct 4, 2023
8 checks passed