Handle 0 dimension output for generate endpoint #6833

Merged: 3 commits, Jan 29, 2024

@krishung5 krishung5 (Contributor) commented Jan 25, 2024

For the TRT-LLM backend, the output tensor can have a zero-dimension shape if the end token is predicted at the first step. This PR fixes the handling of zero-dimension outputs in the generate endpoint and adds a test case for it.
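To illustrate the kind of guard this implies, here is a minimal Python sketch. It is not the server's actual code: `tensor_to_json` is a hypothetical helper, and the scalar-versus-list convention is an assumption chosen for illustration. The point is only that the element count is checked before index 0 is read.

```python
from math import prod
from typing import Any, Sequence


def tensor_to_json(flat_values: Sequence[Any], shape: Sequence[int]) -> Any:
    """Hypothetical helper: turn a flat tensor buffer into a JSON-ready value.

    Assumed convention (not confirmed by the PR): one element becomes a
    scalar, several elements become a list, zero elements become [].
    """
    # A shape containing a 0 (e.g. [0] or [1, 0]) holds no elements at all.
    element_count = prod(shape)
    if element_count == 0:
        # Never read flat_values[0] here: there is nothing to read.
        return []
    if element_count == 1:
        return flat_values[0]
    return list(flat_values[:element_count])
```

With this guard, `tensor_to_json([], [0])` yields an empty list instead of failing on a missing index 0.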

Before the fix, calling the generate and generate_stream endpoints when the output tensor had a zero dimension returned an error:
generate

root@a2826b5-lcedt:/opt/tritonserver/tensorrtllm_backend# curl -X POST localhost:8000/v2/models/ensemble/generate --data-binary @sample.txt
{"error":"attempt to access non-existing array index '0'"}

generate_stream

root@a2826b5-lcedt:/opt/tritonserver/tensorrtllm_backend# curl -X POST localhost:8000/v2/models/ensemble/generate_stream --data-binary @sample.txt
data: {"error":"attempt to access non-existing array index '0'"}

After the fix:
generate

root@a2826b5-lcedt:/opt/tritonserver/tensorrtllm_backend# curl -X POST localhost:8000/v2/models/ensemble/generate --data-binary @sample.txt
{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":[]}

generate_stream

root@a2826b5-lcedt:/opt/tritonserver/tensorrtllm_backend# curl -X POST localhost:8000/v2/models/ensemble/generate_stream --data-binary @sample.txt
data: {"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":[]}
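A client may want to treat the empty `text_output` as "no text generated" rather than as a failure. The following is a rough client-side sketch in Python, not part of the PR. The function names are hypothetical, and it assumes that a non-empty `text_output` arrives as either a string or a list of strings.

```python
import json
from typing import Any, Dict

SSE_PREFIX = "data: "  # generate_stream lines carry this prefix in the output above


def parse_generate_line(line: str) -> Dict[str, Any]:
    """Decode one response line from generate or generate_stream."""
    line = line.strip()
    if line.startswith(SSE_PREFIX):
        line = line[len(SSE_PREFIX):]
    return json.loads(line)


def text_from_response(body: Dict[str, Any]) -> str:
    """Return the generated text, or "" when the output tensor was empty."""
    if "error" in body:
        # Pre-fix behaviour: the server reported an error object instead.
        raise RuntimeError(body["error"])
    text = body.get("text_output", "")
    if isinstance(text, list):
        # Zero-dimension output is serialized as [] and joins to "".
        return "".join(text)
    return text
```

Fed the post-fix response, `text_from_response` returns an empty string. Fed the pre-fix response, it raises `RuntimeError` with the server's message.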

sample.txt looks like this:

{
  "text_input": "I have to work on an AI project. So now, I will explain the project I have to do first :The data is divided into three parts. \n\n1. Training data: train.csv \n2. Test data: test.csv \n3. Submission file: sample\\_submission.csv \n\nThe train.csv file consists of 3 columns of id, text and label, and the test.csv file consists of 2 columns of id and text. The sample\\_submission.csv file consists of id and label columns second: There are 8 labels in total. The data is news article. I want to make a model to classify this. First of all, I want to know the number of classes in the train.csv file. I mean, I want to make sure the class is in balance. I'm talking about EDA. Can you make this code first? and at this code please add about Visualization and also printing each class count.",
  "max_tokens": 1024,
  "bad_words": "",
  "stop_words": "",
  "end_id": 40
}
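For reproducing the request without curl, a standard-library Python sketch might look like the following. The host, port, and model name are taken from the curl commands above. The helper names and the `Content-Type` header are assumptions added for illustration.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_request(
    payload: Dict[str, Any],
    base_url: str = "http://localhost:8000",
    model: str = "ensemble",
    stream: bool = False,
) -> urllib.request.Request:
    """Build a POST request for the generate or generate_stream endpoint."""
    endpoint = "generate_stream" if stream else "generate"
    url = f"{base_url}/v2/models/{model}/{endpoint}"
    data = json.dumps(payload).encode("utf-8")
    # Supplying a body makes urllib issue a POST.
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Same fields as sample.txt, with the long prompt shortened for readability.
sample_payload = {
    "text_input": "I have to work on an AI project.",
    "max_tokens": 1024,
    "bad_words": "",
    "stop_words": "",
    "end_id": 40,
}
```

Calling `send(build_generate_request(sample_payload))` against a running server would mirror the first curl command. Passing `stream=True` targets generate_stream instead.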

@krishung5 krishung5 changed the title from "Krish generate" to "Handle 0 dimension output for generate endpoint" Jan 25, 2024
@krishung5 krishung5 merged commit f0d788b into main Jan 29, 2024
3 checks passed
@krishung5 krishung5 deleted the krish-generate branch January 29, 2024 18:03