
server: Retrieve prompt template in /props #8337

Merged
merged 5 commits into ggerganov:master on Jul 7, 2024

Conversation

@bviksoe (Contributor) commented Jul 6, 2024

This PR adds the following:

  • Expose the model's actual Jinja2 prompt template in the /props endpoint.
  • Change the log level from Error to Warning for the template-mismatch message.

With the raw template available, the front-end stands a better chance of rendering the Jinja template correctly; the server is currently just guessing the format.

Ideally this would sit inside a JSON block that exposes the same key/value pairs as are listed during startup in the llm_load_print_meta function, allowing the front-end to read a plethora of model properties.
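
For illustration, a minimal TypeScript sketch of how a front-end might read the new field, assuming a llama.cpp server on localhost:8080 and the chat_template key name adopted later in this PR:

```ts
// Fetch the model's raw Jinja2 chat template from the server's /props
// endpoint. The "chat_template" field name follows the naming-convention
// commit in this PR; the base URL is an assumption for this sketch.
async function getChatTemplate(baseUrl = "http://localhost:8080"): Promise<string> {
  const res = await fetch(`${baseUrl}/props`);
  if (!res.ok) {
    throw new Error(`GET /props failed with status ${res.status}`);
  }
  const props = await res.json();
  return props.chat_template; // the raw Jinja2 template string
}
```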

@ngxson (Collaborator) commented Jul 6, 2024

I'm not very confident about this change. Even if the frontend can get the Jinja template, how will it be used? I suppose it requires a Jinja parser on the frontend?

Another way I can think of is that the backend could format the chat template with some pre-defined placeholders, like {{user_message}}, and the JS code could do simple string manipulation to produce the correct prompt.
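
For illustration, a minimal TypeScript sketch of this placeholder idea; the pre-rendered string, placeholder names, and helper are hypothetical, not anything the server currently provides:

```ts
// Hypothetical placeholder approach: the backend would render the chat
// template once with marker strings, and the frontend would substitute
// the actual message text with plain string manipulation.
const preRendered = "<|user|>\n{{user_message}}<|end|>\n<|assistant|>\n";

function fillTemplate(template: string, values: Record<string, string>): string {
  // Replace each {{name}} marker with its value; leave unknown markers intact.
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) => values[name] ?? match);
}

// Example: build the prompt up to the assistant turn.
const prompt = fillTemplate(preRendered, { user_message: "Hello!" });
```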

Review thread on examples/server/server.cpp (resolved)
@bviksoe (Contributor, Author) commented Jul 6, 2024

> Even if the frontend can get the Jinja template, how will it be used? I suppose it requires a Jinja parser on the frontend?

Yes. Adding a real Jinja parser in C++ has already been discussed, I think, and it's not likely to be added. But plenty of Jinja implementations exist for Python and JavaScript.

The /chat/completions endpoint will surely help standard usage of the server, and it makes its own assumptions about the template content. But those using the /completion endpoint will want access to all model properties, including the template, so they can ensure any model's prompt is formatted correctly.
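
For illustration, a minimal TypeScript sketch of the front-end side using @huggingface/jinja, one such JavaScript Jinja implementation; the variables passed to the template (messages, bos_token, eos_token, add_generation_prompt) follow the Hugging Face chat-template convention, and the token strings here are model-specific assumptions:

```ts
import { Template } from "@huggingface/jinja";

// Render the template retrieved from /props into a prompt string.
function renderPrompt(
  chatTemplate: string,
  messages: { role: string; content: string }[],
): string {
  const template = new Template(chatTemplate);
  return template.render({
    messages,
    bos_token: "<s>",   // assumption: varies per model
    eos_token: "</s>",  // assumption: varies per model
    add_generation_prompt: true,
  });
}
```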

Review thread on examples/server/server.cpp (outdated, resolved)
@ngxson merged commit cb4d86c into ggerganov:master on Jul 7, 2024
53 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 8, 2024
* server: Retrieve prompt template in /props

This PR adds the following:
- Expose the model's Jinja2 prompt template in the /props endpoint.
- Change the log level from Error to Warning for the template-mismatch message.

The front-end then stands a better chance of rendering the Jinja template correctly; the server is currently just guessing the format.

Ideally this would sit inside a JSON block that exposes the same key/value pairs as are listed during startup in the "llm_load_print_meta" function.

* Make string buffer dynamic

* Add doc and better string handling

* Using chat_template naming convention

* Use intermediate vector for string assignment
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 8, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 11, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 11, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024