
Refactor chat template API #6822

Draft
wants to merge 11 commits into base: master

Conversation

Collaborator

@ngxson ngxson commented Apr 22, 2024

Based on the discussion from #6391 (comment)

The main idea is to get the prefix/postfix of each message based on the role. For example, ChatML uses "<|im_start|>" + role + "\n" as the prefix (the role is dynamic, based on the current message) and "<|im_end|>\n" as the postfix.
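
For instance, assembling prefix + content + postfix this way for a single user message under ChatML would produce something like the following (a rendered-text sketch, not output of the actual API):

    <|im_start|>user
    Hello there!<|im_end|>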

I considered the following points when making this proposal:

  1. It allows using multiple chat templates (introduced in Support converting models with multiple chat templates #6588)
  2. Prefix/postfix and content can be tokenized separately. This mitigates the risk of injecting special tokens into message content. While no API currently uses this logic, one can easily be added in the future.
  3. Since each chat template now has its own enum value, users can extend their own logic using the value returned by llama_chat_get_typed_template. Arbitrary templates are not allowed (users must write their own logic if they want them).

Introducing an enum llama_chat_template for templates and a family of functions:

    /// Get the Jinja template saved inside the given model
    LLAMA_API int32_t llama_chat_get_model_template(
                const struct llama_model * model,
                              const char * name,
                                    char * buf,
                                 int32_t   length);

    /// Get the enum llama_chat_template based on Jinja template
    LLAMA_API llama_chat_template llama_chat_get_typed_template(const char * tmpl);

    /// Get the format prefix for a given message
    LLAMA_API int32_t llama_chat_get_prefix(
                const llama_chat_template   tmpl,
                               const char * role,
                               const char * prev_role,
                                     char * buf,
                                  int32_t   length);

    /// Get the format postfix for a given message
    LLAMA_API int32_t llama_chat_get_postfix(
                const llama_chat_template   tmpl,
                               const char * role,
                               const char * prev_role,
                                     char * buf,
                                  int32_t   length);

    /// Check whether a given template supports system messages
    LLAMA_API bool llama_chat_support_system_message(const llama_chat_template tmpl);
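
For illustration, a caller might use these functions roughly as follows. This is a minimal sketch rather than code from this PR: it assumes the getters return the number of bytes written into buf, that passing NULL as name selects the model's default template, and it omits error handling and buffer resizing.

    // assumptions: `model` is an already-loaded llama_model *, and the
    // getters write a NUL-terminated string into `buf`
    #include <string>

    char tmpl_str[2048];
    llama_chat_get_model_template(model, NULL /* default template */, tmpl_str, sizeof(tmpl_str));
    const llama_chat_template tmpl = llama_chat_get_typed_template(tmpl_str);

    const char * roles[]    = { "system", "user" };
    const char * contents[] = { "You are a helpful assistant.", "Hello!" };
    const char * prev_role  = NULL;

    std::string prompt;
    for (int i = 0; i < 2; i++) {
        char prefix[256];
        char postfix[256];
        llama_chat_get_prefix (tmpl, roles[i], prev_role, prefix,  sizeof(prefix));
        llama_chat_get_postfix(tmpl, roles[i], prev_role, postfix, sizeof(postfix));
        // prefix/postfix could be tokenized with special tokens enabled while
        // contents[i] is tokenized with them disabled, which is what prevents
        // special-token injection through message content (point 2 above)
        prompt += prefix;
        prompt += contents[i];
        prompt += postfix;
        prev_role = roles[i];
    }

    // append the prefix of the assistant turn so the model starts its reply
    char reply_prefix[256];
    llama_chat_get_prefix(tmpl, "assistant", prev_role, reply_prefix, sizeof(reply_prefix));
    prompt += reply_prefix;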

Collaborator Author

ngxson commented Apr 22, 2024

@teleprint-me @hanishkvc I finally made this work. The code is not super clean, but I paid more attention to the API design, as that is what we need to bring chat template support to main. Feel free to let me know if anything (at the API design level) can be improved.

Collaborator Author

ngxson commented Apr 22, 2024

Before moving further, @ggerganov could you please take a look at the API design to see if it's OK for you? Thanks.

@ggerganov
Owner

The API seems OK. If you think this is the right way, let's do it

@teleprint-me
Contributor

Given the context and circumstances, I think it's a start.

I can absolutely see this getting out of control, though, as I've previously stated, if not done with caution and forethought. This is going to be challenging to manage in the long term simply because there is no way to know or predict what templates will arise, or become preferred, over time.

Overall, I think it's okay as well. It obviously needs work, as that's how all things start. Hopefully we can identify a pattern and then determine how to smooth things out over time. It's better than nothing for now.

@ggerganov
Owner

I agree this can easily get over-engineered. I don't have the capacity at the moment to think deeply about this, so we should take into account feedback from people using chat templates and, at the same time, not try to support all sorts of edge cases that one can think of. Just aim for the stuff that is used most of the time and makes sense. And try to keep the API and implementation separated from the rest of the functionality as much as possible, so that it can be easily adapted / replaced in the future if necessary.

@hanishkvc
Contributor

@ngxson do have a look at the new PR

#6834

which I have uploaded. It uses a simple JSON file to load the expected/supported handshake template, as well as a flag to control whether any BoS is prefixed when a user message immediately follows the system message. In turn, the chat-template-apply logic I have added in common/chaton.hpp handles this to provide the required flow in a simple, generic way.

A JSON file that should work for some of the models is in examples/chaton_meta.json.

NOTE: Among these, the one or two models that require avoiding special tags between the system message and the first user message seem to treat BoS + RoleTagPrefix as a single unit and expect both to be handled the same way. However, some other models may require the BoS to be handled specially while the RoleTagPrefix is always handled the same; for that, my logic will need a separate Begin/BoS entry in addition to the Prefix entry, and in turn do that selective inserting for Begin.

@hanishkvc
Contributor

I have updated my PR with support for separate Begin (BoS) and Prefix (RoleIdTag) entries for the user role. In the JSON, one can individually control whether either of them gets prepended to the first user message following the system message. Monarch seems to need this, judging from your server-related chat-apply-template, and that is supported now. Looking at the entries for Llama2, Monarch and Llama3, one can see how to configure the JSON entries to achieve the three different possibilities for these three models.
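
To make the shape of that configuration concrete, the per-role metadata described above could be modelled roughly like this. This is purely an illustrative C++ sketch of the fields under discussion (Begin/BoS, Prefix, and the per-field "prepend to first user message" switches); the names are hypothetical and not the actual schema of examples/chaton_meta.json in #6834:

    // illustrative only, not the real chaton_meta.json schema
    #include <string>

    struct chaton_role_meta {
        std::string begin;    // e.g. BoS token text
        std::string prefix;   // role tag prefix, e.g. "<|im_start|>user\n"
        std::string suffix;   // role tag suffix / end tag
        // whether begin/prefix are still prepended to the first user message
        // that immediately follows the system message
        bool begin_after_system  = true;
        bool prefix_after_system = true;
    };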

Collaborator Author

ngxson commented Apr 24, 2024

@teleprint-me @ggerganov Thanks for your feedback. I understand that this part can easily get complicated in the future, so I considered the following points when making this proposal:

  1. It allows using multiple chat templates (introduced in Support converting models with multiple chat templates #6588)
  2. Prefix/postfix and content can be tokenized separately. This mitigates the risk of injecting special tokens into message content. While no API currently uses this logic, one can easily be added in the future.
  3. Since each chat template now has its own enum value, users can extend their own logic using the value returned by llama_chat_get_template_type. Arbitrary templates are not allowed (users must write their own logic if they want them).

The only downside is that the code is no longer linear. That means adding a new template now requires a bit of "brain gym" to convert from Jinja to prefix/postfix. Still, it is better than tricking llama_chat_apply_template into outputting the correct thing (as demoed in #6810).
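
As a rough illustration of that conversion step (hypothetical helper functions, not the implementation in this PR), the ChatML Jinja template effectively unrolls into per-role prefix/postfix strings like:

    // hypothetical sketch: ChatML unrolled into prefix/postfix
    #include <stdint.h>
    #include <stdio.h>

    static int32_t chatml_prefix(const char * role, char * buf, int32_t length) {
        return snprintf(buf, length, "<|im_start|>%s\n", role);
    }

    static int32_t chatml_postfix(const char * role, char * buf, int32_t length) {
        (void) role; // ChatML uses the same postfix for every role
        return snprintf(buf, length, "<|im_end|>\n");
    }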

Edit: Please also note that, across the multiple issues on the subject of chat templates, I've seen many proposals related to having a prefix/postfix based on role. This PR will be the first one to bring that idea into the core API.

Collaborator Author

ngxson commented Apr 24, 2024

@ggerganov @phymbert I got a weird issue with the CI workflow where the master branch gets merged automatically into the code on CI. Do you have any clue about that? Thanks. https://github.com/ggerganov/llama.cpp/actions/runs/8817445642/job/24203848022?pr=6822

@ngxson ngxson marked this pull request as ready for review April 24, 2024 16:18
@ngxson ngxson changed the title from "Refactor chat template API (WIP)" to "Refactor chat template API" on Apr 24, 2024
@ngxson ngxson requested a review from ggerganov April 24, 2024 16:22
Contributor

github-actions bot commented Apr 24, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 204 iterations 🚀

Details (for performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=23519.97ms p(95)=44773.36ms fails=, finish reason: stop=82 truncated=122
  • Prompt processing (pp): avg=277.25tk/s p(95)=809.3tk/s
  • Token generation (tg): avg=19.06tk/s p(95)=25.73tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=xsn/chat_template_prefix_postfix commit=476d319fde0ae6c6a2ed9cfe54e548ad812fe5a5

[Benchmark charts omitted: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing, each plotted over the 10-minute, 204-iteration bench-server-baseline run on Standard_NC4as_T4_v3.]

@ngxson ngxson added the "demo" label (Demonstrate some concept or idea, not intended to be merged) on May 4, 2024
@ngxson ngxson removed the request for review from ggerganov May 4, 2024 08:42
@ngxson ngxson marked this pull request as draft May 4, 2024 08:43
Collaborator Author

ngxson commented May 4, 2024

I'm changing this PR to "demo" since I'm still not confident about making the chat template system more complicated. Maybe we will revisit this in the future. This PR is mostly useful for adding chat template support to main.cpp, but at the moment that's not a priority.

@mofosyne mofosyne added the "Review Complexity : Medium" label (Generally require more time to grok but manageable by beginner to medium expertise level) on May 9, 2024
@mofosyne mofosyne added the "refactoring" label on Aug 6, 2024
Labels

  • demo: Demonstrate some concept or idea, not intended to be merged
  • refactoring: Refactoring
  • Review Complexity : Medium: Generally require more time to grok but manageable by beginner to medium expertise level

5 participants