fix: chat API logprobs format #1788

Merged
merged 2 commits into abetlen:main from domdomegg:chat-api-logprobs on Dec 6, 2024

Conversation

@domdomegg (Contributor) commented on Oct 6, 2024

Summary

The OpenAI-compatible server should match the response structure of the OpenAI API for chat completions. Unfortunately, there is a discrepancy in the format of logprobs: we return the logprobs format of the completions API rather than that of the chat completions API.
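
For illustration, the two shapes differ roughly as follows (field names follow the OpenAI API documentation; the values here are made up):

# Completions API style (what the server currently returns):
completions_logprobs = {
    "tokens": ["The", " capital"],
    "token_logprobs": [-0.008, -0.0005],
    "top_logprobs": [
        {"The": -0.008, "Paris": -5.32},
        {" capital": -0.0005, " Capital": -7.57},
    ],
    "text_offset": [0, 3],
}

# Chat completions API style (what it should return):
chat_logprobs = {
    "content": [
        {
            "token": "The",
            "logprob": -0.008,
            "bytes": None,
            "top_logprobs": [
                {"token": "The", "logprob": -0.008, "bytes": None},
                {"token": "Paris", "logprob": -5.32, "bytes": None},
            ],
        },
    ],
    "refusal": None,
}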

This PR:

  • updates the types to match the OpenAI API
  • adds a function _convert_text_completion_logprobs_to_chat, used in the chat completion responses to convert logprobs to the chat format (sketched below)
  • updates the documentation on running the server locally, as I discovered it was outdated when I went to test things out
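
The conversion is essentially a reshaping of the completions-style parallel lists into per-token records. A minimal sketch of such a function, assuming the completions-style dict carries the documented tokens, token_logprobs and top_logprobs fields (this is not the exact code from the diff):

def _convert_text_completion_logprobs_to_chat(logprobs):
    # Completions-style logprobs hold parallel lists: "tokens",
    # "token_logprobs", and "top_logprobs" (one dict per position).
    if logprobs is None:
        return None
    return {
        "content": [
            {
                "token": token,
                "logprob": logprob,
                "bytes": None,
                "top_logprobs": [
                    {"token": top_token, "logprob": top_logprob, "bytes": None}
                    for top_token, top_logprob in (top_logprobs or {}).items()
                ],
            }
            for token, logprob, top_logprobs in zip(
                logprobs["tokens"],
                logprobs["token_logprobs"],
                logprobs["top_logprobs"],
            )
        ],
        "refusal": None,
    }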

Issues fixed

Fixes #1787 (server: chat completions returns wrong logprobs model)

@domdomegg (Contributor, Author) commented:

Demo

Request
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "logprobs": true,
  "top_logprobs": 10,
  "max_tokens": 5
}
Response
{
  "id": "chatcmpl-1898ccce-2bf6-431c-b9e0-2a82e90a9604",
  "object": "chat.completion",
  "created": 1728184671,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "The capital of France is",
        "role": "assistant"
      },
      "logprobs": {
        "content": [
          {
            "token": "The",
            "logprob": -0.008244173601269722,
            "bytes": null,
            "top_logprobs": [
              {
                "token": "The",
                "logprob": -0.008244173601269722,
                "bytes": null
              },
              {
                "token": "Paris",
                "logprob": -5.3227219581604,
                "bytes": null
              },
              {
                "token": "Sure",
                "logprob": -5.770838260650635,
                "bytes": null
              },
              {
                "token": "Answer",
                "logprob": -9.54023265838623,
                "bytes": null
              },
              {
                "token": "Yes",
                "logprob": -9.896768569946289,
                "bytes": null
              },
              {
                "token": "France",
                "logprob": -10.62641429901123,
                "bytes": null
              },
              {
                "token": " The",
                "logprob": -11.367059707641602,
                "bytes": null
              },
              {
                "token": "According",
                "logprob": -11.45943546295166,
                "bytes": null
              },
              {
                "token": "**",
                "logprob": -11.586193084716797,
                "bytes": null
              },
              {
                "token": " Paris",
                "logprob": -11.59852409362793,
                "bytes": null
              }
            ]
          },
          {
            "token": " capital",
            "logprob": -0.0005453529884107411,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " capital",
                "logprob": -0.0005453529884107411,
                "bytes": null
              },
              {
                "token": " Capital",
                "logprob": -7.571288108825684,
                "bytes": null
              },
              {
                "token": " city",
                "logprob": -11.57780647277832,
                "bytes": null
              },
              {
                "token": " current",
                "logprob": -12.473557472229004,
                "bytes": null
              },
              {
                "token": " correct",
                "logprob": -12.674555778503418,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -12.77519416809082,
                "bytes": null
              },
              {
                "token": " answer",
                "logprob": -12.833593368530273,
                "bytes": null
              },
              {
                "token": " French",
                "logprob": -13.656529426574707,
                "bytes": null
              },
              {
                "token": " Paris",
                "logprob": -13.73013687133789,
                "bytes": null
              },
              {
                "token": " **",
                "logprob": -13.916248321533203,
                "bytes": null
              }
            ]
          },
          {
            "token": " of",
            "logprob": -0.019254328683018684,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " of",
                "logprob": -0.019254328683018684,
                "bytes": null
              },
              {
                "token": " city",
                "logprob": -3.9625728130340576,
                "bytes": null
              },
              {
                "token": " and",
                "logprob": -10.33055305480957,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -12.015106201171875,
                "bytes": null
              },
              {
                "token": " is",
                "logprob": -12.049043655395508,
                "bytes": null
              },
              {
                "token": " City",
                "logprob": -12.161520957946777,
                "bytes": null
              },
              {
                "token": " o",
                "logprob": -12.770393371582031,
                "bytes": null
              },
              {
                "token": " cities",
                "logprob": -14.372736930847168,
                "bytes": null
              },
              {
                "token": " của",
                "logprob": -14.63923454284668,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -14.65132999420166,
                "bytes": null
              }
            ]
          },
          {
            "token": " France",
            "logprob": -0.0000252720492426306,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " France",
                "logprob": -0.0000252720492426306,
                "bytes": null
              },
              {
                "token": " the",
                "logprob": -11.084362030029297,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -12.06197738647461,
                "bytes": null
              },
              {
                "token": "France",
                "logprob": -12.9952974319458,
                "bytes": null
              },
              {
                "token": " French",
                "logprob": -13.759483337402344,
                "bytes": null
              },
              {
                "token": " is",
                "logprob": -15.239158630371094,
                "bytes": null
              },
              {
                "token": " **",
                "logprob": -15.40572452545166,
                "bytes": null
              },
              {
                "token": " france",
                "logprob": -15.767807960510254,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -16.346908569335938,
                "bytes": null
              },
              {
                "token": " Frankreich",
                "logprob": -17.035612106323242,
                "bytes": null
              }
            ]
          },
          {
            "token": " is",
            "logprob": -0.000060437283536884934,
            "bytes": null,
            "top_logprobs": [
              {
                "token": " is",
                "logprob": -0.000060437283536884934,
                "bytes": null
              },
              {
                "token": "  ",
                "logprob": -9.920828819274902,
                "bytes": null
              },
              {
                "token": ",",
                "logprob": -12.151354789733887,
                "bytes": null
              },
              {
                "token": " was",
                "logprob": -13.53709602355957,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -14.004632949829102,
                "bytes": null
              },
              {
                "token": " in",
                "logprob": -14.70918083190918,
                "bytes": null
              },
              {
                "token": " **",
                "logprob": -14.768845558166504,
                "bytes": null
              },
              {
                "token": " the",
                "logprob": -14.776985168457031,
                "bytes": null
              },
              {
                "token": " ",
                "logprob": -14.940979957580566,
                "bytes": null
              },
              {
                "token": " Is",
                "logprob": -14.942352294921875,
                "bytes": null
              }
            ]
          }
        ],
        "refusal": null
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 5,
    "total_tokens": 34
  }
}
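
To reproduce this demo against a locally running server, the same request can be sent with the openai Python client; the base URL, port, and placeholder API key below are assumptions for a default local setup:

from openai import OpenAI

# The local server ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    logprobs=True,
    top_logprobs=10,
    max_tokens=5,
)

# With this fix, logprobs follows the chat completions shape.
for entry in response.choices[0].logprobs.content:
    print(entry.token, entry.logprob)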

The documentation change under discussion:

- uvicorn --factory llama.server:app --host ${HOST} --port ${PORT}
+ python llama_cpp/server --model ${MODEL}
@lukestanley (Contributor) commented on Oct 14, 2024

Why this change? It seems unrelated to logprobs? @domdomegg

@domdomegg (Contributor, Author) replied:

I don't believe the current instructions work, and this is how I got it working.

Happy to split this out into a separate PR, or to be challenged on whether the uvicorn command does in fact work.

@abetlen (Owner) replied:

Err, yes @domdomegg, that's a typo. It should be uvicorn --factory llama.server.app:create_app --host ${HOST} --port ${PORT}. I'll merge this and fix the Makefile after.

@abetlen abetlen merged commit 4f0ec65 into abetlen:main Dec 6, 2024
@domdomegg domdomegg deleted the chat-api-logprobs branch December 22, 2024 15:31