-
Couldn't load subscription status.
- Fork 13.4k
server/public_simplechat web search tool call support and bearer token wrt simpleproxy handshake #16791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
server/public_simplechat web search tool call support and bearer token wrt simpleproxy handshake #16791
Conversation
Enable streaming by default, to check the handshake before going on to change the code, given that havent looked into this for more than a year now and have been busy with totally different stuff. Also updated the user messages used for testing a bit
Define the meta that needs to be passed to the GenAi Engine. Define the logic that implements the tool call, if called. Implement the flow/structure such that a single tool calls implementation file can define multiple tool calls.
Make tooljs structure and flow more generic Add a simple_calculator tool/function call logic Add initial skeleton wrt the main tools.mjs file.
Changed latestResponse type to an object instead of a string. Inturn it contains entries for content, toolname and toolargs. Added a custom clear logic due to the same and used it to replace the previously simple assigning of empty string to latestResponse. For now in all places where latestReponse is used, I have replaced with latestReponse.content. Next need to handle identifying the field being streamed and inturn append to it. Also need to add logic to call tool, when tool_call triggered by genai.
Update response_extract_stream to check for which field is being currently streamed ie is it normal content or tool call func name or tool call func args and then return the field name and extracted value. Previously it was always assumed that only normal content will be returned. Currently it is assumed that the server will only stream one of the 3 supported fields at any time and not more than one of them at the same time. TODO: Have to also add logic to extract the reasoning field later, ie wrt gen ai models which give out their thinking. Have updated append_response to expect both the key and the value wrt the latestResponse object, which it will be manipualted. Previously it was always assumed that content is what will be got and inturn appended.
I was wrongly checking for finish_reason to be non null, before trying to extract the genai content/toolcalls, have fixed this oversight with the new flow in progress. I had added few debug logs to identify the above issue, need to remove them later. Note: given that debug logs are disabled by replacing the debug function during this program's initialisation, which I had forgotten about, I didnt get the debug messages and had to scratch my head a bit, before realising this and the other issue ;) Also either when I had originally implemented simplechat 1+ years back, or later due to changes on the server end, the streaming flow sends a initial null wrt the content, where it only sets the role. This was not handled in my flow on the client side, so a null was getting prepended to the chat messages/responses from the server. This has been fixed now in the new generic flow.
Make latestResponse into a new class based type instance wrt ai assistant response, which is what it represents. Move clearing, appending fields' values and getting assistant's response info (irrespective of a content or toolcall response) into this new class and inturn use the same.
Switch oneshot handler to use AssistantResponse, inturn currenlty only handle the normal content in the response. TODO: If any tool_calls in the oneshot response, it is currently not handled. Inturn switch the generic/toplevel handle response logic to use AssistantResponse class, given that both oneshot and the multipart/streaming flows use/return it. Inturn add trimmedContent member to AssistantResponse class and make the generic handle response logic to save the trimmed content into this. Update users of trimmed to work with this structure.
As there could be failure wrt getting the response from the ai server some where in between a long response spread over multiple parts, the logic uses the latestResponse to cache the response as it is being received. However once the full response is got, one needs to transfer it to a new instance of AssistantResponse class, so that latestResponse can be cleared, while the new instance can be used in other locations in the flow as needed. Achieve the same now.
Previously if content was empty, it would have always sent the toolcall info related version even if there was no toolcall info in it. Fixed now to return empty string, if both content and toolname are empty.
The implementations of javascript and simple_calculator now use provided helpers to trap console.log messages when they execute the code / expression provided by GenAi and inturn store the captured log messages in the newly added result key in tc_switch This should help trap the output generated by the provided code or expression as the case maybe and inturn return the same to the GenAi, for its further processing.
Checks for toolname to be defined or not in the GenAi's response If toolname is set, then check if a corresponding tool/func exists, and if so call the same by passing it the GenAi provided toolargs as a object. Inturn the text generated by the tool/func is captured and put into the user input entry text box, with tool_response tag around it.
As output generated by any tool/function call is currently placed into the TextArea provided for End user (for their queries), bcas the GenAi (engine/LLM) may be expecting the tool response to be sent as a user role data with tool_response tag surrounding the results from the tool call. So also now at the end of submit btn click handling, the end user input text area is not cleared, if there was a tool call handled, for above reasons. Also given that running a simple arithmatic expression in itself doesnt generate any output, so wrap them in a console.log, to help capture the result using the console.log trapping flow that is already setup.
and inform the GenAi/LLM about the same
Should hopeful ensure that the GenAi/LLM will generate appropriate code/expression as the argument to pass to these tool calls, to some extent.
ie in vs code with ts-check
Move tool calling logic into tools module. Try trap async promise failures by awaiting results of tool calling and putting full thing in an outer try catch. Have forgotten the nitty gritties of JS flow, this might help, need to check.
So that when tool handler writes the result to the tc_switch, it can make use of the same, to write to the right location. NOTE: This also fixes the issue with I forgetting to rename the key in js_run wrt writing of result.
to better describe how it will be run, so that genai/llm while creating the code to run, will hopefully take care of any naunces required.
Also as part of same, wrap the request details in the assistant block using a similar tagging format as the tool_response in user block.
Instead of automatically calling the requested tool with supplied arguments, rather allow user to verify things before triggering the tool. NOTE: User already provided control over tool_response before submitting it to the ai assistant.
Instead of automatically calling any requested tool by the GenAi / llm, that is from the tail end of the handle user submit btn click, Now if the GenAi/LLM has requested any tool to be called, then enable the Tool Run related UI elements and fill them with the tool name and tool args. In turn the user can verify if they are ok with the tool being called and the arguments being passed to it. Rather they can even fix any errors in the tool usage like the arithmatic expr to calculate that is being passed to simple_calculator or the javascript code being passed to run_javascript_function_code If user is ok with the tool call being requested, then trigger the same. The results if any will be automatically placed into the user query text area. User can cross verify if they are ok with the result and or modify it suitabley if required and inturn submit the same to the GenAi/LLM.
Also avoid showing Tool calling UI elements, when not needed to be shown.
The tagging of messages wrt ValidateUrl and UrlReq Also dump req Move check for --allowed.domains to ValidateUrl NOTE: Also with mimicing of user agent etal from got request to the generated request, yahoo search/news is returning results now, instead of the bland error before.
mimicing got req in generated req helps with duckduckgo also and not just yahoo. also update allowed.domains to allow a url generated by ai when trying to access the bing's news aggregation url
Use DOMParser parseFromString in text/html mode rather than text/xml as it makes it more relaxed without worrying about special chars of xml like & etal
ie during proxying
Instead of simple concatenating of tool call id, name and result now use browser's dom logic to create the xml structure used for now to store these within content field. This should take care of transforming / escaping any xml special chars in the result, so that extracting them later for putting into different fields in the server handshake doesnt have any problem.
bing raised a challenge for chrome triggered search requests after few requests, which were spread few minutes apart, while still seemingly allowing wget based search to continue (again spread few minutes apart). Added a simple helper to trace this, use --debug True to enable same.
avoid logically duplicate debug log
Instead of enforcing always explicit user triggered tool calling, now user is given the option whether to use explicit user triggered tool calling or to use auto triggering after showing tool details for a user specified amount of seconds. NOTE: The current logic doesnt account for user clicking the buttons before the autoclick triggers; need to cancel the auto clicks, if user triggers before autoclick, ie in future.
also cleanup the existing toolResponseTimeout timer to be in the same structure and have similar flow convention.
identified by llama.cpp editorconfig check * convert tab to spaces in json config file * remove extra space at end of line
Add missing newline to ending bracket line of json config file
include info about the auto option within tools. use nonwrapped text wrt certain sections, so that the markdown readme can be viewed properly wrt the structure of the content in it.
So as to split browser js webworker based tool calls from web related tool calls.
Remove the unneed (belonging to the other file) stuff from tooljs and toolweb files. Update tools manager to make use of the new toolweb module
Initial go at implementing a web search tool call, which uses the existing UrlText support of the bundled simpleproxy.py. It allows user to control the search engine to use, by allowing them to set the search engine url template. The logic comes with search engine url template strings for duckduckgo, brave, bing and google. With duckduckgo set by default.
Avoid code duplication, by creating helpers for setup and toolcall. Also send indication of the path that will be used, when checking for simpleproxy.py server to be running at runtime setup.
If using wikipedia or so, remember to have sufficient context window in general wrt the ai engine as well as wrt the handshake / chat end point.
Moved it into Me->tools, so that end user can modify the same as required from the settings ui. TODO: Currently, if tc response is got after a tool call timed out and user submitted default timed out error response, the delayed actual response when it is got may overwrite any new content in user query box, this needs to be tackled.
Now both follow a similar mechanism and do the following * exit on finding any issue, so that things are in a known state from usage perspective, without any confusion/overlook * check if the cmdlineArgCmd/configCmd being processed is a known one or not. * check value of the cmd is of the expected type * have a generic flow which can accomodate more cmds in future in a simple way
Ensure load_config gets called on encountering --config in cmdline, so that the user has control over whether cmdline or config file will decide the final value of any given parameter. Ensure that str type values in cmdline are picked up directly, without running them through ast.literal_eval, bcas otherwise one will have to ensure throught the cmdline arg mechanism that string quote is retained for literal_eval Have the """ function note/description below def line immidiately so that it is interpreted as a function description.
Add a config entry called bearer.insecure which will contain a token used for bearer auth of http requests Make bearer.insecure and allowed.domains as needed configs, and exit program if they arent got through cmdline or config file.
As noted in the comments in code, this is a very insecure flow for now.
Next will be adding a proxyAuth field also to tools.
User can configure the bearer token to send
instead of using the shared bearer token as is, hash it with current year and use the hash. keep /aum path out of auth check. in future bearer token could be transformed more often, as well as with additional nounce/dynamic token from server got during initial /aum handshake as also running counter and so ... NOTE: All these circus not good enough, given that currently the simpleproxy.py handshakes work over http. However these skeletons put in place, for future, if needed. TODO: There is a once in a bluemoon race when the year transitions between client generating the request and server handling the req. But other wise year transitions dont matter bcas client always creates fresh token, and server checks for year change to genrate fresh token if required.
|
Have added support for a simple minded bearer token handshake between the client ui and web tool calls related helper simple proxy server, just in case. NOTE: Currently the handshake between client ui and the included simpleproxy server is limited to http, in order to avoid needing additional https key pair setup etal by end user, given that this is mainly for testing/use by people on their own machine or so, which allows more system level control mechanisms, independent of any logic here, to be sufficient normally. |
When interacting with llama-server using the alternate minimal (and exploratory for devel purpose and so, without any external js modules depedency etal) public_simplechat client ui, this allows json-tool-schema aware ai models to trigger search web tool call without worrying about the underlying search url template etal.
By default it is configured to use duckduckgo html search url internally, with the option for user to modify the same through the settings page in the ui. The search url templates for bing, brave and google are also provided in the code, if needed. Chances are google one wont work, while bing may work if running the client ui from firefox and not chrome or so. While duckduckgo and brave seem to allow search requests from both browsers.
This builds on the previously submitted #PR16563 (and the fetchweburltext tool call from that). ie #16563 which added basic tool calling functionality to the alternate minimal public_simplechat webui along with support for
calculator and js code interpreter tool calls directly in browser through a web worker
fetch web page related tool calls through a bundled simpleproxy.py server logic
Look into the readme.md or that PR for details wrt the tool call support added to public_simplechat web client ui.
NOTE: Given that, that PR isnt merged into main branch of llama.cpp, this PR seems to refer to (bring in) the full commit list including the ones belonging to that PR.