Feature: Support HuggingFace Text Generation including meta-llama2 model#266
…refactor huggingface filter logistic
- Add the "TextGenerationFilter". - support huggingface filters could pass "endpoint" keyword arguments when using different filter task. - add test cases of "TextGenerationFilter".
7c0cc16 to e192990
@cyyeh, Please assist me to check Document content - HuggingFace Text Generation, thanks! Btw, I added the |
onlyjackfrost
left a comment
Besides some comments, others LGTM
    if (!(typeof args === 'object') || !has(args, 'query'))
      throw new InternalError('Must provide "query" keyword argument');
    if (!args['query'])
      throw new InternalError('The "query" argument must have value');
Curious about why we removed this query check.
Thanks for asking. I added back the logic that checks whether "query" has a value, along with test cases, in 5e22e51
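For reference, the restored check could look roughly like the sketch below. It is self-contained for illustration only: the `InternalError` class here is a hypothetical stand-in for the project's own error type, `requireQuery` is a made-up helper name, and the string type check is stricter than the truthiness check in the diff above.

```typescript
// Hypothetical stand-in for the project's InternalError class.
class InternalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InternalError';
  }
}

// Validate the keyword arguments passed to a filter and return the query.
// Assumption: the query is expected to be a non-empty string.
function requireQuery(args: unknown): string {
  // First check: the arguments must be an object that carries a "query" key.
  if (typeof args !== 'object' || args === null || !('query' in args)) {
    throw new InternalError('Must provide "query" keyword argument');
  }
  // Second check: the "query" key must hold a usable value.
  const query = (args as { query: unknown }).query;
  if (typeof query !== 'string' || query.length === 0) {
    throw new InternalError('The "query" argument must have value');
  }
  return query;
}
```

Keeping the two checks separate preserves the two distinct error messages, so a missing keyword and an empty value can be told apart in test cases.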
    Using the `huggingface_text_generation` filter. The result will be a string from `huggingface_text_generation`.
    **Notice**: The **Text Generation** default model is **gpt2**, If you would like to use the [Meta LLama2](https://huggingface.co/meta-llama) models, you have two method to do:
"If you would like to use the [Meta LLama2] models, you have two method to do"
Check the grammar.
Thanks for catching the grammar issue. I changed "method" to "methods" in 5e22e51
    2. Select one of the [Meta LLama2](https://huggingface.co/meta-llama) Models and deploy it to the [Inference Endpoint](https://huggingface.co/inference-endpoints). Set the endpoint URL using the `endpoint` keyword argument in `huggingface_text_generation`.
| ```sql | ||
| SELECT {{ data | huggingface_text_generation(query="Which university is the top-ranked university?", endpoint='xxx.yyy.zzz.huggingface.cloud') }} as result |
Maybe we can merge these two code snippets and use comments to describe the details.
I think that would be more readable.
The code marked in this snippet is from an older revision. After discussing with @onlyjackfrost and checking, no change is needed.
    HuggingFaceTableQuestionAnsweringFilterBuilder,
    HuggingFaceTableQuestionAnsweringFilterRunner,
    TextGenerationFilterBuilder,
    TextGenerationFilterRunner,
Please ensure that the naming is aligned with HuggingFace, either by using it as a prefix or without it.
Thanks for catching the naming issue. It has been fixed in 5e22e51
      100 * 1000
    );

    // Skip the test case because the "meta-llama/Llama-2-13b-chat-hf" model need to upgrade your huggingface account to Pro Account by paying $9 per month
Is there any model that is free and can be used for testing?
If the structure of API response payload is the same, I think it could be used for testing.
After discussion, I added a free model to the test cases and renamed the test to `Should not throw when passing the "query" argument by dynamic parameter through HuggingFace default recommended "gpt2" model` in 5e22e51
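On the point about the response payload having the same structure across models, a test can rely on a small helper like the sketch below. It assumes the text generation response is an array whose first element carries a `generated_text` string; that shape and the helper names `isGeneratedText` and `extractGeneratedText` are assumptions for illustration, not code from this PR.

```typescript
// Assumed shape of one item in a text generation response.
interface GeneratedText {
  generated_text: string;
}

// Type guard: true when the value looks like a GeneratedText item.
function isGeneratedText(value: unknown): value is GeneratedText {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { generated_text?: unknown }).generated_text === 'string'
  );
}

// Pull the generated string out of the payload, or fail with a clear message.
function extractGeneratedText(payload: unknown): string {
  if (!Array.isArray(payload) || payload.length === 0) {
    throw new Error('Unexpected text generation payload');
  }
  const first: unknown = payload[0];
  if (!isGeneratedText(first)) {
    throw new Error('Unexpected text generation payload');
  }
  return first.generated_text;
}
```

If both the free model and the paid model return this shape, a test written against `gpt2` exercises the same parsing path.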
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #266 +/- ##
===========================================
- Coverage 90.25% 86.25% -4.01%
===========================================
Files 346 8 -338
Lines 5931 80 -5851
Branches 794 19 -775
===========================================
- Hits 5353 69 -5284
+ Misses 421 7 -414
+ Partials 157 4 -153
4c445cd to 8c6a02c
…has value with test cases for huggingface filter - fix grammar in README. - fix the section of document . - add logistic for checking query has value with test cases
8c6a02c to 5e22e51
Description
Text Generation is one of the Natural Language Processing tasks supported by Hugging Face.
VulcanSQL supports Text Generation through the `huggingface_text_generation` filter. The result is a string returned by `huggingface_text_generation`.

📢 Notice: The default Text Generation model is gpt2. If you would like to use the Meta LLama2 models, you have two options:
1. Subscribe to the Pro Account and set the model with the `model` keyword argument in `huggingface_text_generation`, e.g. `meta-llama/Llama-2-13b-chat-hf`.
2. Select one of the Meta LLama2 models and deploy it to the Inference Endpoint. Set the endpoint URL using the `endpoint` keyword argument in `huggingface_text_generation`.

Sample 1 - Subscribe to the Pro Account:

    {% set data = [
      {
        "rank": 1,
        "institution": "Massachusetts Institute of Technology (MIT)",
        "location code": "US",
        "location": "United States"
      },
      {
        "rank": 2,
        "institution": "University of Cambridge",
        "location code": "UK",
        "location": "United Kingdom"
      },
      {
        "rank": 3,
        "institution": "Stanford University",
        "location code": "US",
        "location": "United States"
      }
      -- other universities.....
    ] %}

    SELECT {{ data | huggingface_text_generation(query="Which university is the top-ranked university?", model="meta-llama/Llama-2-13b-chat-hf") }} as result

Sample 1 - Response:

    [
      {
        "result": "Answer: Based on the provided list, the top-ranked university is Massachusetts Institute of Technology (MIT) with a rank of 1."
      }
    ]

Sample 2 - Using Inference Endpoint:

    {% req universities %}
    SELECT rank, institution, "location code", "location" FROM read_csv_auto('2023-QS-World-University-Rankings.csv')
    {% endreq %}

    SELECT {{ universities.value() | huggingface_text_generation(query="Which university located in the UK is ranked at the top of the list?", endpoint='xxx.yyy.zzz.huggingface.cloud') }} as result

Sample 2 - Response:

    [
      {
        "result": "Answer: Based on the list provided, the top-ranked university in the UK is the University of Cambridge, which is ranked at number 2."
      }
    ]

Screenshot
SQL and API Schema

Question 1 - Which university is the top-ranked university?

Question 2 - Which university located in the UK is ranked at the top of the list?

Additional Context
- Refactor the `TableQuestionAnsweringFilter` function logic to keep it simple and readable.
- Add the `endpoint` field so that users can use their HuggingFace Inference Endpoint when using a `huggingface_xxx` filter.
- … `request` method to `request.ts`, with try-catch support to bypass the Axios error message.
- … `test-data` folder for reusing data, and use `describe.model.ts` to define the common types and const values.
- Skip the test case for the `llama2` model, because using the `llama2` model requires subscribing to a Pro Account and paying $9/month.
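A rough sketch of how the `endpoint` and `model` keyword arguments could select the request target, and how a request failure could be rethrown with a shorter message, is shown below. None of this is the code from `request.ts`: the base URL constant, the helper names `resolveInferenceUrl` and `requestWithShortError`, and the injected `send` callback are all assumptions made to keep the example self-contained without an HTTP client dependency.

```typescript
// Default model named in the description above.
const DEFAULT_MODEL = 'gpt2';
// Assumed base URL of the hosted inference API; adjust to the real value in use.
const INFERENCE_API_BASE = 'https://api-inference.huggingface.co/models';

interface TargetOptions {
  model?: string;
  endpoint?: string;
}

// Prefer a user-supplied endpoint; otherwise build the URL from the model name.
function resolveInferenceUrl(options: TargetOptions = {}): string {
  if (options.endpoint) {
    return options.endpoint;
  }
  return `${INFERENCE_API_BASE}/${options.model ?? DEFAULT_MODEL}`;
}

// Run a request through the injected sender and turn a rejected promise
// into an Error with a short, readable message.
function requestWithShortError<T>(
  url: string,
  send: (url: string) => Promise<T>
): Promise<T> {
  return send(url).catch((error: unknown) => {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Failed to request "${url}": ${reason}`);
  });
}
```

Passing the sender in as a callback keeps the error handling testable without network access; in real code the callback would wrap whichever HTTP client the project uses.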