@@ -48,36 +48,34 @@ You can run the SNF evaluation using various backends.
### OpenAI Compatible Servers
```bash
- repoqa.search_needle_function --model "gpt4-turbo" --caching --backend openai
+ repoqa.search_needle_function --model "gpt4-turbo" --backend openai
# 💡 If you use a customized server such as vLLM:
# repoqa.search_needle_function --base-url "http://url.to.vllm.server/v1" \
- # --model "gpt4-turbo" --caching --backend openai
+ # --model "gpt4-turbo" --backend openai
```
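
If you want to follow the `--base-url` route shown above, one minimal sketch is to serve a model through vLLM's OpenAI-compatible endpoint first (assuming vLLM is installed locally; the model name, host, and port are illustrative):

```bash
# Serve a model through vLLM's OpenAI-compatible API (model and port are illustrative)
python -m vllm.entrypoints.openai.api_server --model "Qwen/CodeQwen1.5-7B-Chat" --port 8000

# Then point the evaluation at the local endpoint
repoqa.search_needle_function --base-url "http://localhost:8000/v1" \
  --model "Qwen/CodeQwen1.5-7B-Chat" --backend openai
```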
### Anthropic Compatible Servers
```bash
- repoqa.search_needle_function --model "claude-3-haiku-20240307" --caching --backend anthropic
+ repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
```
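
The Anthropic backend relies on Anthropic's client library, which conventionally reads the key from the `ANTHROPIC_API_KEY` environment variable; treat the snippet below as an assumption about the setup rather than a documented RepoQA requirement:

```bash
# Assumption: the standard Anthropic environment variable is picked up by the client
export ANTHROPIC_API_KEY="sk-ant-..."  # placeholder value
repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
```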
### vLLM
```bash
- repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
- --caching --backend vllm
+ repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
```
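
For checkpoints that don't fit on a single GPU, the `--tensor-parallel-size` option listed in the Tip below shards the model across GPUs when using the vLLM backend; a rough sketch (the GPU count is illustrative):

```bash
# Illustrative: shard the model across 2 GPUs with vLLM tensor parallelism
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
  --backend vllm --tensor-parallel-size 2
```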
### HuggingFace transformers
```bash
- repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
- --caching --backend hf --trust-remote-code
+ repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
```
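
To pin the HuggingFace backend to a specific GPU, the standard `CUDA_VISIBLE_DEVICES` environment variable can be set before the command (an assumption about the environment, not a RepoQA flag):

```bash
# Assumption: restrict the run to GPU 0 via the standard CUDA environment variable
CUDA_VISIBLE_DEVICES=0 repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
  --backend hf --trust-remote-code
```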
### Google Generative AI API (Gemini)
```bash
- repoqa.search_needle_function --model "gemini-1.5-pro-latest" --caching --backend google
+ repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
```
> [!Tip]
@@ -93,7 +91,7 @@ repoqa.search_needle_function --model "gemini-1.5-pro-latest" --caching --backen
> - `--backend`: `vllm` (default) or `openai`
> - `--base-url`: OpenAI API base URL
> - `--code-context-size` (default: 16384): #tokens (by DeepSeekCoder tokenizer) of repository context
- > - `--caching` (default: True): accelerate subsequent runs by caching tokenization and chuncking results
+ > - `--caching` (default: True): accelerate subsequent runs by caching preprocessing; `--nocaching` to disable
> - `--max-new-tokens` (default: 1024): Maximum #new tokens to generate
> - `--system-message` (default: None): system message (note it's not supported by some models)
> - `--tensor-parallel-size`: #GPUs for doing tensor parallelism (only for vLLM)
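
As a rough sketch of how several of these options compose in one run (all values below are illustrative, not recommendations):

```bash
# Illustrative combination of the documented options against an OpenAI-compatible endpoint
repoqa.search_needle_function --base-url "http://url.to.vllm.server/v1" \
  --model "Qwen/CodeQwen1.5-7B-Chat" --backend openai \
  --code-context-size 16384 --max-new-tokens 1024 --nocaching
```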