🚀 Installation • 🏁 Search Needle Function • 📚 Read More

## 🚀 Installation

```bash
# without vLLM (can run openai, anthropic, and huggingface backends)
pip install --upgrade repoqa
# with vLLM
pip install --upgrade "repoqa[vllm]"
```
<details><summary>⏬ Install nightly version <i>:: click to expand ::</i></summary>
<div>

```bash
pip install --upgrade "git+https://github.com/evalplus/repoqa.git" # without vLLM
pip install --upgrade "repoqa[vllm] @ git+https://github.com/evalplus/repoqa@main" # with vLLM
```

</div>
</details>
<details><summary>⏬ Using RepoQA as a local repo? <i>:: click to expand ::</i></summary>
<div>

```bash
git clone https://github.com/evalplus/repoqa.git
cd repoqa
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt
```

</div>
</details>
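After setting up a local checkout this way, a quick sanity check that the package resolves from `PYTHONPATH` can look like the sketch below; the `python -m` invocation assumes `repoqa/search_needle_function.py` exposes a `__main__` entry point, which may differ from the installed `repoqa.search_needle_function` command.

```bash
# Sanity-check a local-checkout setup (sketch; the module-style invocation is an assumption)
python -c "import repoqa; print(repoqa.__file__)"
python -m repoqa.search_needle_function --help  # assumes the module is runnable with -m
```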
repoqa.search_needle_function --model "gpt4-turbo" --caching --backend openai
# 💡 If you use customized server such vLLM:
# repoqa.search_needle_function --base-url "http://url.to.vllm.server/v1" \
# --model "gpt4-turbo" --caching --backend openai
repoqa.search_needle_function --model "claude-3-haiku-20240307" --caching --backend anthropic
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
--caching --backend vllm
repoqa.search_needle_function --model "gpt2" "Qwen/CodeQwen1.5-7B-Chat" \
--caching --backend hf --trust-remote-code
> [!Tip]
>
> **Input**:
>
> - `--model`: Hugging-Face model ID, such as `ise-uiuc/Magicoder-S-DS-6.7B`
> - `--backend`: `vllm` (default) or `openai`
> - `--base-url`: OpenAI API base URL
> - `--code-context-size` (default: 16384): number of code tokens (counted with the DeepSeekCoder tokenizer) in the long context
> - `--caching` (default: False): if enabled, tokenization and chunking results are cached to accelerate subsequent runs
> - `--max-new-tokens` (default: 1024): maximum number of new tokens to generate
> - `--system-message` (default: None): if given, the model uses a system message (note that some models do not support system messages)
> - `--tensor-parallel-size`: degree of tensor parallelism (only for vLLM)
> - `--languages` (default: None): list of languages to evaluate (None means all)
> - `--result-dir` (default: "results"): directory to save the model outputs and evaluation results
>
> A fuller invocation combining several of these flags is sketched right after this tip.
>
> **Output**:
>
> - `results/ntoken_{code-context-size}/{model}.jsonl`: model-generated outputs
> - `results/ntoken_{code-context-size}/{model}-SCORES.json`: evaluation scores (also see Compute Scores below)
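For example, a run that adjusts the context budget and output location might look like the following sketch; the specific values (an 8192-token context and a custom result directory) are illustrative assumptions rather than recommended settings.

```bash
# Illustrative sketch: custom context size and result directory (values are assumptions, not defaults)
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
                              --backend vllm --caching \
                              --code-context-size 8192 \
                              --result-dir my-results
```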
### Compute Scores

By default, the `repoqa.search_needle_function` command also computes scores after producing model outputs.
However, you can compute scores separately using the following command:

```bash
repoqa.compute_score --model-output-path={model-output}.jsonl
```
> [!Tip]
>
> - **Input**: path to the model-generated outputs (`{model-output}.jsonl`).
> - **Output**: the evaluation scores are stored in `{model-output}-SCORES.json`.
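As a concrete illustration, scoring outputs produced with the default 16384-token context might look like the sketch below; the filename `gpt-4-turbo.jsonl` is hypothetical, and the actual name depends on how the harness serializes the model ID.

```bash
# Hypothetical path following the results/ntoken_{code-context-size}/{model}.jsonl pattern above
repoqa.compute_score --model-output-path=results/ntoken_16384/gpt-4-turbo.jsonl
# scores are then written alongside the outputs as results/ntoken_16384/gpt-4-turbo-SCORES.json
```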