Closed
Description
EvalPlus version
v0_1_0_hf
Output of running ls ~/.cache/bigcodebench
BigCodeBench-v0.1.0_hf.jsonl
Task ID of the programming task
BigCodeBench/211, BigCodeBench/215, and probably others as well
The original test
(All tests)
mock_response = MagicMock()
mock_response.content = MOCK_CONTENT
mock_requests_get.return_value = mock_response
Your proposed new test
mock_response = MagicMock()
mock_response.content = MOCK_CONTENT
mock_response.status_code = 200
mock_requests_get.return_value = mock_response
Description
The LLM sometimes (reasonably!) generates code like:
if r.status_code != 200:
    print("Error: Failed to download file from URL.")
    return None
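(The rest of the code solves the task correctly.)
But it fails the test: status_code is never set on the mock, so it is an auto-created MagicMock attribute, which does not compare equal to 200, and the function returns None early.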
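The behaviour can be reproduced outside the benchmark with a minimal sketch. The `download` helper, the URL, and the `MOCK_CONTENT` value below are hypothetical stand-ins, not the actual task solution or test fixture; only the mock setup mirrors the original and proposed tests.

```python
from unittest.mock import MagicMock

# Placeholder payload; the benchmark's real MOCK_CONTENT is not reproduced here.
MOCK_CONTENT = b"example bytes"


def download(get, url):
    """Hypothetical stand-in for a generated solution that checks the status code."""
    r = get(url)
    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None
    return r.content


# Original test setup: status_code is never assigned, so accessing it
# yields an auto-created MagicMock, and `MagicMock != 200` is True.
original = MagicMock()
original.content = MOCK_CONTENT
result_original = download(MagicMock(return_value=original), "https://example.com/file")

# Proposed test setup: status_code is set explicitly to 200.
proposed = MagicMock()
proposed.content = MOCK_CONTENT
proposed.status_code = 200
result_proposed = download(MagicMock(return_value=proposed), "https://example.com/file")

print(result_original)  # None: the early return is taken
print(result_proposed)  # the mock content is returned
```

With the original setup the status check trips and the helper returns None; with `status_code = 200` the same helper returns the mocked content.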
Other context
No response