Summary
tunix/rl/agentic/rewards/reward.py implements calculate_reward() using Python eval() on a string derived from task["question"]. If an untrusted task/question is processed with this reward enabled, arbitrary Python code can execute in the context of the running process.
Location
- File: tunix/rl/agentic/rewards/reward.py
- Function: calculate_reward
- Line: correct_value = eval(expression)
Why this matters
Many RL/agentic workflows consume tasks from external datasets/benchmarks. If those inputs are not fully trusted, eval() introduces a code-execution risk.
Reproduction (safe)
Set the task question to a harmless payload like:
__import__('os').system('echo PWNED')
Then execute it through TaskEnvironment(..., reward_fn=calculate_reward).
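The vulnerable pattern can be demonstrated in isolation, without the full environment. This is a minimal sketch of the assumed shape of calculate_reward (the real function's signature and surrounding logic may differ); it shows that the attacker-controlled string reaches eval() and executes:

```python
# Sketch of the vulnerable pattern (names mirror the report; the real
# calculate_reward may differ):
task = {"question": "__import__('os').system('echo PWNED')"}

expression = task["question"]     # attacker-controlled string
correct_value = eval(expression)  # runs the shell command; returns its exit status
```

Any expression placed in task["question"] is executed with the full privileges of the training process.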
Suggested remediation
- Replace eval() with a safe math expression evaluator (AST allowlist), or
- gate this behind an explicit "unsafe" flag / move to tests-only code paths.
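The AST-allowlist approach from the first remediation option can be sketched as follows. This is a hypothetical helper, not existing tunix code: it parses the expression and permits only numeric literals and basic arithmetic, rejecting calls, attribute access, names, and everything else:

```python
import ast
import operator

# Allowlisted operators: anything outside this table is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression; raise ValueError on anything else."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Calls, names, attributes, subscripts, etc. all land here.
        raise ValueError(f"disallowed expression node: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval"))
```

With this helper, safe_eval("2 + 3 * 4") evaluates normally, while the payload above (a function call) raises ValueError instead of executing.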