Skip to content

Improve expression evaluation and update .gitignore#10

Draft
phbrgnomo wants to merge 5 commits intomasterfrom
rce-fix
Draft

Improve expression evaluation and update .gitignore#10
phbrgnomo wants to merge 5 commits intomasterfrom
rce-fix

Conversation

@phbrgnomo
Copy link
Copy Markdown
Owner

@phbrgnomo phbrgnomo commented Feb 15, 2026

Vulnerability Fixes:

Remote code execution (RCE) from untrusted input evaluated by eval

Risk: RCE lets attackers execute arbitrary code, access sensitive data, pivot the environment, or fully compromise the process when untrusted input reaches eval.

Cause: eval executes strings as code. Passing data derived from external sources without strict validation or a strict allowlist enables arbitrary code injection.

Fix
Remove eval usage. If parsing literals, use ast.literal_eval(). For calculations or logic, implement explicit handlers or a whitelisted function map. Validate inputs strictly. If isolation is unavoidable, use a sandbox like RestrictedPython with minimal, immutable globals.

Note
Dynamic expression execution will be removed or restricted; inputs that previously executed arbitrary expressions may be rejected or behave differently.

Remote code execution (RCE) from external input evaluated by exec

Risk: Attackers could execute arbitrary code on the server, exfiltrate data, modify state, or fully compromise the host process.

Cause: User-controlled strings are passed to exec without strict validation or isolation, enabling injected code to run with application privileges.

Fix
Remove exec usage. Replace dynamic code execution with explicit functions or a dispatch map. For data-only evaluation, use ast.literal_eval. If expression evaluation is required, implement a strict whitelist parser and never pass user input to exec.

Note
If the application relied on executing arbitrary expressions, those scripts will no longer run; only explicitly allowed operations will execute.

Summary by Sourcery

Harden expression evaluation and remove unsafe dynamic code execution from the codebase.

Bug Fixes:

  • Replace template expression eval() usage with a restricted AST-based evaluator to prevent remote code execution from untrusted expressions.
  • Eliminate eval() in API docs generation when resolving default values for selected global objects, using an explicit whitelist mapping instead.

Enhancements:

  • Introduce a conservative allowlist for callable attributes (e.g., np.prod) that templates are permitted to access or invoke during expression evaluation.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Feb 15, 2026

Reviewer's Guide

Replaces unsafe eval-based expression evaluation with a restricted AST interpreter, removes eval from API docs generation by using an explicit whitelist of known globals, and tightens template attribute/call access via an allowlist while also updating ignore rules.

Class diagram for restricted AST-based evaluator in template module

classDiagram
    class Sub {
        +mapping
        +eval(mapping)
        +__str__()
    }

    class RepEval {
        +expression
        +mapping
        +eval(mapping) tp.Any
        +__str__() str
    }

    class TEMPLATE_ALLOWED_ATTRS {
        <<module_constant>>
        +np : prod
        %% original type/value notation: {prod}
    }

    class ast {
        <<external_module>>
    }

    class merge_dicts {
        <<function>>
        +merge_dicts(base_mapping, override_mapping) Mapping
    }

    RepEval ..> ast : uses
    RepEval ..> TEMPLATE_ALLOWED_ATTRS : consults
    RepEval ..> merge_dicts : calls

    RepEval : +_eval_node(node) tp.Any
    RepEval :   supports ast.Constant
    RepEval :   supports ast.Name
    RepEval :   supports ast.BinOp (Add, Sub, Mult, Div, FloorDiv, Mod, Pow)
    RepEval :   supports ast.UnaryOp (USub, UAdd, Not)
    RepEval :   supports ast.BoolOp (And, Or)
    RepEval :   supports ast.Compare (Eq, NotEq, Lt, LtE, Gt, GtE)
    RepEval :   supports ast.Attribute (only allowed attrs on mapped names)
    RepEval :   supports ast.Call (only allowed mapped attributes)
    RepEval :   supports ast.Subscript (indices, slices, tuples)
    RepEval :   supports ast.List, ast.Tuple, ast.Dict
    RepEval :   raises ValueError on unsupported nodes
Loading

Flow diagram for restricted AST-based expression evaluation

flowchart TD
    A[RepEval.eval called] --> B[Merge default mapping and provided mapping using merge_dicts]
    B --> C[Parse expression with ast.parse in eval mode]
    C --> D[Call _eval_node on parsed.body]

    D --> E{Node type}

    E --> F[ast.Constant
return node.value]
    E --> G[ast.Name
lookup in mapping or raise NameError]
    E --> H[ast.BinOp
recursively eval left and right
apply supported arithmetic op or raise ValueError]
    E --> I[ast.UnaryOp
recursively eval operand
apply USub, UAdd, Not or raise ValueError]
    E --> J[ast.BoolOp
recursively eval values
apply And/Or or raise ValueError]
    E --> K[ast.Compare
recursively eval chain of comparisons
support Eq, NotEq, Lt, LtE, Gt, GtE]
    E --> L[ast.Attribute
only allow attribute on top level mapped name
check TEMPLATE_ALLOWED_ATTRS
getattr or raise error]
    E --> M[ast.Call
only allow call of allowed attribute on mapped name
resolve function
recursively eval args and kwargs
call or raise error]
    E --> N[ast.Subscript
recursively eval value and slice
handle Slice, Tuple, or single index]
    E --> O[ast.List
recursively eval elements and return list]
    E --> P[ast.Tuple
recursively eval elements and return tuple]
    E --> Q[ast.Dict
recursively eval keys and values and return dict]
    E --> R[Other node type
raise ValueError unsupported expression]

    F --> S[Return to caller]
    G --> S
    H --> S
    I --> S
    J --> S
    K --> S
    L --> S
    M --> S
    N --> S
    O --> S
    P --> S
    Q --> S
    R --> S
Loading

Flow diagram for safe_default_value global resolution without eval

flowchart TD
    A[safe_default_value called with parameter p] --> B[Get default value from p]
    B --> C{value is inspect.Parameter.empty?}
    C -- Yes --> D[Return p unchanged]
    C -- No --> E[Build _safe_globals map
os.environ, sys.stdin, sys.stdout, sys.stderr]

    E --> F[Iterate over _safe_globals items
find name where obj is value]
    F --> G{Found matching object?}
    G -- Yes --> H[Set replacement to matching name string]
    G -- No --> I[If value is CPUDispatcher use value.py_func.__name__
else leave replacement as None]

    H --> J[Return parameter info with safe replacement]
    I --> J[Return parameter info with possibly transformed default]
Loading

File-Level Changes

Change Details Files
Harden template expression evaluation by replacing eval with a restricted AST-based interpreter and introducing an explicit attribute/call allowlist.
  • Import ast and define TEMPLATE_ALLOWED_ATTRS mapping for allowed attributes on specific mapped names (currently only np.prod).
  • Replace direct eval of self.expression with a custom _eval_node visitor that interprets a limited subset of Python AST nodes (constants, names, arithmetic and boolean operators, comparisons, subscripts, lists/tuples/dicts).
  • Restrict attribute access to allowed attributes on top-level mapped names and restrict function calls to those allowed attributes, rejecting any other attribute or call patterns.
  • Support safe indexing/slicing semantics (including slices and multi-dimensional indexing) within the restricted evaluator.
vectorbt/utils/template.py
Remove eval from API docs default-value handling by resolving a small whitelist of globals explicitly.
  • Introduce a _safe_globals dict mapping string names to their corresponding global objects (os.environ, sys.stdin, sys.stdout, sys.stderr).
  • Replace eval-based identity comparison with iteration over _safe_globals to find a matching object and return its name.
docs/generate_api.py
Adjust VCS ignore configuration.
  • Update the .gitignore file (details not shown in diff snippet, likely to ignore additional build or environment artifacts).
.gitignore

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The ast.Subscript handling currently sets idx = None for ast.Slice, which will attempt val[None]; either implement proper slice evaluation (start, stop, step) or explicitly raise for slice subscripts to avoid surprising runtime errors.
  • The new evaluator allows arbitrary attribute access (ast.Attribute) and method calls on any mapped object, which can still expose unsafe surface area depending on what is placed in mapping; consider constraining allowed attributes/callables via an explicit allowlist or wrapper objects instead of using getattr directly.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `ast.Subscript` handling currently sets `idx = None` for `ast.Slice`, which will attempt `val[None]`; either implement proper slice evaluation (start, stop, step) or explicitly raise for slice subscripts to avoid surprising runtime errors.
- The new evaluator allows arbitrary attribute access (`ast.Attribute`) and method calls on any mapped object, which can still expose unsafe surface area depending on what is placed in `mapping`; consider constraining allowed attributes/callables via an explicit allowlist or wrapper objects instead of using `getattr` directly.

## Individual Comments

### Comment 1
<location> `vectorbt/utils/template.py:195-197` </location>
<code_context>
+                    kwargs = {kw.arg: _eval_node(kw.value) for kw in node.keywords}
+                    return func(*args, **kwargs)
+                raise ValueError("only calls to mapped attributes are allowed")
+            if isinstance(node, ast.Subscript):
+                val = _eval_node(node.value)
+                idx = _eval_node(node.slice) if not isinstance(node.slice, ast.Slice) else None
+                return val[idx]
+            if isinstance(node, ast.List):
</code_context>

<issue_to_address>
**issue (bug_risk):** Subscript handling drops slices, leading to incorrect indexing behavior.

For slice syntax (`a[1:3]`, `a[::2]`), `node.slice` is an `ast.Slice`, but here `idx` becomes `None` and you end up doing `val[None]`. Instead, for `ast.Slice`, recursively evaluate `lower`, `upper`, and `step` and use `val[slice(lower, upper, step)]`. If relevant, also handle multi-dimensional cases (`ast.ExtSlice` / tuple indices).
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@phbrgnomo phbrgnomo closed this Feb 15, 2026
@phbrgnomo phbrgnomo reopened this Feb 15, 2026
@phbrgnomo phbrgnomo marked this pull request as draft February 15, 2026 15:26
@phbrgnomo phbrgnomo marked this pull request as ready for review February 15, 2026 15:28
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • Consider wrapping ast.parse(self.expression, mode="eval") in a try/except to convert SyntaxError into a clearer, template-specific error message rather than bubbling up a raw parse error.
  • The inline _eval_node implementation inside RepEval.eval is quite large and complex; extracting it (and the operator/allowed-attr logic) into a dedicated helper/module would improve readability and reuse while keeping eval itself small.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider wrapping `ast.parse(self.expression, mode="eval")` in a try/except to convert `SyntaxError` into a clearer, template-specific error message rather than bubbling up a raw parse error.
- The inline `_eval_node` implementation inside `RepEval.eval` is quite large and complex; extracting it (and the operator/allowed-attr logic) into a dedicated helper/module would improve readability and reuse while keeping `eval` itself small.

## Individual Comments

### Comment 1
<location> `vectorbt/utils/template.py:151-155` </location>
<code_context>
+                if isinstance(node.op, ast.Not):
+                    return not operand
+                raise ValueError(f"unsupported unary operator: {node.op}")
+            if isinstance(node, ast.BoolOp):
+                values = [_eval_node(v) for v in node.values]
+                if isinstance(node.op, ast.And):
+                    return all(values)
+                if isinstance(node.op, ast.Or):
+                    return any(values)
+                raise ValueError(f"unsupported boolean operator: {node.op}")
</code_context>

<issue_to_address>
**issue (bug_risk):** Boolean operations do not short-circuit, which can change semantics and cost for expressions with function calls.

Here all operands are eagerly evaluated (`values = [...]`) before applying `all`/`any`, which changes the behavior of `and`/`or`—particularly for operands with side effects, expensive calls, or potential errors that would normally be skipped by short-circuiting. Please evaluate operands one by one in a loop and return as soon as the result is determined, to preserve Python’s short-circuit semantics.
</issue_to_address>

### Comment 2
<location> `vectorbt/utils/template.py:117` </location>
<code_context>
         mapping = merge_dicts(self.mapping, mapping)
-        return eval(self.expression, {}, mapping)
+        # Use a restricted AST evaluator to avoid arbitrary code execution
+        def _eval_node(node):
+            if isinstance(node, ast.Constant):
+                return node.value
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring the new AST evaluator by centralizing operator mappings and extracting attribute, call, and subscript handling into helpers to keep the recursive dispatcher smaller and clearer.

You can keep the new security behavior but reduce complexity and duplication by factoring out operator tables and some helpers, while still using a single recursive dispatcher.

For example:

1. **Centralize operator handling**

Instead of repeating `if isinstance(node.op, ast.X)` across `BinOp`, `UnaryOp`, `BoolOp`, `Compare`, define small lookup tables once and use them:

```python
import operator as _op

_BIN_OPS = {
    ast.Add: _op.add,
    ast.Sub: _op.sub,
    ast.Mult: _op.mul,
    ast.Div: _op.truediv,
    ast.FloorDiv: _op.floordiv,
    ast.Mod: _op.mod,
    ast.Pow: _op.pow,
}

_UNARY_OPS = {
    ast.UAdd: _op.pos,
    ast.USub: _op.neg,
    ast.Not: _op.not_,
}

_BOOL_OPS = {
    ast.And: all,
    ast.Or: any,
}
```

Then inside `_eval_node`:

```python
if isinstance(node, ast.BinOp):
    left = _eval_node(node.left)
    right = _eval_node(node.right)
    op_func = _BIN_OPS.get(type(node.op))
    if op_func is None:
        raise ValueError(f"unsupported binary operator: {node.op}")
    return op_func(left, right)

if isinstance(node, ast.UnaryOp):
    operand = _eval_node(node.operand)
    op_func = _UNARY_OPS.get(type(node.op))
    if op_func is None:
        raise ValueError(f"unsupported unary operator: {node.op}")
    return op_func(operand)

if isinstance(node, ast.BoolOp):
    values = (_eval_node(v) for v in node.values)
    agg = _BOOL_OPS.get(type(node.op))
    if agg is None:
        raise ValueError(f"unsupported boolean operator: {node.op}")
    return agg(values)
```

For `Compare`, you can do something similar for per‑op functions to avoid the long `elif` chain while keeping the “chain comparison” semantics:

```python
_CMP_OPS = {
    ast.Eq: _op.eq,
    ast.NotEq: _op.ne,
    ast.Lt: _op.lt,
    ast.LtE: _op.le,
    ast.Gt: _op.gt,
    ast.GtE: _op.ge,
}

if isinstance(node, ast.Compare):
    left = _eval_node(node.left)
    for op, comparator in zip(node.ops, node.comparators):
        right = _eval_node(comparator)
        cmp_func = _CMP_OPS.get(type(op))
        if cmp_func is None:
            raise ValueError(f"unsupported comparison operator: {op}")
        if not cmp_func(left, right):
            return False
        left = right
    return True
```

2. **Extract attribute/call policy into helpers**

Move the security policy checks out of the main dispatcher to keep `_eval_node` focused on traversal:

```python
def _resolve_allowed_attr(mapping, base_name: str, attr: str):
    if base_name not in mapping:
        raise NameError(f"name '{base_name}' is not defined")
    allowed = TEMPLATE_ALLOWED_ATTRS.get(base_name, set())
    if attr not in allowed:
        raise ValueError(f"access to attribute '{attr}' of '{base_name}' is not allowed")
    return getattr(mapping[base_name], attr)

def _eval_call_attr(mapping, func_node, args, keywords, eval_node):
    if not (isinstance(func_node, ast.Attribute) and isinstance(func_node.value, ast.Name)):
        raise ValueError("only calls to mapped attributes are allowed")
    func = _resolve_allowed_attr(mapping, func_node.value.id, func_node.attr)
    if not callable(func):
        raise ValueError(f"object '{func_node.attr}' of '{func_node.value.id}' is not callable")
    eval_args = [eval_node(a) for a in args]
    eval_kwargs = {kw.arg: eval_node(kw.value) for kw in keywords}
    return func(*eval_args, **eval_kwargs)
```

Then:

```python
if isinstance(node, ast.Attribute):
    if not isinstance(node.value, ast.Name):
        raise ValueError("attribute access is only allowed on top-level mapped names")
    return _resolve_allowed_attr(mapping, node.value.id, node.attr)

if isinstance(node, ast.Call):
    return _eval_call_attr(mapping, node.func, node.args, node.keywords, _eval_node)
```

3. **Factor out subscript handling**

Subscript/slice handling can be a short helper to shorten `_eval_node`:

```python
def _eval_subscript(eval_node, value_node, slice_node):
    val = eval_node(value_node)
    s = slice_node
    if isinstance(s, ast.Slice):
        lower = eval_node(s.lower) if s.lower is not None else None
        upper = eval_node(s.upper) if s.upper is not None else None
        step = eval_node(s.step) if s.step is not None else None
        return val[slice(lower, upper, step)]
    if isinstance(s, ast.Tuple):
        idx = tuple(eval_node(elt) for elt in s.elts)
        return val[idx]
    idx = eval_node(s)
    return val[idx]
```

Then simply:

```python
if isinstance(node, ast.Subscript):
    return _eval_subscript(_eval_node, node.value, node.slice)
```

These changes keep the same behavior and security guarantees but flatten the long `if isinstance` sections, centralize operator semantics in one place, and separate policy concerns into small helpers that are easier to audit and extend.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant