
BUG: Expression xxxx has forbidden control characters - caused by new release of numexpr #54542

Closed
@alexmloveless

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

$ pip install numexpr==2.8.5

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
a = 8
df.query("A == @a", engine="numexpr")  # raises ValueError with numexpr 2.8.5

Issue Description

Traceback (most recent call last):
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-34-7f773e498449>", line 1, in <module>
    df.query("A == @a", engine="numexpr")
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\frame.py", line 4060, in query
    res = self.eval(expr, **kwargs)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\frame.py", line 4191, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\computation\eval.py", line 353, in eval
    ret = eng_inst.evaluate()
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\computation\engines.py", line 80, in evaluate
    res = self._evaluate()
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\computation\engines.py", line 121, in _evaluate
    return ne.evaluate(s, local_dict=scope)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 943, in evaluate
    raise e
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 851, in validate
    _names_cache[expr_key] = getExprNames(ex, context)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 714, in getExprNames
    ex = stringToExpression(text, {}, context)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 274, in stringToExpression
    raise ValueError(f'Expression {s} has forbidden control characters.')
ValueError: Expression (A) == (__pd_eval_local_a) has forbidden control characters.

So this is actually caused by numexpr release 2.8.5, which went live on Sunday 6 August 2023. From its release notes:

  • As an addendum to the use of NumExpr for parsing user inputs, is that NumExpr
    calls eval on the inputs. A regular expression is now applied to help sanitize
    the input expression string, forbidding '__', ':', and ';'. Attribute access

Not sure whether this qualifies as a bug in numexpr, but it breaks pandas if you have numexpr==2.8.5 installed.
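To illustrate why the @ form fails while a literal does not, here is a minimal sketch of the kind of check the release note describes. The pattern and the helper name has_forbidden_chars are hypothetical: this is a simplified stand-in that only rejects '__', ':' and ';', not numexpr's actual regular expression, and it ignores the attribute-access rule the release note goes on to mention.

```python
import re

# Hypothetical approximation of the numexpr 2.8.5 sanitiser: reject any
# expression containing '__', ':' or ';'. NOT numexpr's real regex.
_FORBIDDEN = re.compile(r"__|[:;]")


def has_forbidden_chars(expr: str) -> bool:
    """Return True if expr contains one of the tokens the release note forbids."""
    return _FORBIDDEN.search(expr) is not None


# A query written with a literal contains none of the forbidden tokens...
print(has_forbidden_chars("(A) == (8)"))  # False
# ...but the string pandas builds for "@a" (see the traceback) contains '__'.
print(has_forbidden_chars("(A) == (__pd_eval_local_a)"))  # True
```

The double underscore comes from pandas itself, so any query using @ references would trip a check like this, regardless of the variable's name.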

Expected Behavior

df.query("A == 8", engine="numexpr")

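correctly queries the df and returns the matching rows. So the problem is specific to using @ variables in the query: pandas rewrites @a to __pd_eval_local_a (see the last line of the traceback), and that double-underscore name is exactly what numexpr 2.8.5 now forbids. I guess it may manifest elsewhere too.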
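Until the incompatibility is resolved, the same filter can be expressed without sending the generated name through numexpr. The minimal sketch below (an illustration, not an official fix) compares engine="python", which evaluates the expression in pandas instead of handing it to numexpr, with plain boolean indexing. The seeded generator only makes the data reproducible.

```python
import numpy as np
import pandas as pd

# Reproducible version of the frame from the report.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
a = 8

# engine="python" evaluates the parsed expression in pandas, so numexpr's
# input sanitiser never sees the "__pd_eval_local_a" name.
via_python = df.query("A == @a", engine="python")

# Plain boolean indexing bypasses the query parser entirely.
via_mask = df[df["A"] == a]

print(via_python.equals(via_mask))  # True
```

The python engine may be slower than numexpr on large frames, so this is a stopgap rather than a replacement.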

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805
python : 3.9.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252

pandas : 1.3.5
numpy : 1.24.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 61.2.0
Cython : 3.0.0
pytest : None
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.6.2
numexpr : 2.8.5
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyxlsb : None
s3fs : None
scipy : 1.9.3
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
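The environment listed above has numexpr 2.8.5 installed. One possible stopgap is to look up the installed numexpr version at runtime and fall back to engine="python" (which does not route the expression through numexpr) for the affected release. This is a minimal sketch: the helper names choose_engine and installed_numexpr_version are hypothetical, and the set of affected versions contains only 2.8.5 because that is the only release identified in this report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Only 2.8.5 is known from this report to reject pandas' generated names.
_AFFECTED_NUMEXPR = {"2.8.5"}


def choose_engine(numexpr_version: Optional[str]) -> str:
    """Pick a DataFrame.query engine for a given numexpr version (None = not installed)."""
    if numexpr_version is None or numexpr_version in _AFFECTED_NUMEXPR:
        return "python"
    return "numexpr"


def installed_numexpr_version() -> Optional[str]:
    """Return the installed numexpr version string, or None if it is absent."""
    try:
        return version("numexpr")
    except PackageNotFoundError:
        return None


print(choose_engine("2.8.5"))  # python
print(choose_engine("2.8.4"))  # numexpr
```

Usage would look like df.query("A == @a", engine=choose_engine(installed_numexpr_version())). An exact string match is used deliberately so that only the reported release is affected.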

Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)