
Rephrase ast.literal_eval() docs to remove any security guarantee #95588

Closed
vstinner opened this issue Aug 3, 2022 · 23 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

vstinner commented Aug 3, 2022

Currently, the ast.literal_eval() documentation makes multiple security guarantees:

  • Safely evaluate
  • This can be used for safely evaluating strings containing Python values from untrusted sources

IMO that's plain wrong if you read the following RED WARNING:

It is possible to crash the Python interpreter (...)

The documentation should be rephrased to only describe the purpose of the function and to make it very clear that it must NOT be used on untrusted sources.

We can follow the phrasing of the pickle documentation: https://docs.python.org/dev/library/pickle.html

The pickle module is not secure. Only unpickle data you trust.

@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Aug 3, 2022
@vstinner
Member Author

vstinner commented Aug 3, 2022

cc @gvanrossum @pablogsal

@njsmith
Contributor

njsmith commented Aug 3, 2022

Comparing literal_eval to unpickle seems really misleading... unpickling is straight up arbitrary code execution, which is a completely different threat than "might cause a crash".

Also, I've seen lots of folks use literal_eval on untrusted data, because that's always been its whole reason for existence. (If you have trusted data, you can just eval :-).) So I don't think just yanking that guarantee out from under people's feet is likely to fly with the community... it's effectively a major backwards-incompatible change.

If you're really worried about this, maybe it should get a max_input_length=10000 argument or something, or whatever would make it safe enough for you? Do we even know whether it's actually unsafe right now?

@tiran
Member

tiran commented Aug 3, 2022

We could remove the safety guarantees from our documentation, but we cannot drop the security promise for Python 3.11 and earlier. The API was documented as safe when Python 3.10 and earlier came out. We cannot just void this promise in mid-air.

@tiran
Member

tiran commented Aug 3, 2022

If you're really worried about this, maybe it should get a max_input_length=10000 argument or something, or whatever would make it safe enough for you? Do we even know whether it's actually unsafe right now?

I made a similar suggestion in our internal chat. We may have to restrict both input string length and amount of AST nodes. Py_CompileString* would need to track nodes with an internal counter.
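
(A rough Python-level sketch of that idea, for illustration only: the counter tiran describes would live inside the C compiler/parser, and the limits here are made up. Note the obvious gap, which pablogsal raises below: the node count is only available after the parse has already run, so it does nothing about crashes inside the parser itself.)

```python
import ast

MAX_LEN = 10_000     # illustrative limits, not proposed values
MAX_NODES = 1_000

def bounded_literal_eval(text: str):
    # Cap the input length before parsing, and the node count after.
    if len(text) > MAX_LEN:
        raise ValueError("input too long")
    tree = ast.parse(text, mode="eval")
    if sum(1 for _ in ast.walk(tree)) > MAX_NODES:
        raise ValueError("too many AST nodes")
    return ast.literal_eval(tree)
```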

@pablogsal
Member

If you're really worried about this, maybe it should get a max_input_length=10000 argument or something, or whatever would make it safe enough for you? Do we even know whether it's actually unsafe right now?

I made a similar suggestion in our internal chat. We may have to restrict both input string length and amount of AST nodes. Py_CompileString* would need to track nodes with an internal counter.

I am generally opposed to that kind of thing because it is very difficult to predict correctly what the effect will be. For example, the parser creates AST nodes on the go as it parses incorrect constructs, and those will not ultimately be used unless they end up in a cache. What is worse, changing the grammar will have very visible effects, so something that didn't crash before may now reach the limit.

Given how tricky the parser is to maintain in general, I strongly advise against adding more complexity than we already have, which is a lot.

Every special mode of the parser can be a pain to maintain when extending error messages or adding more grammar rules.

@vstinner
Member Author

vstinner commented Aug 3, 2022

So I don't think just yanking that guarantee out from under people's feet is likely to fly with the community... it's effectively a major backwards-incompatible change.

The documentation is wrong and misleading, and should be fixed. A function that is known to crash must not be used with an untrusted string.

IMO making literal_eval() safer is a new and separate feature request. If you have a concrete idea how to make literal_eval() safer, please open a separate issue.

Also, fixing a bunch of parser issues is always welcome, but that's also a separate issue.

@vstinner
Member Author

vstinner commented Aug 3, 2022

IMO making literal_eval() safer is a new and separate feature request. If you have a concrete idea how to make literal_eval() safer, please open a separate issue.

See issue #83340 for example.

@tiran
Member

tiran commented Aug 3, 2022

The documentation is wrong and misleading, and should be fixed. A function that is known to crash must not be used with an untrusted string.

The documentation is a promise to our users. We must do our best to solve the problem -- at least for common and simple cases. Just dropping the security promise from the documentation is blame shifting. It does not help our user base, which is already using the feature.

@nascheme
Member

nascheme commented Aug 4, 2022

Writing a more minimal expression parser, as Raymond suggests, is likely the best way to make literal_eval() safe. As for a promise to our users, there are some real challenges in fulfilling that, and it's best to be honest if we can't fulfill it. E.g. we probably can't fix literal_eval() to be 100% bulletproof in a bugfix release. OTOH, it's a fairly heavily used feature, so we should probably try to fix it, even though a minimal expression parser likely adds a fair bit of maintenance work.

@seberg
Contributor

seberg commented Aug 12, 2022

Is there no way to figure out an acceptable solution complexity-wise? Or just to build more confidence in what we have? Maybe by adding a fuzzer test for it (there seems to be a test for json here, which is probably quite similar in many ways).
A reduced-complexity, more minimal expression parser seems best in principle, but I don't know how feasible it is.
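
(For illustration, a minimal crash stress check one could write today; it is nothing like a real fuzzer and not the oss-fuzz harness mentioned in the next comment. The payload and depth are arbitrary.)

```python
import subprocess
import sys

# Run literal_eval on a pathological input in a child process, so a hard
# crash shows up as a non-zero return code instead of killing the test runner.
CHILD = (
    "import ast, sys\n"
    "try:\n"
    "    ast.literal_eval(sys.stdin.read())\n"
    "except (ValueError, SyntaxError, RecursionError, MemoryError):\n"
    "    pass\n"
)

def survives(payload: str) -> bool:
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          input=payload, text=True, capture_output=True)
    return proc.returncode == 0

n = 100_000  # arbitrary nesting depth
print(survives("(" * n + "0" + ",)" * n))
```

(Under a typical default stack size this particular payload is rejected with a SyntaxError rather than crashing; gpshead's session below shows the same shape of input segfaulting under a reduced stack.)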

@gpshead
Member

gpshead commented Aug 12, 2022

We do fuzz test ast.literal_eval via Google's oss-fuzz. The entry point is https://github.com/python/cpython/blob/main/Modules/_xxtestfuzz/fuzzer.c#L396. It has revealed numerous compiler and parser issues that have been fixed, but at this point it is often stuck re-triggering known non-trivial issues such as stack overflows and computational timeouts.

"The" problem with writing a minimal expression parser to replace literal_eval's implementation is the complexity. literal_eval does a lot more than I suspect many users actually need or understand as the range of things Python literals encompass is large and infinitely nested.

Ex: {(15, (9.25e-16+9j)): '''str''', "8": [{(),}, ..., None, False, ()]} is a Python literal. Did your program really need to be able to accept that as input? I doubt it.
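
(For concreteness -- a quick check, not part of the original comment -- literal_eval does accept that example on recent Python versions:)

```python
import ast

# gpshead's example above: tuple dict keys, a complex constant, a set,
# Ellipsis, None, False, and an empty tuple are all accepted.
text = "{(15, (9.25e-16+9j)): '''str''', \"8\": [{(),}, ..., None, False, ()]}"
print(ast.literal_eval(text))
# -> {(15, (9.25e-16+9j)): 'str', '8': [{()}, Ellipsis, None, False, ()]}
```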

We could remove the safety guarantees from our documentation, but we cannot drop the security promise for Python 3.11 and earlier. The API was documented as safe when Python 3.10 and earlier came out. We cannot just void this promise in mid-air.

We can and must, because we'll never fix literal_eval in any way acceptable to backport to a release branch. Doing that requires an entirely new, non-recursive implementation that is not based on the Python compiler. The bug we can fix is in our documentation, which claims it is safe to parse untrusted data when doing so can crash the process. It isn't.

@seberg
Contributor

seberg commented Aug 12, 2022

I doubt many will need the full power here, but maybe things like dicts containing a list; in fact, a dict with a list containing tuples is the exact untrusted use case I have.

If you update the docs, maybe mention json, which covers a large range of use cases? For the use case I have in mind, unfortunately, the thing json does not cover is support for tuple literals rather than only lists.

@njsmith
Contributor

njsmith commented Aug 12, 2022

So the problem with literal_eval is entirely with the parser, right? The "eval" part looks pretty trivial/safe AFAICT. I think literal_eval is kind of a red herring... the much more important question is whether we document ast.parse as being as dangerous as arbitrary code execution, because tons of things use ast.parse besides literal_eval.

Concretely: suppose you have an IDE where whenever you open a .py file, it does some static introspection, part of which involves ast.parse. Is this a supported use case? Or does it need to be handled with the same radioactive tongs as unpickling and eval?

@gpshead
Member

gpshead commented Aug 12, 2022

Likely? ast.parse and ast.literal_eval both crash on the same thing with a stack overflow:

```
$ ulimit -s 256
$ python
Python 3.10.5 (main, Jun  8 2022, 09:26:22) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> n = 200
>>> def s():
...   return '(' * n + '0' + ',)' * n
... 
>>> import ast
>>> ast.parse(s())
Segmentation fault
```

Under a more common default stack size of >= 2048 KiB, our internal check for more than 200 levels of nesting fires and leads to a SyntaxError: too many nested parentheses instead. So many of these crashes are likely prevented in common environments, but we don't do enough to make a guarantee.

We cannot control the C stack size - and introspecting it is not always plausible - so we cannot make guarantees so long as we're C stack recursive. #91079 is related to helping out here.

Fundamentally this is a problem with using recursion in a language that doesn't have a dynamic stack (read: C and C++). Input based recursion is not wise for parsing untrusted inputs in such an environment.

@gvanrossum
Member

How infeasible is it on the most common platforms (Linux, Windows, macOS, the various BSDs) to introspect the stack size? We're at least partially in this mess because back in 1990 I didn't know how to do that using portable C code, but hopefully things have improved since then, and there is at least a non-portable way? At least for the parser, a single pointer compare shouldn't break the bank.

@pablogsal
Member

At least for the parser a single pointer compare shouldn't break the bank.

At least on Linux we can ask the OS for the stack limit, and then we could compare addresses, as they cannot move. I can try to implement a prototype of this.
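
(As a rough illustration only -- the real check would live in the C parser -- the OS-reported limit is already queryable, e.g. from Python via the resource module on Linux/POSIX:)

```python
import resource

# Soft and hard stack limits in bytes, or RLIM_INFINITY if unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack soft limit:",
      "unlimited" if soft == resource.RLIM_INFINITY else f"{soft} bytes")
```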

@Kodiologist
Contributor

This seems like more of a semantics issue than anything else. I for one see no contradiction in the original documentation: the kind of "safety" being offered is "not executing arbitrary code", not "never crashing". This is sufficient for untrusted input in many situations. Making sure a function never crashes, even with pathological input, is a tall order. I can see the value in being able to make this additional guarantee, but in the meantime, the original issue can be resolved simply by wording the existing guarantee more explicitly.

gpshead added a commit that referenced this issue Oct 2, 2022
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 2, 2022
…ythonGH-95919)

It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
(cherry picked from commit 8baef8a)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 2, 2022
…ythonGH-95919)

It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
(cherry picked from commit 8baef8a)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
@gpshead
Member

gpshead commented Oct 2, 2022

More explicit wording added.

@gpshead gpshead closed this as completed Oct 2, 2022
miss-islington added a commit that referenced this issue Oct 2, 2022
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
(cherry picked from commit 8baef8a)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit that referenced this issue Oct 2, 2022
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
(cherry picked from commit 8baef8a)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
serhiy-storchaka pushed a commit to serhiy-storchaka/cpython that referenced this issue Oct 2, 2022
…ython#95919)

It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
@vstinner
Member Author

vstinner commented Oct 3, 2022

Fixed by PR #95919. Thank you!

carljm added a commit to carljm/cpython that referenced this issue Oct 3, 2022
* main: (2069 commits)
  pythongh-96512: Move int_max_str_digits setting to PyConfig (python#96944)
  pythongh-94808: Coverage: Check picklablability of calliter (python#95923)
  pythongh-94808: Add test coverage for PyObject_HasAttrString (python#96627)
  pythongh-94732: Fix KeyboardInterrupt race in asyncio run_forever() (python#97765)
  Fix typos in `bltinmodule.c`. (pythonGH-97766)
  pythongh-94808: `_PyLineTable_StartsLine` was not used (pythonGH-96609)
  pythongh-97681: Remove Tools/demo/ directory (python#97682)
  Fix typo in unittest docs (python#97742)
  pythongh-97728: Argument Clinic: Fix uninitialized variable in the Py_UNICODE converter (pythonGH-97729)
  pythongh-95913: Fix PEP number in PEP 678 What's New ref label (python#97739)
  pythongh-95913: Copyedit/improve New Modules What's New section (python#97721)
  pythongh-97740: Fix bang in Sphinx C domain ref target syntax (python#97741)
  pythongh-96819: multiprocessing.resource_tracker: check if length of pipe write <= 512 (python#96890)
  pythongh-97706: multiprocessing tests: Delete unused variable `rand` (python#97707)
  pythonGH-85447: Clarify docs about awaiting future multiple times (python#97738)
  [docs] Update logging cookbook with recipe for using a logger like an output… (pythonGH-97730)
  pythongh-97607: Fix content parsing in the impl-detail reST directive (python#97652)
  pythongh-95975: Move except/*/finally ref labels to more precise locations (python#95976)
  pythongh-97591: In `Exception.__setstate__()` acquire strong references before calling `tp_hash` slot (python#97700)
  pythongh-95588: Drop the safety claim from `ast.literal_eval` docs. (python#95919)
  ...
@seberg
Contributor

seberg commented Oct 4, 2022

Are there any hints on workarounds? The only thing I can think of right now is just rejecting "large" inputs (not sure how large), which is still a bit inconvenient in my use case (which at some point explicitly wanted to allow them). Rejecting deep nesting or very many nodes would seem a bit clearer.
I can of course do that, and maybe there is just no avoiding it anyway, but "advertising" it, e.g. by having an opt-in kwarg, might be good?

@vstinner
Member Author

vstinner commented Oct 4, 2022

Are there any hints on workarounds?

In my experience, the most secure way to evaluate/run untrusted code is to spawn a separate process and run it in a sandbox where you can limit time, memory, syscalls, etc.
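
(A rough sketch of the "separate process" part only, assuming POSIX; it caps time and memory but does no syscall filtering, so it is not a real sandbox, and the limits are arbitrary.)

```python
import ast
import subprocess
import sys

# Child: cap the address space, then attempt the parse. Any failure or crash
# simply makes the child exit non-zero.
CHILD = (
    "import ast, resource, sys\n"
    "resource.setrlimit(resource.RLIMIT_AS, (256 * 1024**2, 256 * 1024**2))\n"
    "ast.literal_eval(sys.stdin.read())\n"
)

def literal_eval_untrusted(text: str, timeout: float = 2.0):
    # Let a throwaway process take the risk first; only parse locally if it survived.
    proc = subprocess.run([sys.executable, "-I", "-c", CHILD],
                          input=text, text=True, capture_output=True,
                          timeout=timeout)
    if proc.returncode != 0:
        raise ValueError("untrusted literal rejected")
    return ast.literal_eval(text)
```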

@seberg
Contributor

seberg commented Oct 4, 2022

Yeah, but whether or not it helps me, it seems like there should be some practical solution for very simple stuff here.

  • Nobody wants to sandbox evaluating a single floating-point number, so there is some "threshold" of complexity that should be safe. Maybe using json actually helps increase it (not for me, but generally)? (json also warns about sizes, but maybe it is more graceful in the sense that there is no sudden crash, just DoS through a linear increase in resource usage?)
  • If I use literal_eval in a library, it seems strange if I sandbox for the user (can I even do that?). But I also cannot reasonably force everyone to sandbox. So I must add some arbitrary threshold beyond which I tell users: "Oh, maybe be careful; if you are worried, you probably have to sandbox."

I would be surprised if I am the only one in a position where historically literal_eval was assumed to be safe (maybe sloppily so) and removing that guarantee will be bumpy.

@vstinner
Member Author

vstinner commented Oct 4, 2022

I would be surprised if I am the only one in a position where historically literal_eval was assumed to be safe (maybe sloppily so) and removing that guarantee will be bumpy.

It was never safe. The only change is that the documentation has been fixed. One way to reduce the risk of a crash is to reject strings longer than a limit (e.g. 100 characters).
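
(A minimal sketch of that workaround; the 100-character cutoff is just the example number from above and would need tuning per use case.)

```python
import ast

MAX_LITERAL_LEN = 100  # arbitrary cutoff; tune to the use case

def small_literal_eval(text: str):
    # Reject oversized input before it ever reaches the parser.
    if len(text) > MAX_LITERAL_LEN:
        raise ValueError(f"literal longer than {MAX_LITERAL_LEN} characters")
    return ast.literal_eval(text)
```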

This issue is closed. I suggest opening a discussion elsewhere. There is no plan to implement a sandbox in CPython: https://lwn.net/Articles/574215/
