
Embedded null characters can lead to bugs or even security vulnerabilities #111656

Closed
@vstinner


I just modified PyUnicode_AsUTF8() of the C API to raise an exception if a string contains an embedded null character, to reduce the risk of security vulnerabilities. Callers of PyUnicode_AsUTF8() expect a string terminated by a null byte. If the UTF-8 encoded string contains an embedded null byte, the caller is likely to truncate it without knowing that more bytes follow the first null byte.
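A minimal Python sketch of the rule described above, assuming the caller wants a null-terminated UTF-8 buffer. `as_c_string()` is a hypothetical helper, not a CPython API; it only mimics the idea of the check at the Python level.

```python
def as_c_string(s: str) -> bytes:
    """Hypothetical helper: encode s for a C caller that expects a
    null-terminated UTF-8 string, refusing embedded null characters."""
    # In UTF-8 the byte 0x00 only ever encodes U+0000, so checking the str
    # is equivalent to checking the encoded bytes for a null byte.
    if "\0" in s:
        raise ValueError("embedded null character")
    # Append the terminator that a C caller relies on
    return s.encode("utf-8") + b"\0"


print(as_c_string("World"))  # b'World\x00'
try:
    as_c_string("World\0truncated string")
except ValueError as exc:
    print("rejected:", exc)  # rejected: embedded null character
```

Failing loudly moves the problem to the point where the string crosses into C, instead of letting the caller silently work on a shorter string.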

See: https://owasp.org/www-community/attacks/Embedding_Null_Code

This is not only a security issue; it can also simply be seen as a bug: unwanted behavior.

Previous issues:

Discussions:


Example with Python 3.12:

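```python
import ctypes

# Linux-specific: load the C library to call printf() directly
libc = ctypes.cdll.LoadLibrary('libc.so.6')
printf = libc.printf

# Call PyUnicode_AsUTF8() from the running interpreter's C API
PyUnicode_AsUTF8 = ctypes.pythonapi.PyUnicode_AsUTF8
PyUnicode_AsUTF8.argtypes = (ctypes.py_object,)
PyUnicode_AsUTF8.restype = ctypes.c_char_p

my_string = "World\0truncated string"
# The returned char* is read as a null-terminated C string,
# so everything after the embedded null character is lost
printf(b"Hello %s\n", PyUnicode_AsUTF8(my_string))
```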

Output:

```
Hello World
```

The "truncated string" part is silently ignored!
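The truncation does not depend on printf() or on the C API call itself: any code that reads a buffer as a null-terminated C string stops at the first null byte. A minimal ctypes-only sketch (no C API involved, should run on any platform) shows the difference between the C-string view of a buffer and its full contents:

```python
import ctypes

data = "World\0truncated string".encode("utf-8")
# create_string_buffer() copies the bytes and appends a terminating NUL
buf = ctypes.create_string_buffer(data)

# .value interprets the buffer as a C string and stops at the first NUL
print(buf.value)  # b'World'
# .raw returns every byte, including the part a C caller never sees
print(buf.raw)    # b'World\x00truncated string\x00'
```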


Multiple functions were modified in the past to prevent this problem. Examples:

  • _dbm.open(): check filename
  • _gdbm.open(): check filename
  • PyBytes_AsStringAndSize(str, NULL)
  • grp.getgrnam(): check name
  • pwd.getpwnam(): check name
  • _locale.strxfrm(): check argument
  • path_converter() of the os module: basically any filename and path
  • PyUnicode_AsWideCharString()
  • os.putenv()
  • _posixsubprocess.fork_exec(): executable_list
  • _struct.Struct: check format
  • _tkinter SetVar() and varname_converter()
  • _winapi.CreateProcess() getenvironment()
  • PyUnicode_EncodeLocale()
  • PyUnicode_EncodeFSDefault()
  • unicode_decode_locale()
  • PyUnicode_FSConverter()
  • PyUnicode_DecodeLocale()
  • PyUnicode_DecodeLocaleAndSize()
  • PyUnicode_FSDecoder()
  • PyUnicode_AsUTF8() -- recently modified
  • _Py_stat(): check path
  • getargs.c: 's', 'y' and 'z' formats
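At the Python level, the effect of these checks is that a null character is accepted inside `str` and `bytes` objects but rejected when the value is handed to C. A small sketch probing two entries from the list above (path arguments handled by path_converter(), and os.putenv()); `rejects_null()` is a hypothetical helper, and because error messages differ between functions and versions, only the exception type is checked:

```python
import os


def rejects_null(call) -> bool:
    """Hypothetical helper: True if call() raises ValueError."""
    try:
        call()
    except ValueError:
        return True
    return False


s = "some\0file"
# A Python str can hold a null character, and encoding keeps the null byte
print(s.encode("utf-8"))  # b'some\x00file'

# path_converter(): os functions taking a path reject the embedded null
print(rejects_null(lambda: os.stat(s)))                    # True
# os.putenv(): an embedded null in the value is rejected as well
print(rejects_null(lambda: os.putenv("NAME", "val\0ue")))  # True
```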

Some functions intentionally accept embedded null bytes/characters:

  • socket: AF_UNIX socket name
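For AF_UNIX sockets, a likely reason for the exception (our explanation, not stated above) is Linux's abstract socket namespace, where the address starts with a null byte and no file is created on disk. A Linux-only sketch; `bind_abstract()` and the socket name are hypothetical:

```python
import os
import socket
import sys


def bind_abstract(name: bytes) -> bytes:
    """Bind an AF_UNIX socket in Linux's abstract namespace and return its address."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        # A leading null byte selects the abstract namespace
        sock.bind(b"\0" + name)
        return sock.getsockname()


if sys.platform.startswith("linux"):
    addr = bind_abstract(b"example-%d" % os.getpid())
    print(addr)  # a bytes object starting with b'\x00'
```

Rejecting null bytes here would make such addresses impossible to use, so the socket module has to keep them.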
