Skip to content

Possible undefined behavior in arraymodule and getargs #116447

Closed
@sobolevn

Description

@sobolevn

Bug report

Problem

Recently I got contacted by a team of static analysis enthusiasts, who analyze CPython's internals for bugs.

They have found this:

photo_2024-03-07 10 15 01

Problematic lines:

  • static int
    BB_setitem(arrayobject *ap, Py_ssize_t i, PyObject *v)
    {
    unsigned char x;
    /* 'B' == unsigned char, maps to PyArg_Parse's 'b' formatter */
    if (!PyArg_Parse(v, "b;array item must be integer", &x))
    return -1;
    if (i >= 0)
    ((char *)ap->ob_item)[i] = x;
  • cpython/Python/getargs.c

    Lines 615 to 630 in 40b79ef

    char *p = va_arg(*p_va, char *);
    long ival = PyLong_AsLong(arg);
    if (ival == -1 && PyErr_Occurred())
    RETURN_ERR_OCCURRED;
    else if (ival < 0) {
    PyErr_SetString(PyExc_OverflowError,
    "unsigned byte integer is less than minimum");
    RETURN_ERR_OCCURRED;
    }
    else if (ival > UCHAR_MAX) {
    PyErr_SetString(PyExc_OverflowError,
    "unsigned byte integer is greater than maximum");
    RETURN_ERR_OCCURRED;
    }
    else
    *p = (unsigned char) ival;
  • object.h is ignored for now as unrelated

Basically, this issue is about char vs unsigned char.

Built with:

CC=clang CXX=clang++ CXXFLAGS=$"-fsanitize=integer,undefined" CFLAGS="-fsanitize=integer,undefined" LDFLAGS="-fsanitize=integer,undefined" ./configure

Built on Linux.

To reproduce this warning you would need to run this code with CPython built with some external extension, but it just finds the desribed issues:

from array import array
array('B', [234])

Important note: I cannot find and reproduce any buggy behavior, I can only see the static analysis report.

Found by Linux Verification Center (portal.linuxtesting.ru) with SVACE.

Reporter: Svyatoslav Tereshin (s.tereshin@fobos-nt.ru).

Proposed solution

The solution is proposed by me :)

diff --git Modules/arraymodule.c Modules/arraymodule.c
index df09d9d8478..317f4974814 100644
--- Modules/arraymodule.c
+++ Modules/arraymodule.c
@@ -247,7 +247,7 @@ BB_setitem(arrayobject *ap, Py_ssize_t i, PyObject *v)
     if (!PyArg_Parse(v, "b;array item must be integer", &x))
         return -1;
     if (i >= 0)
-        ((char *)ap->ob_item)[i] = x;
+        ((unsigned char *)ap->ob_item)[i] = x;
     return 0;
 }
 
diff --git Python/getargs.c Python/getargs.c
index 08e97ee3e62..a7bfad4dd69 100644
--- Python/getargs.c
+++ Python/getargs.c
@@ -612,7 +612,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags,
     switch (c) {
 
     case 'b': { /* unsigned byte -- very short int */
-        char *p = va_arg(*p_va, char *);
+        unsigned char *p = va_arg(*p_va, unsigned char *);
         long ival = PyLong_AsLong(arg);
         if (ival == -1 && PyErr_Occurred())
             RETURN_ERR_OCCURRED;

Why?

  • arraymodule do similar casts for other setters, like: (wchar_t *)ap->ob_item, etc. This seems like a correct thing to do here as well
  • getargs's 'b' works with unsigned char, according to the docs:

    cpython/Doc/c-api/arg.rst

    Lines 232 to 234 in 40b79ef

    ``b`` (:class:`int`) [unsigned char]
    Convert a nonnegative Python integer to an unsigned tiny int, stored in a C
    :c:expr:`unsigned char`.

Local tests pass after this change.

However, I know that different platforms and different compilers might have different pitfalls here.
So, I would like to discuss this before I will send a patch (or if at all).

CC @serhiy-storchaka

Linked PRs

Metadata

Metadata

Assignees

Labels

extension-modulesC modules in the Modules dirtype-bugAn unexpected behavior, bug, or error

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions