Skip to content

Pickle BINSTRING incorrect data type for size #135321

Closed
@Legoclones

Description

@Legoclones

Bug report

Bug description:

The BINSTRING opcode in Pickle has two arguments: a "4-byte little-endian signed int" length, and a string of that many bytes (see the code comment for BINSTRING in pickletools). Since it is signed, this means that any provided value over 0x7fffffff would be interpreted as a negative number. The Python pickle implementation specifically treats it as a signed 32-bit length and checks to see if the length is < 0:

cpython/Lib/pickle.py

Lines 1454 to 1458 in 754e7c9

def load_binstring(self):
# Deprecated BINSTRING uses signed 32-bit length
len, = unpack('<i', self.read(4))
if len < 0:
raise UnpicklingError("BINSTRING pickle has negative byte count")

However, the C pickle implementation runs calc_binsize(s, 4) for BINSTRING and returns a Py_ssize_t. Since Py_ssize_t data types are the same size as the compiler's size_t type (PEP0353), this means a Py_ssize_t is 64-bits long on 64-bit systems. Since the size variable here is also a Py_ssize_t, that means the threshold for negative values is much higher.

cpython/Modules/_pickle.c

Lines 5546 to 5558 in a58026a

Py_ssize_t size;
char *s;
if (_Unpickler_Read(self, st, &s, nbytes) < 0)
return -1;
size = calc_binsize(s, nbytes);
if (size < 0) {
PyErr_Format(st->UnpicklingError,
"BINSTRING exceeds system's maximum size of %zd bytes",
PY_SSIZE_T_MAX);
return -1;
}

This is all just the background to illustrate that because size is not an int, a pickle with the BINSTRING opcode using a length > 0x7fffffff will fail in the Python implementation (since it's negative), but deserialize just fine in the C implementation.

The following payload will demonstrate the difference:

payload:      b'T\x00\x00\x00\x80'+b'a'*0x80000000 + b'.'

pickle:       FAILURE BINSTRING pickle has negative byte count
_pickle.c:    aaaaaaaaaaaaaaaaaaa....
pickletools:  pickletools failed string4 byte count < 0: -2147483648

The required payload does need to be 2GB in size which is very large, but not impossible on modern systems.

Note that the LONG4 opcode is in a similar situation, except the output for calc_binint() is an int data type so this issue does not exist there.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    stdlibPython modules in the Lib dirtype-bugAn unexpected behavior, bug, or error

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions