Skip to content

Savedata is not endianness portable  #2693

@nurupo

Description

@nurupo

Savedata created on a little-endian systems will not load on big-endian systems and the other way around.

I have looked into save_compatibility_test failing on s390x and found a couple of issues that I would like to document:

  1. Toxcore uses little-endian to host and host to little-endian conversion functions. For example, lendian_bytes_to_host32() in tox_load():

    c-toxcore/toxcore/tox.c

    Lines 704 to 709 in 710eb67

    memcpy(data32, data, sizeof(uint32_t));
    lendian_bytes_to_host32(data32 + 1, data + sizeof(uint32_t));
    if (data32[0] != 0 || data32[1] != STATE_COOKIE_GLOBAL) {
    return -1;
    }

    They function such that if WORDS_BIGENDIAN is defined, they assume the host is big-endian, otherwise it's little endian:

    c-toxcore/toxcore/state.c

    Lines 127 to 136 in 710eb67

    void lendian_bytes_to_host32(uint32_t *dest, const uint8_t *lendian)
    {
    uint32_t d;
    memcpy(&d, lendian, sizeof(uint32_t));
    #ifdef WORDS_BIGENDIAN
    d = ((d << 8) & 0xFF00FF00) | ((d >> 8) & 0xFF00FF);
    d = (d << 16) | (d >> 16);
    #endif /* WORDS_BIGENDIAN */
    *dest = d;
    }

    However, WORDS_BIGENDIAN is never defined by anything on a big-endian system, so those functions always assume the host is little-endian and just memcpy() the data as is, without any conversion. This results in little-endian systems storing and reading integers in the little-endian order, and big-endian systems storing and reading integers in the big-endian order.

    Trying to load a savedata created on amd64 on a s390x system will result in the tox_load() code linked above returning -1 on line 708 as the savedata file's magic number won't match due to the wrong endianness, with tox_new() returning TOX_ERR_NEW_LOAD_BAD_FORMAT to the user, as evident by save_compatibility_test failing on s390s with:

    ./auto_tests/save_compatibility_test.c:87: failed `tox': failed to create tox, error number: 9
    

    size_t index = 0;
    Tox_Err_New err;
    Tox *tox = tox_new_log(&options, &err, &index);
    ck_assert_msg(tox, "failed to create tox, error number: %d", err);

  2. If we fix the previous issue by defining WORDS_BIGENDIAN on a big-endian system, e.g. by adding the following to CMakeLists.txt:

    include(TestBigEndian)
    test_big_endian(IS_BIG_ENDIAN)
    if(IS_BIG_ENDIAN)
      add_definitions(-DWORDS_BIGENDIAN)
    endif()

    then the magic number matches and the code proceeds further in parsing the savedata, however the s390x save_compatibility_test then fails with:

    [#0] ERROR state.c:41   state_load:     state file garbled: 01ce != 01ce
    

    uint32_t cookie_type;
    lendian_bytes_to_host32(&cookie_type, data + sizeof(uint32_t));
    data += size_head;
    length -= size_head;
    if (length < length_sub) {
    /* file truncated */
    LOGGER_ERROR(log, "state file too short: %u < %u", length, length_sub);
    return -1;
    }
    if (lendian_to_host16(cookie_type >> 16) != cookie_inner) {
    /* something is not matching up in a bad way, give up */
    LOGGER_ERROR(log, "state file garbled: %04x != %04x", cookie_type >> 16, cookie_inner);
    return -1;
    }
    const uint16_t type = lendian_to_host16(cookie_type & 0xFFFF);

    The issue here is that the code does the same endianness conversion twice. On line 28 it converts little-endian to host 32-bit. This will convert the little-endian to the big-endian representation. But then on the line 39 it performs a little-endian to host conversion again, mistakenly converting those 16 bits from the host endianness (which is big-endian! this is why calling lendian_to_host16() on it produces non-portable result, as it's not lendian to begin with) to the little endian. Then the comparison fails because left side is little-endian now and the right side is big-endian. (Then there are the lower 16-bits being converted on like 45, which also is unnecessary and will produce an incorrect result).

    The saving code does the conversion twice too:

    host_to_lendian_bytes32(data, (host_to_lendian16(cookie_type) << 16) | host_to_lendian16(section_type));

    which are no-ops on little endian but produce unexpected result on big-endian.

Fixing this would break the savedata format on big-endian systems, changing it in non backwards-compatible way. There have been talk of switching savedata to using msgpack, which would also be a breaking change, so it might make sense to keep the current broken behavior for now and do all the savedata breaking changes together.

Here is a small snippet to reproduce the issue
$ git checkout 710eb674a50f17201bb7556d389d82d2133d98de
$ sudo docker run -it --rm -v "$PWD":/toxcore debian:bookworm /bin/bash
# dpkg --add-architecture s390x
# apt-get update
# apt-get install -y --no-install-recommends binutils-s390x-linux-gnu gcc-s390x-linux-gnu g++-s390x-linux-gnu qemu-user-static
# apt-get install -y libsodium-dev:s390x libopus-dev:s390x libvpx-dev:s390x pkg-config:s390x
# apt-get install -y cmake make
# echo "
    SET(CMAKE_SYSTEM_NAME Linux)

    SET(CMAKE_C_COMPILER   /usr/bin/s390x-linux-gnu-gcc)
    SET(CMAKE_CXX_COMPILER /usr/bin/s390x-linux-gnu-g++)
    SET(CMAKE_AR           /usr/bin/s390x-linux-gnu-ar)

    SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
    SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
    SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

    SET(CROSSCOMPILING_EMULATOR /usr/bin/qemu-s390x-static)
" > /usr/local/share/s390x-linux-gnu.cmake
# export PKG_CONFIG_LIBDIR=/usr/lib/s390x-linux-gnu/pkgconfig:/usr/share/pkgconfig
# cp -a /toxcore/ ~/toxcore
# cd ~/toxcore/
# rm -rf _build/
# cmake -B_build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/usr/local/share/s390x-linux-gnu.cmake -DENABLE_SHARED=ON -DENABLE_STATIC=OFF -DAUTOTEST=ON -DBOOTSTRAP_DAEMON=OFF
# cmake --build _build --parallel "$(nproc)" --target auto_save_compatibility_test
# cd _build/
# ./auto_tests/auto_save_compatibility_test 

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High prioritybugBug fix for the user, not a fix to a build script

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions