-
Notifications
You must be signed in to change notification settings - Fork 300
Description
Savedata created on a little-endian systems will not load on big-endian systems and the other way around.
I have looked into save_compatibility_test failing on s390x and found a couple of issues that I would like to document:
-
Toxcore uses little-endian to host and host to little-endian conversion functions. For example,
lendian_bytes_to_host32()intox_load():Lines 704 to 709 in 710eb67
memcpy(data32, data, sizeof(uint32_t)); lendian_bytes_to_host32(data32 + 1, data + sizeof(uint32_t)); if (data32[0] != 0 || data32[1] != STATE_COOKIE_GLOBAL) { return -1; } They function such that if
WORDS_BIGENDIANis defined, they assume the host is big-endian, otherwise it's little endian:Lines 127 to 136 in 710eb67
void lendian_bytes_to_host32(uint32_t *dest, const uint8_t *lendian) { uint32_t d; memcpy(&d, lendian, sizeof(uint32_t)); #ifdef WORDS_BIGENDIAN d = ((d << 8) & 0xFF00FF00) | ((d >> 8) & 0xFF00FF); d = (d << 16) | (d >> 16); #endif /* WORDS_BIGENDIAN */ *dest = d; } However,
WORDS_BIGENDIANis never defined by anything on a big-endian system, so those functions always assume the host is little-endian and justmemcpy()the data as is, without any conversion. This results in little-endian systems storing and reading integers in the little-endian order, and big-endian systems storing and reading integers in the big-endian order.Trying to load a savedata created on amd64 on a s390x system will result in the
tox_load()code linked above returning -1 on line 708 as the savedata file's magic number won't match due to the wrong endianness, withtox_new()returningTOX_ERR_NEW_LOAD_BAD_FORMATto the user, as evident bysave_compatibility_testfailing on s390s with:./auto_tests/save_compatibility_test.c:87: failed `tox': failed to create tox, error number: 9c-toxcore/auto_tests/save_compatibility_test.c
Lines 84 to 87 in 710eb67
size_t index = 0; Tox_Err_New err; Tox *tox = tox_new_log(&options, &err, &index); ck_assert_msg(tox, "failed to create tox, error number: %d", err); -
If we fix the previous issue by defining
WORDS_BIGENDIANon a big-endian system, e.g. by adding the following toCMakeLists.txt:include(TestBigEndian) test_big_endian(IS_BIG_ENDIAN) if(IS_BIG_ENDIAN) add_definitions(-DWORDS_BIGENDIAN) endif()
then the magic number matches and the code proceeds further in parsing the savedata, however the s390x
save_compatibility_testthen fails with:[#0] ERROR state.c:41 state_load: state file garbled: 01ce != 01ceLines 27 to 45 in 710eb67
uint32_t cookie_type; lendian_bytes_to_host32(&cookie_type, data + sizeof(uint32_t)); data += size_head; length -= size_head; if (length < length_sub) { /* file truncated */ LOGGER_ERROR(log, "state file too short: %u < %u", length, length_sub); return -1; } if (lendian_to_host16(cookie_type >> 16) != cookie_inner) { /* something is not matching up in a bad way, give up */ LOGGER_ERROR(log, "state file garbled: %04x != %04x", cookie_type >> 16, cookie_inner); return -1; } const uint16_t type = lendian_to_host16(cookie_type & 0xFFFF); The issue here is that the code does the same endianness conversion twice. On line 28 it converts little-endian to host 32-bit. This will convert the little-endian to the big-endian representation. But then on the line 39 it performs a little-endian to host conversion again, mistakenly converting those 16 bits from the host endianness (which is big-endian! this is why calling
lendian_to_host16()on it produces non-portable result, as it's notlendianto begin with) to the little endian. Then the comparison fails because left side is little-endian now and the right side is big-endian. (Then there are the lower 16-bits being converted on like 45, which also is unnecessary and will produce an incorrect result).The saving code does the conversion twice too:
Line 77 in 710eb67
host_to_lendian_bytes32(data, (host_to_lendian16(cookie_type) << 16) | host_to_lendian16(section_type)); which are no-ops on little endian but produce unexpected result on big-endian.
Fixing this would break the savedata format on big-endian systems, changing it in non backwards-compatible way. There have been talk of switching savedata to using msgpack, which would also be a breaking change, so it might make sense to keep the current broken behavior for now and do all the savedata breaking changes together.
Here is a small snippet to reproduce the issue
$ git checkout 710eb674a50f17201bb7556d389d82d2133d98de
$ sudo docker run -it --rm -v "$PWD":/toxcore debian:bookworm /bin/bash
# dpkg --add-architecture s390x
# apt-get update
# apt-get install -y --no-install-recommends binutils-s390x-linux-gnu gcc-s390x-linux-gnu g++-s390x-linux-gnu qemu-user-static
# apt-get install -y libsodium-dev:s390x libopus-dev:s390x libvpx-dev:s390x pkg-config:s390x
# apt-get install -y cmake make
# echo "
SET(CMAKE_SYSTEM_NAME Linux)
SET(CMAKE_C_COMPILER /usr/bin/s390x-linux-gnu-gcc)
SET(CMAKE_CXX_COMPILER /usr/bin/s390x-linux-gnu-g++)
SET(CMAKE_AR /usr/bin/s390x-linux-gnu-ar)
SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
SET(CROSSCOMPILING_EMULATOR /usr/bin/qemu-s390x-static)
" > /usr/local/share/s390x-linux-gnu.cmake
# export PKG_CONFIG_LIBDIR=/usr/lib/s390x-linux-gnu/pkgconfig:/usr/share/pkgconfig
# cp -a /toxcore/ ~/toxcore
# cd ~/toxcore/
# rm -rf _build/
# cmake -B_build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/usr/local/share/s390x-linux-gnu.cmake -DENABLE_SHARED=ON -DENABLE_STATIC=OFF -DAUTOTEST=ON -DBOOTSTRAP_DAEMON=OFF
# cmake --build _build --parallel "$(nproc)" --target auto_save_compatibility_test
# cd _build/
# ./auto_tests/auto_save_compatibility_test