Closed
As a next step after #122985 ("c_char on AIX should be u8"), I checked c_char on all builtin targets (as of nightly-2024-09-03) using the following script.
check_c_char.sh
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
export RUSTC="${RUSTC:-rustc}"
CLANG="${CLANG:-clang}"
# Print tool versions to stderr for the record.
"${RUSTC}" -vV >&2
echo >&2
"${CLANG}" --version >&2
echo >&2
cargo new --lib tmp
cd tmp
failed=''
# Iterate over every builtin target spec as a {key, value} JSON object.
for target_spec in $("${RUSTC}" -Z unstable-options --print all-target-specs-json | jq -c '. | to_entries[]'); do
    eval "$(jq -r '@sh "RUST_TARGET=\(.key) LLVM_TARGET=\(.value."llvm-target")"' <<<"${target_spec}")"
    # Clang defines __CHAR_UNSIGNED__ only when plain char is unsigned.
    clang_defs=$("${CLANG}" -E -dM -x c /dev/null -target "${LLVM_TARGET}")
    clang_c_char=i8
    if grep -Fq '__CHAR_UNSIGNED__ 1' <<<"${clang_defs}"; then
        clang_c_char=u8
    fi
    # A type mismatch (E0308) means core::ffi::c_char differs from Clang's default.
    cat >src/lib.rs <<EOF
#![no_std]
const _C: core::ffi::c_char = 0_${clang_c_char};
EOF
    if { cargo check -Z build-std=core --target "${RUST_TARGET}" -q 2>&1 || :; } | grep -Fq 'error[E0308]'; then
        echo "${RUST_TARGET}: should be ${clang_c_char}"
        failed=1
    fi
done
cd ..
rm -rf tmp
if [[ -n "${failed}" ]]; then
    exit 1
fi
The result (stdout of the script) is:
aarch64-kmc-solid_asp3: should be u8
aarch64-unknown-hermit: should be u8
aarch64-unknown-illumos: should be u8
aarch64-unknown-none: should be u8
aarch64-unknown-none-softfloat: should be u8
aarch64-unknown-redox: should be u8
aarch64-unknown-teeos: should be u8
aarch64-unknown-trusty: should be u8
armebv7r-none-eabi: should be u8
armebv7r-none-eabihf: should be u8
armv4t-none-eabi: should be u8
armv5te-none-eabi: should be u8
armv7-sony-vita-newlibeabihf: should be u8
armv7-unknown-trusty: should be u8
armv7a-kmc-solid_asp3-eabi: should be u8
armv7a-kmc-solid_asp3-eabihf: should be u8
armv7a-none-eabi: should be u8
armv7a-none-eabihf: should be u8
armv7r-none-eabi: should be u8
armv7r-none-eabihf: should be u8
armv8r-none-eabihf: should be u8
csky-unknown-linux-gnuabiv2: should be i8
csky-unknown-linux-gnuabiv2hf: should be i8
hexagon-unknown-none-elf: should be u8
riscv32i-unknown-none-elf: should be u8
riscv32im-risc0-zkvm-elf: should be u8
riscv32im-unknown-none-elf: should be u8
riscv32ima-unknown-none-elf: should be u8
riscv32imac-esp-espidf: should be u8
riscv32imac-unknown-none-elf: should be u8
riscv32imac-unknown-nuttx-elf: should be u8
riscv32imac-unknown-xous-elf: should be u8
riscv32imafc-esp-espidf: should be u8
riscv32imafc-unknown-none-elf: should be u8
riscv32imafc-unknown-nuttx-elf: should be u8
riscv32imc-esp-espidf: should be u8
riscv32imc-unknown-none-elf: should be u8
riscv32imc-unknown-nuttx-elf: should be u8
riscv64-linux-android: should be u8
riscv64gc-unknown-hermit: should be u8
riscv64gc-unknown-none-elf: should be u8
riscv64gc-unknown-nuttx-elf: should be u8
riscv64imac-unknown-none-elf: should be u8
riscv64imac-unknown-nuttx-elf: should be u8
thumbv4t-none-eabi: should be u8
thumbv5te-none-eabi: should be u8
thumbv6m-none-eabi: should be u8
thumbv6m-nuttx-eabi: should be u8
thumbv7em-none-eabi: should be u8
thumbv7em-none-eabihf: should be u8
thumbv7em-nuttx-eabi: should be u8
thumbv7em-nuttx-eabihf: should be u8
thumbv7m-none-eabi: should be u8
thumbv7m-nuttx-eabi: should be u8
thumbv8m.base-none-eabi: should be u8
thumbv8m.base-nuttx-eabi: should be u8
thumbv8m.main-none-eabi: should be u8
thumbv8m.main-none-eabihf: should be u8
thumbv8m.main-nuttx-eabi: should be u8
thumbv8m.main-nuttx-eabihf: should be u8
x86_64-unknown-l4re-uclibc: should be i8
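The list above is long, so a per-architecture tally can make the affected families easier to see. The following is only a sketch: the function name `tally_by_arch` is hypothetical, and it assumes the input lines have exactly the `<target>: should be <type>` shape printed by the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): reads lines such as
#   aarch64-unknown-none: should be u8
# on stdin and prints "<arch> <expected type> <count>", sorted by architecture.
tally_by_arch() {
  awk '{
    split($1, parts, "-")          # parts[1] is the architecture component
    count[parts[1] " " $NF]++      # $NF is the expected type (u8 or i8)
  }
  END {
    for (key in count) print key, count[key]
  }' | sort
}

# Example usage, assuming the script above is saved as check_c_char.sh
# (the script exits non-zero when it finds mismatches):
#   ./check_c_char.sh 2>/dev/null | tally_by_arch
```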
stderr of the script (version info, etc.)
rustc 1.83.0-nightly (bd53aa3bf 2024-09-02)
binary: rustc
commit-hash: bd53aa3bf7a24a70d763182303bd75e5fc51a9af
commit-date: 2024-09-02
host: aarch64-apple-darwin
release: 1.83.0-nightly
LLVM version: 19.1.0
Homebrew clang version 18.1.8
Target: arm64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin
Creating library `tmp` package
note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: unknown platform, assuming -mfloat-abi=soft
clang: warning: no target microcontroller specified on command line, cannot link standard libraries, please pass -mmcu=<mcu name> [-Wavr-rtlib-linking-quirks]
error: unknown target triple 'xtensa-none-unknown-elf'
Note about the above script (potential false positive / false negative)
- Targets that are broken and cannot build core (such as m68k: "m68k-unknown-linux-gnu: can't compile functions with certain return types" #89498) are ignored by the above script. (potential false negative)
- The script passes the llvm-target field of the target spec as the target for Clang, so if it does not match the Rust target, it may cause a false positive or false negative.
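To double-check a single target by hand, the signedness test used in the script can be isolated into a small function. This is a minimal sketch, not part of the original script: the name `classify_c_char` is hypothetical, and it relies on the same assumption as the script, namely that Clang emits `__CHAR_UNSIGNED__ 1` in its predefined-macro dump only when plain `char` is unsigned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a Clang predefined-macro dump read from stdin.
# Prints u8 when __CHAR_UNSIGNED__ is defined, otherwise i8.
classify_c_char() {
  if grep -Fq '__CHAR_UNSIGNED__ 1'; then
    echo u8
  else
    echo i8
  fi
}

# Example usage with the same Clang invocation as the script
# (the triple is passed explicitly, so a mismatch with llvm-target is visible):
#   clang -E -dM -x c /dev/null -target riscv32-unknown-none-elf | classify_c_char
```

Note that an empty dump (for example, when Clang rejects the target triple) is also classified as i8 by this sketch, so the exit status of Clang should be checked separately.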
As stated in the title, many target_os = "none" targets and tier 3 targets do not match.
- As for the target_os = "none" targets, most of them probably do not match Clang's default.
  - I guess this is because c_char was originally defined in std and this mismatch was missed when it was moved to core.
  - I don't think many people use c_char directly on these targets, but this includes a lot of tier 2 targets, so changing this to match Clang could have a non-trivial impact on the embedded ecosystem.
- All other targets seem to be tier 3, so I think changing them to match Clang's default should be no problem.
  - However, note that Clang's default may be wrong for minor architectures such as C-SKY. (see the references below)
References:
- Apple:
- Windows:
- Windows MSVC C++ Language Reference says "Microsoft-specific: Variables of type char are promoted to int as if from type signed char by default, unless the /J compilation option is used."
- Vita:
- ARM:
- Section 8 "Arm C and C++ Language Mappings" in Procedure Call Standard for the Arm® Architecture says C/C++ char is unsigned byte
- AArch64:
- Section 10 "Arm C and C++ language mappings" in Procedure Call Standard for the Arm 64-bit Architecture (AArch64) says C/C++ char is unsigned byte
- RISC-V:
- C/C++ type representations section in RISC-V Calling Conventions page in RISC-V ELF psABI Document says "char is unsigned."
- PowerPC
- Section 2.1.2.2 "Fundamental Types" in 64-Bit ELF V2 ABI Specification says char is unsigned byte
- Section 3.1.4 "Fundamental Types" in 64-bit PowerPC ELF Application Binary Interface Supplement 1.9 says ANSI C char is unsigned byte
- "Table 3-1 Scalar Types" in SYSTEM V APPLICATION BINARY INTERFACE PowerPC Processor Supplement says ANSI C char is unsigned byte
- XL C for AIX Language Reference says "By default, char behaves like an unsigned char."
- s390x
- "Table 1.1.: Scalar types" in ELF Application Binary Interface s390x Supplement Version 1.6.1 categorizes ISO C char as an unsigned integer
- z/OS XL C/C++ Language Reference says "By default, char behaves like an unsigned char."
- Hexagon:
- Section 3.1 "Basic data type" in Qualcomm Hexagon™ Application Binary Interface User Guide says "By default, the char data type is unsigned."
- C-SKY:
- Section 2.1.2 "Primary Data Type" in C-SKY V2 CPU Applications Binary Interface Standards Manual says ANSI C char is unsigned byte
- Note: this doesn't seem to match Clang's default (perhaps we need to fix Clang? cc @Dirreke)
$ clang -E -dM -x c /dev/null -target csky-unknown-linux-gnuabiv2 | grep __CHAR
#define __CHAR16_TYPE__ unsigned short
#define __CHAR32_TYPE__ unsigned int
#define __CHAR_BIT__ 8
- MSP430:
- Section 2.1 "Basic Types" in MSP430 Embedded Application Binary Interface says "The char type is unsigned by default".
- Note: this doesn't seem to match Clang's default (c_char signedness doesn't match with Clang's default on various no-std and tier 3 targets #129945 (comment))
- Xtensa:
- Section 2.17.1 "Data Types and Alignment" of Xtensa LX Microprocessor Overview handbook says "char type is unsigned by default".
- LoongArch:
- "Appendix: C data types and machine data types" in Procedure Call Standard for the LoongArch Architecture says "For all base ABI types of LoongArch, the char data type in C is signed by default."
- AVR
- https://gcc.gnu.org/wiki/avr-gcc#Type_Layout says char is signed
@rustbot label +A-abi +O-Arm +O-AArch64 +O-riscv +O-csky +O-msp430 +O-xtensa +O-bare-metal +O-android +O-illumos +O-hermit