size_t/ptrdiff_t transition part 1: test(1), .sh.match, init, macro, stk(3)#867
size_t/ptrdiff_t transition part 1: test(1), .sh.match, init, macro, stk(3)#867
Conversation
…on, init and stk(3)
This is the first of a thirteen(!!) part patch series that enables
ksh93 to operate within a 64-bit address space (currently dubbed
thickfold). These changes were accomplished by fixing most (but not
all) of the warnings that materialize during compilation when the
following clang compiler flags are passed:
'-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion'
Originally, this patch was going to take the suggested approach of
replacing int with ssize_t. While that does work in practice, it's
a bad idea because POSIX does not guarantee ssize_t will accept
any negative value besides -1, despite its signed nature[1]:
> The type ssize_t shall be capable of storing values at least
> in the range [-1, {SSIZE_MAX}].
As such, for correctness and portability these patches will prefer
usage of the C89 ptrdiff_t, which is practically guaranteed to
accept the full range of negative numbers an int can accept. In
practice, ptrdiff_t and size_t will be of the same size, being
both 32-bit or both 64-bit. There are edge cases where this might
not be so, but I've chosen to use ptrdiff_t despite that since
the caveats of such edge cases aren't nearly as bad as that of
a platform's ssize_t rejecting values lower than -1.
[1]: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_types.h.html#tag_14_70
In some areas ssize_t results from e.g. sfvalue() may end up being
used with ptrdiff_t variables. This is not pedantically correct,
but it's certainly better than the prior int hell status quo,
wherein ssize_t was usually shortened to int.
Conspicuous changes with noteworthyness:
- Fixed many, many compiler warnings by adding a considerable
number of casts and changing the types of more than quite
a few variables. This is a consequence of using compiler
warnings as an aid for implementing 64-bit memory allocation;
while the warnings were useful, there were so many warnings
fixed in the process that it increased the patch sizes greatly.
- Transitioned the strgrpmatch() and strngrpmatch calls to use
a ssize_t* pointer rather than an int* pointer.
- For the sh_options macros/function, enforce uint64_t as the main
argument type to fix compiler warnings.
- The libast stk sublibrary already uses the ssize_t/size_t types,
which made it rather easy to integrate proper ptrdiff_t usage.
In fact, stktell() has *always* returned a 'ptrdiff_t' result.
The documentation in stk(3) has been incorrectly claiming
since < 1995 it returns 'int', which is wrong. That error
has been rectified alongside the other updates to stk.
The thickfold patch series was accomplished by fixing compiler warnings,
so I've decided to provide metrics as to the change in the total number
of warnings occurring during compilation.
Change in the number of warnings on Linux when compiling with clang using
-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion:
4,042 => 3,808 => 248 (progression from cc5e069 => part 1 => part 13)
Side note: To re-emphasize, this patch is one part of a whole
(although it can be used on it's own). The full suite of
changes can be found on the thickfold-size_t branch.
This first part has been submitted severed from the other
changes to make code review less laborious (I hope).
The full patch series can be found here:
https://github.com/JohnoKing/ksh/tree/ad588c003d/patches
Progresses ksh93#592
This code is set to be deleted to fix ACE vulns, so undo the ptrdiff_t change to avoid a merge conflict.
32451d1 to
8639c8d
Compare
However, ptrdiff_t is meant to store the difference between two pointers (hence the name) so it's quite misleading, in terms of reading the code, to use it as a general int type. There is intmax_t and uintmax_t, provided by libast on old systems that don't have these natively. Why not use those? |
|
Hmmm… did you choose |
That is the main reason, though not the only one. The Emacs documentation raises a valid point regarding pointer subtraction:
The ksh codebase uses lots of pointer subtraction, such as when operating with the stack: ksh/src/lib/libast/include/stk.h Line 44 in d462de6 Lines 2162 to 2175 in d462de6 Using |
|
OK, that seems technically sound in practice, but As far as I can tell, using |
That would work, but those types are typically used for storing pointer addresses, not sizes. I'd think that would be just as confusing to a code reader, if not moreso. I've never seen
Line 536 in d462de6 ksh/src/lib/libast/include/sfio.h Lines 89 to 90 in d462de6 ksh/src/lib/libast/sfio/sfvscanf.c Lines 593 to 594 in d462de6 ksh/src/lib/libast/sfio/sfvprintf.c Lines 481 to 482 in d462de6 ksh/src/lib/libast/sfio/sftable.c Lines 264 to 265 in d462de6 Line 268 in d462de6 Any platform that doesn't implement |
|
It's also worth noting on ARM CHERI |
|
Then let's at least do something like /* (u)intksh_t will typically be 32 bits wide in 32-bit binaries and 64 bits wide in 64-bit binaries */
typedef ptrdiff_t intksh_t;
typedef unsigned ptrdiff_t uintksh_t;to make the source more readable, and also make it easier to change the approach later, should that become necessary. This would go well in defs.h. Of course ptrdiff_t should continue to be used directly for storing values that are actually the result of subtracting two pointers. |
The ksh codebase frequently assumes the result from pointer subtraction will have the same bitness as the size variables, and that the results can themselves be used for sizes. For just one example, Lines 94 to 95 in d462de6 Lines 125 to 129 in d462de6 Line 216 in d462de6 Line 126 in d462de6 Lines 141 to 142 in d462de6 On my current thickfold branch there are currently a grand total of 993 results for
I doubt it can be changed later. To reiterate the point from the Emacs document I quoted earlier, objects larger than
A nonstandard type definition doesn't necessarily improve readability either. A coder unfamiliar with the ksh codebase isn't going to have any idea what an |
This is the first of a thirteen(!!) part patch series that enables ksh93 to operate within a 64-bit address space (currently dubbed thickfold). These changes were accomplished by fixing most (but not all) of the warnings that materialize during compilation when the following clang compiler flags are passed:
-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversionOriginally, this patch was going to take the suggested approach of replacing int with
ssize_t. While that does work in practice, it's a bad idea because POSIX does not guaranteessize_twill accept any negative value besides-1, despite its signed nature1:As such, for correctness and portability these patches will prefer usage of the C89
ptrdiff_t, which is practically guaranteed to accept the full range of negative numbers an int can accept. In practice,ptrdiff_tandsize_twill be of the same size, being both 32-bit or both 64-bit. There are edge cases where this might not be so2, but I've chosen to useptrdiff_tdespite that since the caveats of such edge cases aren't nearly as bad as that of a platform'sssize_tpotentially rejecting values lower than -1.In some areas
ssize_tresults from e.g.sfvalue()may end up being used withptrdiff_tvariables. This is not pedantically correct, but it's certainly better than the priorinthell status quo, whereinssize_twas usually shortened to int.Conspicuous changes with noteworthiness:
strgrpmatch()andstrngrpmatch()calls to use assize_t*pointer rather than anint*pointer.sh_optionsmacros/function, enforceuint64_tas the main argument type to fix compiler warnings.ssize_t/size_ttypes, which made it rather easy to integrate properptrdiff_tusage. In fact,stktell()has always returned aptrdiff_tresult. The documentation in stk(3) has been incorrectly claiming since < 1995 it returnsint, which is both pedantically and in actuality wrong. That error has been rectified alongside the other updates to stk.The thickfold patch series was accomplished by fixing compiler warnings, so I've decided to provide metrics as to the change in the total number of warnings occurring during compilation.
Change in the number of warnings on Linux when compiling with clang using
-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion(using the 'warnings generated' strings):4,042 => 3,797 => 86 (progression from d462de6 => part 1 => part 13)
Side note: To re-emphasize, this patch is one part of a whole (although it can be used on its own). The full suite of changes can be found on the thickfold-size_t branch. This first part has been submitted severed from the other changes to make code review less
hellishlaborious (I hope). The full patch series can be found here (last updated 2026-01-30 with minor changes to account for 128-bit pointers on ARM CHERI and better handle differingssize_tandptrdiff_tsizes):https://github.com/JohnoKing/ksh/tree/850fb892d/patches
Progresses #592
Footnotes
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_types.h.html#tag_14_70 ↩
I'll note there are also edge cases where
ssize_tandsize_tare of differing bit sizes,size_tbeing unsigned 64-bit andssize_tbeing signed 32-bit. I presume this situation is extremely uncommon, but it's yet another reason whyptrdiff_tis a better choice. ↩