Skip to content

size_t/ptrdiff_t transition part 1: test(1), .sh.match, init, macro, stk(3)#867

Open
JohnoKing wants to merge 5 commits intoksh93:devfrom
JohnoKing:thickfold-part-1-submission
Open

size_t/ptrdiff_t transition part 1: test(1), .sh.match, init, macro, stk(3)#867
JohnoKing wants to merge 5 commits intoksh93:devfrom
JohnoKing:thickfold-part-1-submission

Conversation

@JohnoKing
Copy link

@JohnoKing JohnoKing commented Jun 5, 2025

This is the first of a thirteen(!!) part patch series that enables ksh93 to operate within a 64-bit address space (currently dubbed thickfold). These changes were accomplished by fixing most (but not all) of the warnings that materialize during compilation when the following clang compiler flags are passed:
-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion

Originally, this patch was going to take the suggested approach of replacing int with ssize_t. While that does work in practice, it's a bad idea because POSIX does not guarantee ssize_t will accept any negative value besides -1, despite its signed nature1:

The type ssize_t shall be capable of storing values at least
in the range [-1, {SSIZE_MAX}].

As such, for correctness and portability these patches will prefer usage of the C89 ptrdiff_t, which is practically guaranteed to accept the full range of negative numbers an int can accept. In practice, ptrdiff_t and size_t will be of the same size, being both 32-bit or both 64-bit. There are edge cases where this might not be so2, but I've chosen to use ptrdiff_t despite that since the caveats of such edge cases aren't nearly as bad as that of a platform's ssize_t potentially rejecting values lower than -1.

In some areas ssize_t results from e.g. sfvalue() may end up being used with ptrdiff_t variables. This is not pedantically correct, but it's certainly better than the prior int hell status quo, wherein ssize_t was usually shortened to int.

Conspicuous changes with noteworthiness:

  • Fixed many, many compiler warnings by adding a considerable number of casts and changing the types of more than quite a few variables. This is a consequence of using compiler warnings as an aid for implementing 64-bit memory allocation; while the warnings were useful, there were so many warnings fixed in the process that it increased the patch sizes greatly.
  • Transitioned the strgrpmatch() and strngrpmatch() calls to use a ssize_t* pointer rather than an int* pointer.
  • For the sh_options macros/function, enforce uint64_t as the main argument type to fix compiler warnings.
  • The libast stk sublibrary already uses the ssize_t/size_t types, which made it rather easy to integrate proper ptrdiff_t usage. In fact, stktell() has always returned a ptrdiff_t result. The documentation in stk(3) has been incorrectly claiming since < 1995 it returns int, which is both pedantically and in actuality wrong. That error has been rectified alongside the other updates to stk.

The thickfold patch series was accomplished by fixing compiler warnings, so I've decided to provide metrics as to the change in the total number of warnings occurring during compilation.
Change in the number of warnings on Linux when compiling with clang using -Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion (using the 'warnings generated' strings):
4,042 => 3,797 => 86 (progression from d462de6 => part 1 => part 13)

Side note: To re-emphasize, this patch is one part of a whole (although it can be used on its own). The full suite of changes can be found on the thickfold-size_t branch. This first part has been submitted severed from the other changes to make code review less hellish laborious (I hope). The full patch series can be found here (last updated 2026-01-30 with minor changes to account for 128-bit pointers on ARM CHERI and better handle differing ssize_t and ptrdiff_t sizes):
https://github.com/JohnoKing/ksh/tree/850fb892d/patches

Progresses #592

Footnotes

  1. https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_types.h.html#tag_14_70

  2. I'll note there are also edge cases where ssize_t and size_t are of differing bit sizes, size_t being unsigned 64-bit and ssize_t being signed 32-bit. I presume this situation is extremely uncommon, but it's yet another reason why ptrdiff_t is a better choice.

JohnoKing added 5 commits June 4, 2025 20:53
…on, init and stk(3)

This is the first of a thirteen(!!) part patch series that enables
ksh93 to operate within a 64-bit address space (currently dubbed
thickfold). These changes were accomplished by fixing most (but not
all) of the warnings that materialize during compilation when the
following clang compiler flags are passed:
'-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion'

Originally, this patch was going to take the suggested approach of
replacing int with ssize_t. While that does work in practice, it's
a bad idea because POSIX does not guarantee ssize_t will accept
any negative value besides -1, despite its signed nature[1]:
    > The type ssize_t shall be capable of storing values at least
    > in the range [-1, {SSIZE_MAX}].
As such, for correctness and portability these patches will prefer
usage of the C89 ptrdiff_t, which is practically guaranteed to
accept the full range of negative numbers an int can accept. In
practice, ptrdiff_t and size_t will be of the same size, being
both 32-bit or both 64-bit. There are edge cases where this might
not be so, but I've chosen to use ptrdiff_t despite that since
the caveats of such edge cases aren't nearly as bad as that of
a platform's ssize_t rejecting values lower than -1.
[1]: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_types.h.html#tag_14_70

In some areas ssize_t results from e.g. sfvalue() may end up being
used with ptrdiff_t variables. This is not pedantically correct,
but it's certainly better than the prior int hell status quo,
wherein ssize_t was usually shortened to int.

Conspicuous changes with noteworthyness:
- Fixed many, many compiler warnings by adding a considerable
  number of casts and changing the types of more than quite
  a few variables. This is a consequence of using compiler
  warnings as an aid for implementing 64-bit memory allocation;
  while the warnings were useful, there were so many warnings
  fixed in the process that it increased the patch sizes greatly.
- Transitioned the strgrpmatch() and strngrpmatch calls to use
  a ssize_t* pointer rather than an int* pointer.
- For the sh_options macros/function, enforce uint64_t as the main
  argument type to fix compiler warnings.
- The libast stk sublibrary already uses the ssize_t/size_t types,
  which made it rather easy to integrate proper ptrdiff_t usage.
  In fact, stktell() has *always* returned a 'ptrdiff_t' result.
  The documentation in stk(3) has been incorrectly claiming
  since < 1995 it returns 'int', which is wrong. That error
  has been rectified alongside the other updates to stk.

The thickfold patch series was accomplished by fixing compiler warnings,
so I've decided to provide metrics as to the change in the total number
of warnings occurring during compilation.
Change in the number of warnings on Linux when compiling with clang using
-Wsign-compare -Wshorten-64-to-32 -Wsign-conversion -Wimplicit-int-conversion:
4,042 => 3,808 => 248 (progression from cc5e069 => part 1 => part 13)

Side note: To re-emphasize, this patch is one part of a whole
(although it can be used on it's own). The full suite of
changes can be found on the thickfold-size_t branch.
This first part has been submitted severed from the other
changes to make code review less laborious (I hope).
The full patch series can be found here:
https://github.com/JohnoKing/ksh/tree/ad588c003d/patches

Progresses ksh93#592
This code is set to be deleted to fix ACE vulns,
so undo the ptrdiff_t change to avoid a merge conflict.
@JohnoKing JohnoKing force-pushed the thickfold-part-1-submission branch from 32451d1 to 8639c8d Compare January 29, 2026 05:51
@McDutchie
Copy link

As such, for correctness and portability these patches will prefer usage of the C89 ptrdiff_t, which is practically guaranteed to accept the full range of negative numbers an int can accept.

However, ptrdiff_t is meant to store the difference between two pointers (hence the name) so it's quite misleading, in terms of reading the code, to use it as a general int type.

There is intmax_t and uintmax_t, provided by libast on old systems that don't have these natively. Why not use those?

@McDutchie
Copy link

Hmmm… did you choose ptrdiff_t because it is a 64-bit int type on 64-bit systems and a 32-bit int type on 32-bit systems?

@JohnoKing
Copy link
Author

JohnoKing commented Jan 29, 2026

Hmmm… did you choose ptrdiff_t because it is a 64-bit int type on 64-bit systems and a 32-bit int type on 32-bit systems?

That is the main reason, though not the only one. The Emacs documentation raises a valid point regarding pointer subtraction:

Using ptrdiff_t limits objects to PTRDIFF_MAX bytes, but larger objects would cause trouble anyway since they would break pointer subtraction, so this does not impose an arbitrary limit.

The ksh codebase uses lots of pointer subtraction, such as when operating with the stack:

#define stktell(sp) ((sp)->_next-(sp)->_data)

ksh/src/cmd/ksh93/sh/lex.c

Lines 2162 to 2175 in d462de6

static unsigned char *stack_shift(unsigned char *sp, unsigned char *dp)
{
unsigned char *ep;
int offset = stktell(sh.stk);
int left = offset - (sp - (unsigned char*)stkptr(sh.stk, 0));
int shift = (dp+1-sp);
offset += shift;
stkseek(sh.stk,offset);
sp = (unsigned char*)stkptr(sh.stk,offset);
ep = sp - shift;
while(left--)
*--sp = *--ep;
return sp;
}

Using intmax_t wouldn't help in these cases because you'd still be constrained by PTRDIFF_MAX regardless.

@McDutchie
Copy link

OK, that seems technically sound in practice, but ptrdiff_t is specifically intended for storing the result of subtracting two pointers, which is just confusing when reading the code if it's used for all sorts of integer values.

As far as I can tell, using intptr_t (or uintptr_t for unsigned) would work as well, and it seems a more explicit/elegant way of using an integer type that is the size of a pointer. Their availability is guaranteed by libast in features/common.

@JohnoKing
Copy link
Author

JohnoKing commented Jan 30, 2026

using intptr_t (or uintptr_t for unsigned) would work as well, and it seems a more explicit/elegant way of using an integer type that is the size of a pointer.

That would work, but those types are typically used for storing pointer addresses, not sizes. I'd think that would be just as confusing to a code reader, if not moreso. I've never seen intptr_t used as a signed counterpart to size_t; ptrdiff_t is more commonly used for sizes in practice.

Their availability is guaranteed by libast in features/common.

ptrdiff_t was also guaranteed prior to 84717f7; the removed feature test is <=1995, though it was originally in src/lib/libast/feature/types. After 84717f7 its presence has been implicitly assumed. On the current dev branch ptrdiff_t is used in the following places:

ptrdiff_t d;

#define SFFMT_TFLAG 000000020 /* 't' flag, ptrdiff_t */
#define SFFMT_ZFLAG 000000040 /* 'z' flag, size_t */

else if(flags&SFFMT_TFLAG)
size = sizeof(ptrdiff_t);

else if(flags&SFFMT_TFLAG)
size = sizeof(ptrdiff_t);

else if(flags&SFFMT_TFLAG)
size = sizeof(ptrdiff_t);

sfprintf(state->tmp, "%s(?{%I*o})", b, sizeof(ptrdiff_t), (intptr_t)x);

Any platform that doesn't implement ptrdiff_t has been broken since the aforementioned commit (assuming such even exist). I could readd a revised form of the old feature test if you think it's actually necessary. (Side note: the grep code should use sizeof(intptr_t) since it's using a pointer represented as an integer; that's now fixed as part of the patch series.)

@JohnoKing
Copy link
Author

It's also worth noting on ARM CHERI intptr_t is 128-bit (and contains more than the pointer address) whilst ptrdiff_t is 64-bit. In that case you still only have 64-bits for addressable memory, meaning intptr_t and ptrdiff_t aren't interchangeable.

@McDutchie
Copy link

McDutchie commented Jan 30, 2026

Then let's at least do something like

/* (u)intksh_t will typically be 32 bits wide in 32-bit binaries and 64 bits wide in 64-bit binaries */
typedef ptrdiff_t intksh_t;
typedef unsigned ptrdiff_t uintksh_t;

to make the source more readable, and also make it easier to change the approach later, should that become necessary. This would go well in defs.h.

Of course ptrdiff_t should continue to be used directly for storing values that are actually the result of subtracting two pointers.

@JohnoKing
Copy link
Author

Of course ptrdiff_t should continue to be used directly for storing values that are actually the result of subtracting two pointers.

The ksh codebase frequently assumes the result from pointer subtraction will have the same bitness as the size variables, and that the results can themselves be used for sizes. For just one example, lex_advance() in lex.c is passed a ptrdiff_t result from fcfill() that is then used with another ptrdiff_t for an argument to sfwrite(), which takes size_t and consequently implicitly casts the int n to size_t:

if((n = ptr-_Fcin.fcbuff) && _Fcin.fcfun)
(*_Fcin.fcfun)(f,(const char*)_Fcin.fcbuff,n,_Fcin.context);

ksh/src/cmd/ksh93/sh/fcin.c

Lines 125 to 129 in d462de6

void fcnotify(void (*fun)(Sfio_t*,const char*,int,void*),void* context)
{
_Fcin.fcfun = fun;
_Fcin.context = context;
}

fcnotify(lex_advance,lp);

static void lex_advance(Sfio_t *iop, const char *buff, int size, void *context)

ksh/src/cmd/ksh93/sh/lex.c

Lines 141 to 142 in d462de6

int n = size - (lp->lexd.docend-(char*)buff);
sfwrite(sh.strbuf,lp->lexd.docend,n);

On my current thickfold branch there are currently a grand total of 993 results for ptrdiff_t returned by a grep search through ./src. While not all of them are strictly for pointer differences, a large number are, and many of those simultaneously assume the pointer difference will also be used as a size or for a size. This is a consequence of the codebase prior to these changes frequently using int as though it could represent both pointer differences and sizes. Accurately replacing all ptrdiff_t variables that aren't used in any way for pointer arithmetic would be very time consuming.

also make it easier to change the approach later

I doubt it can be changed later. To reiterate the point from the Emacs document I quoted earlier, objects larger than PTRDIFF_MAX break pointer subtraction. Changing the types to use something else other than ptrdiff_t is likely to yield little to no benefit; in the best case scenario the code base would be refactored to avoid using signed types and pointer subtraction for any sizes (which isn't happening).

make the source more readable

A nonstandard type definition doesn't necessarily improve readability either. A coder unfamiliar with the ksh codebase isn't going to have any idea what an intksh_t is supposed to represent. (A name change could fix that, but you're still left with PTRDIFF_MAX as an unavoidable pitfall.) ptrdiff_t is part of the C standard and is often used for signed sizes in practice, even if that isn't the type's primary purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants