Feature/104258 str functions #167

FlufflesTheMicrosaur · 2025-08-09T05:58:36Z

https://ministry.kronosaur.com/record.hexm?id=104258

… incrementing off-by-one mismatch error

…ch rest of code

gcabbage · 2025-08-09T12:49:31Z

Alchemy/CodeChain/DefPrimitives.h

@@ -396,14 +396,52 @@ static PRIMITIVEPROCDEF g_DefPrimitives[] =
 			"(sqrtn x) -> real z",
 			"n",	0,	},

+		{	"strBeginsWith",	fnStr, FN_STR_BEGINS_WITH,
+			"(strBeginsWith string matchString [caseSensitive=Nil]) -> True|Nil",


This style of documenting an optional argument [caseSensitive=Nil] does not match the existing code. To me it reads like you're describing an optional keyword argument.

For an optional positional argument as you've implemented it (i.e. pass a non-Nil value to use) then it should be just [caseSensitive]

Alternatively if you want a "keyword-style" option as used in some of recently added functions you'd have ['caseSensitive] i.e. for use in code as (strBeginsWith string substring 'caseSensitive)

Note - could also use the slight shorter (= (find string substring) 0), but I expect begins with is slightly faster

I think non-keyword optionals should always specify their default, it would also protect against creating bad overloads where Nil and not providing an argument are treated differently (ex, old objDestroy)

For a keyword arg I'd probably just change it to use a shorter keyword.

Either way is fine with me as long as it's consistent - if the style is changed it should be a clear decision to change and someone should update all the other doc strings so optional args specify their default

Yeah, would need to be in its own PR though. We'll raise it w/ george on monday

gcabbage · 2025-08-09T12:55:10Z

Alchemy/CodeChain/DefPrimitives.h

 		{	"strCapitalize",	fnStrCapitalize,0,
 			"(strCapitalize string) -> string",
 			NULL,	PPFLAG_SIDEEFFECTS,	},

+		{	"strContains",	fnStr, FN_STR_CONTAINS,
+			"(strContains string substring  [caseSensitive=Nil]) -> True|Nil",


strContains is unnecessary - you can already use find (or strFind) which return a non-Nil result when the substring is contained in string

This does add the caseSensitive option - but that should really be added to find/strFind if it's needed

This one was sort of a freebie option with strCount that was a bit faster if you just needed to check for presence. Its probably possible to just redirect find/strFind to the new case sensitive code though and return an index instead of True

Oh, I remember the other rationale for it returning a bool - so you can just do (conditionalThing (strContains "example" "e") ...) instead of needing to check for non-Nil. The reason for all these compacted/minimized function call versions is for stuff like Dynamic Data ROMs that ends up doing a bunch of parsing over large blocks of text.

I mean, at the end of the day, keeping this and then ALSO rerouting strFind and having an option for returning an int instead of True in the code is easy enough.

You shouldn't need to check for non-Nil as any integer will be considered True in a conditional i.e. you can write

(if (find "examlpe" "e") ...)

or

(switch (find "example" "e") (block ...

... wow 0 evaluating to True is super unintuitive.

It was empty string being "True" that caught me out - see https://ministry.kronosaur.com/record.hexm?id=72735

Basically only Nil and empty list / struct are "False", anything else is "True"

I feel like this needs to actually be documented somewhere prominent/accessible

gcabbage · 2025-08-09T13:05:14Z

Alchemy/CodeChain/DefPrimitives.h

 		{	"strFind",			fnStrFind,		0,
 			"(strFind string target) -> pos of target in string (0-based)",
 			"ss",	0,	},

+		{	"strJoin",	fnStr, FN_STR_JOIN,
+			"(strJoin delimiter listOfStrings) -> string. Does not ignore Nil or empty strings.",


The existing join does not ignore empty strings - it only skips Nil values.

I think it would be better to add an option (e.g. ['includeNil] or keepNil etc.) to the existing join function than have two almost but not quite identical versions. However, if you do go with a new function then I'd suggest

keep the same argument order as join

the list argument is not restricted to a list of strings, so just call it list

e.g "(strJoin list separator) -> string. Like join but converts Nil to empty string.",

The main motivation for this was for a function that when I use it, I likely am doing something where I have to use it a lot at once, I prefer not having to repeatedly type out keyword arguments to reduce strain & also visual clutter in the code.

It doesnt help that some editors (vscode at least) want to keep adding '' instead of just one ', which is normally very useful but here is very much not since transcendence complains about that.

gcabbage · 2025-08-09T13:28:36Z

Alchemy/CodeChain/DefPrimitives.h

+
+		{	"strSlice",	fnStr, FN_STR_SLICE,
+			"(strSlice string pos [count]) -> string.\n"
+			"If pos is negative, the index is relative to the end of the string. The last character is at pos -1.",


Or you could extend the functionality of the existing subset function

An easy way to index from the end could be useful for lists as well as strings

Or could add a ['fromEnd] or ['descending] option to avoid any possible bugs if existing code relies on negative indices being equivalent to 0. For example (subset list|string 10 'fromEnd)

Note - if user needs an empty string rather than Nil they can get those with low overhead using: (or (subset string pos) "")

subset is in a sticky situation because of the current behavior (where <0 = 0) and I'm pretty sure some stuff somewhere is just expecting that to be the case and passing unfiltered subtraction into it

But on the other hand I dont want to have to be verbosely specifying some keyword argument in a function I have to type out repeatedly as mentioned above.

It would be handy to have that feature for lists too

I think what I might do for now is break this out of fnStr, merge it with the join logic and use a case to switch between join and slice, changing the args to be (slice strOrList start [end=-1]) which makes it consistent with the concept of python slices (where it got its name) since we dont have that option for just directly using an end and avoiding an additional tlisp call. I guess I can modify join with a '- keyword to enable that allow-negatives feature on there

Do you mean merge 'slice' with 'subset' i.e.`

{ "slice", fnSubset, FN_SLICE, "(slice list|string start [end=-1])"... // Python style indexing { "subset", fnSubset, FN_SUBSET, (subset list|string start [count=Nil] // existing behaviour

fnSubset would have the switch / case to determine how the start/end/count arguments are interpreted?

BTW - it looks like fnSubset is checking all the arguments so you can't go off either end of the string. But I think your current fnStr / FN_STR_SLICE will go out of bounds if you pass a large negative index e.g.

(strSlice "short string" -100000)

Would internally call strSubString(sSource, 12-100000, 12)
and strSubString only checks for offset > length, so will blindly access 100,000 bytes before the pointer...

slice & subset:
The intent is to have them as separate functions so we dont need to repeatedly use a keyword argument to get the desired behavior for something we will be calling a lot in the code that would use it
strSlice:
Oh good catch, didnt realize strSubString didnt have a check for negative access. I'll fix that.

FlufflesTheMicrosaur · 2025-08-09T19:10:28Z

Also I think I'm going to rework a lot of the functionality into lower level code in Kernel and then simply call those functions from tlisp handlers, since this is potentially useful stuff to have elsewhere in the engine too.

…d in revised version of some tlisp functions

FlufflesTheMicrosaur · 2025-08-10T07:13:33Z

TODO list of changes to make:

get rid of strContains as a tlisp function (dont remove the code though, can be used in core engine code, see later step)
reroute strFind to strContains logic and return an int instead
(optional) add strFindAll which can return a list of all indexes that are found
fix negative offset in strSubString in Kernel
support strSlice (from start to end) in Kernel
support contains/count/find/findAll in Kernel
support beginsWith/endsWith in Kernel
flip args and fix docs for strJoin
move handling of subset and slice to their own non-string-specific function (might do this as a different PR?)

gcabbage · 2025-08-10T09:51:56Z

BTW - @NMS127 added a nice Tlisp implementation of string replacement back in kronosaur/Transcendence#69. It won't as fast as your native version, but it looks to be 10-100x faster than the DDR one.

…solute

…eAbsolute' into feature/104258-strFunctions

FlufflesTheMicrosaur · 2025-08-10T10:37:24Z

Yeah, due to running into unexpected behavior (due to lack of real documentation for tlisp) we encountered in making DDR, we had ended up just writing code that takes nothing for granted and takes the safest path towards working at the expense of efficiency.

Also there needs to be a help function that covers lambdas & aliases...

gcabbage · 2025-08-10T10:51:09Z

See Support Docstrings for lambda functions and proof of concept implementation: https://github.com/gcabbage/Alchemy/tree/lambda-help
Another one waiting for feedback / ministry search / some tickets being closed so it's find able again

FlufflesTheMicrosaur · 2025-08-10T11:11:49Z

Bumped it

Also there technically is ministry search on discord via anton (it only works for transcendenceDev right now due to george not having finished some APIs), but we've just been trying to make tickets before forgetting them and forgetting to actually check, hence the duplicate that got made.

… versions, created section in KernelString.h for functions required for inline calls

…ith/strEndsWith/strCount to use new kernel implementations, remove strContains

…tion for strFind

…d strBegins/EndsWith, strCount, strFind, strFindAll

…strReplace, strJoin

FlufflesTheMicrosaur · 2025-08-11T22:02:07Z

I have added test to the Tlisp unit tests as well now

# Conflicts: # Alchemy/CodeChain/Functions.h # Alchemy/Include/KernelString.h # Mammoth/Include/TSEVersions.h

…GBColor but wasnt updated here

…o DWToARGBColor but wasnt updated here" (commit made on wrong branch) This reverts commit ea73b1a.

…rSplit

…ments do

…f duplicated code

…reused, use strTrimWhitespace for default behavior

…m later

FlufflesTheMicrosaur added 9 commits August 8, 2025 03:47

feat: 104258: initial partial implementation of str functions

37962d2

feat: 104258: simplify some logic with initial functions, fix some bugs

4f213e4

feat: 104258: add strJoin and strReplace. strReplace currently has an…

d5c5d0b

… incrementing off-by-one mismatch error

fix: 104258: fix strReplace off by one error

b23b9e3

feat: 104258: add strSlice strStrip strSplit

2c32c02

chore: 104258: remove unnecessary checks, improve line spacing to mat…

349c70d

…ch rest of code

chore: 104258: more cleanup

991189e

chore: 104258: add docs to version.h

ab46671

chore: 104258: add warning about no api check

d025ff3

gcabbage reviewed Aug 9, 2025

View reviewed changes

fix: 104258: fix negative pointer offset in Kernel::strSubString

9831a2a

feat: 104258: add strSubStringWrapAround for KernelString.h to be use…

41b2cd2

…d in revised version of some tlisp functions

feat: 104258: add strSlice to KernelString

009136f

FlufflesTheMicrosaur added 2 commits August 10, 2025 02:52

feat: 104332: shareable case sensitive implementation of strCompareAb…

6e30560

…solute

Merge remote-tracking branch 'origin/feature/case_sensitive_strCompar…

9d47b31

…eAbsolute' into feature/104258-strFunctions

FlufflesTheMicrosaur added 8 commits August 10, 2025 18:40

feat: add strStartsWith/strEndsWith using common code, deprecated old…

a6ff97e

… versions, created section in KernelString.h for functions required for inline calls

feat: 104258: switch args on strJoin to match join, switch strBeginsW…

212394a

…ith/strEndsWith/strCount to use new kernel implementations, remove strContains

feat: 104258: slice & subset improvements for both strings and lists

075e8af

feat: 104258: add strFindAll, improves docs, enable case sensitive op…

0e90c79

…tion for strFind

chore: 104258: add subset and slice list unittests

ea18084

chore: 104258: add unit tests for str version of subset and slice, an…

92a0af6

…d strBegins/EndsWith, strCount, strFind, strFindAll

fix: 104258: fix bug with strJoin, add tests for strSplit, strStrip, …

3186952

…strReplace, strJoin

fix: 104258: off-by-one edgecase in strReplace

d41cd7e

FlufflesTheMicrosaur changed the base branch from integration/API55 to master August 12, 2025 01:01

FlufflesTheMicrosaur added 8 commits August 11, 2025 18:35

Merge branch 'master' into feature/104258-strFunctions

f3dfc1f

# Conflicts: # Alchemy/CodeChain/Functions.h # Alchemy/Include/KernelString.h # Mammoth/Include/TSEVersions.h

fix: 104358: fix merge issue where DWToRGBColor was renamed to DWToAR…

ea73b1a

…GBColor but wasnt updated here

Revert "fix: 104358: fix merge issue where DWToRGBColor was renamed t…

6a59693

…o DWToARGBColor but wasnt updated here" (commit made on wrong branch) This reverts commit ea73b1a.

chore 104258: cleanup strSplit code, add additional test cases for st…

633b2bd

…rSplit

Chore: 104258: fix some comments to use the same spacing existing com…

6c2f451

…ments do

chore: 104258: refactor strReplace to use new str functions instead o…

ff44fee

…f duplicated code

chore: 104258: refactor strStrip into a kernel function so it can be …

dbc687a

…reused, use strTrimWhitespace for default behavior

chore: 104258: remove strJoin for implementation in more advanced for…

2abe717

…m later

gmoromisato merged commit 9b78d19 into kronosaur:master Aug 18, 2025

Feature/104258 str functions #167

Feature/104258 str functions #167

Conversation

FlufflesTheMicrosaur commented Aug 9, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FlufflesTheMicrosaur Aug 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FlufflesTheMicrosaur commented Aug 9, 2025

Uh oh!

FlufflesTheMicrosaur commented Aug 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gcabbage commented Aug 10, 2025

Uh oh!

FlufflesTheMicrosaur commented Aug 10, 2025

Uh oh!

gcabbage commented Aug 10, 2025

Uh oh!

FlufflesTheMicrosaur commented Aug 10, 2025

Uh oh!

FlufflesTheMicrosaur commented Aug 11, 2025

Uh oh!

Uh oh!

FlufflesTheMicrosaur Aug 9, 2025 •

edited

Loading

FlufflesTheMicrosaur commented Aug 10, 2025 •

edited

Loading