Skip to content

Feature/104258 str functions #167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

FlufflesTheMicrosaur
Copy link
Contributor

@@ -396,14 +396,52 @@ static PRIMITIVEPROCDEF g_DefPrimitives[] =
"(sqrtn x) -> real z",
"n", 0, },

{ "strBeginsWith", fnStr, FN_STR_BEGINS_WITH,
"(strBeginsWith string matchString [caseSensitive=Nil]) -> True|Nil",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style of documenting an optional argument [caseSensitive=Nil] does not match the existing code. To me it reads like you're describing an optional keyword argument.

  • For an optional positional argument as you've implemented it (i.e. pass a non-Nil value to use) then it should be just [caseSensitive]
  • Alternatively if you want a "keyword-style" option as used in some of recently added functions you'd have ['caseSensitive] i.e. for use in code as (strBeginsWith string substring 'caseSensitive)

Note - could also use the slight shorter (= (find string substring) 0), but I expect begins with is slightly faster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think non-keyword optionals should always specify their default, it would also protect against creating bad overloads where Nil and not providing an argument are treated differently (ex, old objDestroy)

For a keyword arg I'd probably just change it to use a shorter keyword.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is fine with me as long as it's consistent - if the style is changed it should be a clear decision to change and someone should update all the other doc strings so optional args specify their default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, would need to be in its own PR though. We'll raise it w/ george on monday

{ "strCapitalize", fnStrCapitalize,0,
"(strCapitalize string) -> string",
NULL, PPFLAG_SIDEEFFECTS, },

{ "strContains", fnStr, FN_STR_CONTAINS,
"(strContains string substring [caseSensitive=Nil]) -> True|Nil",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strContains is unnecessary - you can already use find (or strFind) which return a non-Nil result when the substring is contained in string

  • This does add the caseSensitive option - but that should really be added to find/strFind if it's needed

Copy link
Contributor Author

@FlufflesTheMicrosaur FlufflesTheMicrosaur Aug 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was sort of a freebie option with strCount that was a bit faster if you just needed to check for presence. Its probably possible to just redirect find/strFind to the new case sensitive code though and return an index instead of True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I remember the other rationale for it returning a bool - so you can just do (conditionalThing (strContains "example" "e") ...) instead of needing to check for non-Nil. The reason for all these compacted/minimized function call versions is for stuff like Dynamic Data ROMs that ends up doing a bunch of parsing over large blocks of text.

I mean, at the end of the day, keeping this and then ALSO rerouting strFind and having an option for returning an int instead of True in the code is easy enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to check for non-Nil as any integer will be considered True in a conditional i.e. you can write

(if (find "examlpe" "e") ...)

or

(switch
   (find "example" "e")
      (block ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... wow 0 evaluating to True is super unintuitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was empty string being "True" that caught me out - see https://ministry.kronosaur.com/record.hexm?id=72735

Basically only Nil and empty list / struct are "False", anything else is "True"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this needs to actually be documented somewhere prominent/accessible

{ "strFind", fnStrFind, 0,
"(strFind string target) -> pos of target in string (0-based)",
"ss", 0, },

{ "strJoin", fnStr, FN_STR_JOIN,
"(strJoin delimiter listOfStrings) -> string. Does not ignore Nil or empty strings.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing join does not ignore empty strings - it only skips Nil values.

I think it would be better to add an option (e.g. ['includeNil] or keepNil etc.) to the existing join function than have two almost but not quite identical versions. However, if you do go with a new function then I'd suggest

  • keep the same argument order as join
  • the list argument is not restricted to a list of strings, so just call it list

e.g "(strJoin list separator) -> string. Like join but converts Nil to empty string.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main motivation for this was for a function that when I use it, I likely am doing something where I have to use it a lot at once, I prefer not having to repeatedly type out keyword arguments to reduce strain & also visual clutter in the code.

It doesnt help that some editors (vscode at least) want to keep adding '' instead of just one ', which is normally very useful but here is very much not since transcendence complains about that.


{ "strSlice", fnStr, FN_STR_SLICE,
"(strSlice string pos [count]) -> string.\n"
"If pos is negative, the index is relative to the end of the string. The last character is at pos -1.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could extend the functionality of the existing subset function

  • An easy way to index from the end could be useful for lists as well as strings
  • Or could add a ['fromEnd] or ['descending] option to avoid any possible bugs if existing code relies on negative indices being equivalent to 0. For example (subset list|string 10 'fromEnd)

Note - if user needs an empty string rather than Nil they can get those with low overhead using: (or (subset string pos) "")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subset is in a sticky situation because of the current behavior (where <0 = 0) and I'm pretty sure some stuff somewhere is just expecting that to be the case and passing unfiltered subtraction into it

But on the other hand I dont want to have to be verbosely specifying some keyword argument in a function I have to type out repeatedly as mentioned above.

It would be handy to have that feature for lists too

I think what I might do for now is break this out of fnStr, merge it with the join logic and use a case to switch between join and slice, changing the args to be (slice strOrList start [end=-1]) which makes it consistent with the concept of python slices (where it got its name) since we dont have that option for just directly using an end and avoiding an additional tlisp call. I guess I can modify join with a '- keyword to enable that allow-negatives feature on there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean merge 'slice' with 'subset' i.e.`

{   "slice", fnSubset, FN_SLICE,
    "(slice list|string start [end=-1])"...  // Python style indexing
{   "subset", fnSubset, FN_SUBSET,
    (subset list|string start [count=Nil]  // existing behaviour

fnSubset would have the switch / case to determine how the start/end/count arguments are interpreted?

BTW - it looks like fnSubset is checking all the arguments so you can't go off either end of the string. But I think your current fnStr / FN_STR_SLICE will go out of bounds if you pass a large negative index e.g.

(strSlice "short string" -100000)

Would internally call strSubString(sSource, 12-100000, 12)
and strSubString only checks for offset > length, so will blindly access 100,000 bytes before the pointer...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slice & subset:
The intent is to have them as separate functions so we dont need to repeatedly use a keyword argument to get the desired behavior for something we will be calling a lot in the code that would use it
strSlice:
Oh good catch, didnt realize strSubString didnt have a check for negative access. I'll fix that.

@FlufflesTheMicrosaur
Copy link
Contributor Author

Also I think I'm going to rework a lot of the functionality into lower level code in Kernel and then simply call those functions from tlisp handlers, since this is potentially useful stuff to have elsewhere in the engine too.

…d in revised version of some tlisp functions
@FlufflesTheMicrosaur
Copy link
Contributor Author

FlufflesTheMicrosaur commented Aug 10, 2025

TODO list of changes to make:

  • get rid of strContains as a tlisp function (dont remove the code though, can be used in core engine code, see later step)
  • reroute strFind to strContains logic and return an int instead
  • (optional) add strFindAll which can return a list of all indexes that are found
  • fix negative offset in strSubString in Kernel
  • support strSlice (from start to end) in Kernel
  • support contains/count/find/findAll in Kernel
  • support beginsWith/endsWith in Kernel
  • flip args and fix docs for strJoin
  • move handling of subset and slice to their own non-string-specific function (might do this as a different PR?)

@gcabbage
Copy link
Contributor

BTW - @NMS127 added a nice Tlisp implementation of string replacement back in kronosaur/Transcendence#69. It won't as fast as your native version, but it looks to be 10-100x faster than the DDR one.

@FlufflesTheMicrosaur
Copy link
Contributor Author

Yeah, due to running into unexpected behavior (due to lack of real documentation for tlisp) we encountered in making DDR, we had ended up just writing code that takes nothing for granted and takes the safest path towards working at the expense of efficiency.

Also there needs to be a help function that covers lambdas & aliases...

@gcabbage
Copy link
Contributor

See Support Docstrings for lambda functions and proof of concept implementation: https://github.com/gcabbage/Alchemy/tree/lambda-help
Another one waiting for feedback / ministry search / some tickets being closed so it's find able again

@FlufflesTheMicrosaur
Copy link
Contributor Author

Bumped it

Also there technically is ministry search on discord via anton (it only works for transcendenceDev right now due to george not having finished some APIs), but we've just been trying to make tickets before forgetting them and forgetting to actually check, hence the duplicate that got made.

@FlufflesTheMicrosaur
Copy link
Contributor Author

I have added test to the Tlisp unit tests as well now

@FlufflesTheMicrosaur FlufflesTheMicrosaur changed the base branch from integration/API55 to master August 12, 2025 01:01
@gmoromisato gmoromisato merged commit 9b78d19 into kronosaur:master Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants