Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize string operations throughout the codebase #86235

Closed

Conversation

MewPurPur
Copy link
Contributor

@MewPurPur MewPurPur commented Dec 16, 2023

A more manageable version of #82412 that does more types of optimizations, but only applies them to user-exposed classes (so the total volume of them is a lot less). No benchmarking done.

For reference, most of them are like:

  • s.substr(0, x) -> s.left(x)
  • s.substr(0, s.length() - x) -> s.left(-x)
  • s.substr(x, s.length()) -> s.right(-x)
  • s.substr(x, s.length() - x) -> s.right(-x)

I think all of these are equivalent 100% of the time, but they are ~15% faster (benchmarking from my PR that optimized left and right).

core/string/translation_po.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
scene/gui/file_dialog.cpp Outdated Show resolved Hide resolved
scene/gui/rich_text_label.cpp Outdated Show resolved Hide resolved
scene/gui/rich_text_label.cpp Outdated Show resolved Hide resolved
scene/gui/rich_text_label.cpp Outdated Show resolved Hide resolved
@AThousandShips
Copy link
Member

It appears the changes breaks some code, see the broken doc generation

@MewPurPur MewPurPur force-pushed the optimize-string-funcs branch 2 times, most recently from 287214b to f1a6d94 Compare December 16, 2023 20:10
@AThousandShips
Copy link
Member

Unit tests show their worth, let's see what's broken with your implementation of the string methods, will do some looking into it tomorrow if you haven't solved 🙂

@MewPurPur
Copy link
Contributor Author

MewPurPur commented Dec 16, 2023

I can't figure it out from just looking 😞

I can debug more thoroughly tomorrow.

@kleonc
Copy link
Member

kleonc commented Dec 17, 2023

I think all of these are equivalent 100% of the time

These are not the same for x < 0:

  • s.substr(0, x) -> s.left(x)
  • s.substr(x, s.length()) -> s.right(-x)

These are not the same for x <= 0:

  • s.substr(0, s.length() - x) -> s.left(-x)
  • s.substr(x, s.length() - x) -> s.right(-x)

but they are ~15% faster (benchmarking from my PR that optimized left and right).

The mentioned PR: #80824 (please link next time you refer to some other PR).
Its description is a little unclear to me but I understand it as the benchmarked speedup is for old left/right vs new left/right. In such case it doesn't say anything about the performance of left/right vs substr, and this would need to be benchmarked.
But I'd rather focus on trying to optimize substr in the first place, e.g. are the length() calls optimized out by the compiler and done once? If not then these should cached/ called once manually (and used in check instead of is_empty() etc.).

godot/core/string/ustring.cpp

Lines 2975 to 2995 in 2d0ee20

String String::substr(int p_from, int p_chars) const {
if (p_chars == -1) {
p_chars = length() - p_from;
}
if (is_empty() || p_from < 0 || p_from >= length() || p_chars <= 0) {
return "";
}
if ((p_from + p_chars) > length()) {
p_chars = length() - p_from;
}
if (p_from == 0 && p_chars >= length()) {
return String(*this);
}
String s;
s.copy_from_unchecked(&get_data()[p_from], p_chars);
return s;
}

Same for other methods (like left/right). Note it's already like that e.g. in find:

godot/core/string/ustring.cpp

Lines 3002 to 3004 in 2d0ee20

const int src_len = p_str.length();
const int len = length();


Regarding code simplification these can be done for sure:

  • s.substr(x, s.length()) -> s.substr(x)
  • s.substr(x, s.length() - x) -> s.substr(x)

I find these better readability-wise, plus it gets rid of redundant s.length() calls on the caller side (length() is called in substr anyway).

@AThousandShips
Copy link
Member

AThousandShips commented Dec 17, 2023

I'd say that unless things are in a very performance critical area readability trumps performance, I'm in favor of the obvious ones like substr(x, length), but generally to stay on the safe side as left/right can be confusing at times, and need to ensure properly the correct conversions

@MewPurPur
Copy link
Contributor Author

MewPurPur commented Dec 17, 2023

Hmpf, so my old PR only did ones with left(), and I did make sure the number was always positive. I thought right() would be about as easy but it's not.

I know that substr() is confusing for a lot of people in a way similar to ternaries, where you constantly have to check if the second argument was the final position, or the substring's length (i.e. does "ABCDEF".substr(1, 3) give BCD or BC?) so I always prefer the dedicated methods when the substring isn't in the middle of the string. But while left() tends to always be readable, right() just sucks for readability when you flip the sign IMO.

I think I'll abandon the idea because of that.

Its description is a little unclear to me but I understand it as the benchmarked speedup is for old left/right vs new left/right. In such case it doesn't say anything about the performance of left/right vs substr, and this would need to be benchmarked.

The old left/right were calling substr() directly, so it seems fair to me. Unless there is a notable penalty for going through an extra function that I don't know of.

@AThousandShips
Copy link
Member

I'd say to remove the potentially ambiguous cases and see if that resolves the CI, that being the right cases with negative indices at least

The substr cases with trivial length should be good IMO, as are the trivial left cases, i.e. substr(0, x), and maybe do right in a future PR with more discussion and evaluation

@MewPurPur MewPurPur force-pushed the optimize-string-funcs branch 5 times, most recently from 524f5ae to 520a728 Compare December 18, 2023 03:05
@MewPurPur
Copy link
Contributor Author

MewPurPur commented Dec 18, 2023

What the heck, is the logic of right() not right or something? AFAIK only rich_text_label.cpp changes could cause this kind of CI fail, and I don't think I've made mistakes here, I've even only kept the "safe" right()s at this point.

I'm tired, will investigate this later.

@MewPurPur
Copy link
Contributor Author

I'll leave this idea at peace for now.

@MewPurPur MewPurPur closed this Mar 14, 2024
@KoBeWi KoBeWi removed this from the 4.x milestone Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants