Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Make cstring constructor from char* explicit and string_view conversion implicit #4688

Open
asl opened this issue May 28, 2024 · 7 comments
Labels
compiler-performance Topics on improving the performance of the compiler core. enhancement This topic discusses an improvement to existing compiler code. question This is a topic requesting clarification.

Comments

@asl
Copy link
Contributor

asl commented May 28, 2024

Thinking more about it... What if we do the opposite?

  • Conversion to string_view should be implicit
  • Construction from const char* should be explicit

Here is the rationale: currently it is not possible to have a string_view function argument if the function is assumed to accept both cstring and const char*. E.g. the following:

void foo(cstring);
void foo(string_view);

foo("bar")

Is ambiguous as cstring could be constructed from char* implicitly, string_view could be constructed from char* implicitly, but we cannot drop the cstring overload as cstring cannot be implicitly converted to string_view.

Certainly, there are lots of examples in codebase that do e.g.

cstring getName() const {  return "Foo"; }

that will be broken. But IMO these should be all fixed to cstring::literal call and explicit conversion to cstring should emphasize that this is not a cheap operation (involving strlen + map lookup).

Originally posted by @asl in #4676 (comment)

@asl
Copy link
Contributor Author

asl commented May 28, 2024

Tagging @vlstill @fruffy @ChrisDodd for opinions

@fruffy
Copy link
Collaborator

fruffy commented May 28, 2024

Personally I think this is a good idea, even though it may be a lot of extra refactoring work. Making this explicit will make people think twice whether they want to use a cstring.
@vlstill also introduced a _cs suffix at some point which should help here? #4342

@asl
Copy link
Contributor Author

asl commented May 28, 2024

@vlstill also introduced a _cs suffix at some point which should help here? #4342

Yes. _cs would reduce the amount of typing indeed. As far as I can see, the backends are major abusers of cstrings. ebpf is the nicest example :) This would affect downstream users certainly, though. However, #4481 (comment) clearly shows that there is lots of things here and there. cstring cache contains the whole copy of the input, for example. As all tokens are stored as cstring's.

@fruffy fruffy added enhancement This topic discusses an improvement to existing compiler code. question This is a topic requesting clarification. labels May 28, 2024
@vlstill
Copy link
Contributor

vlstill commented May 29, 2024

I like the idea of explicit constructor for cstring. I think this would be a major undertaking to do, there is a lot of places in the compiler that count on implicit casts. This would be much simpler if there was some tooling support for making changes like this automatic (it almost seems like something that clang-tidy or something similar should be able to do, but I don't know if there is any solution). That would also make it much simpler for the downstream tools.

I've not tried this one I think, but I've tried to make the construction of IR::ID explicit and that was too much for me. However, without IR::ID explicit constructor, this one does not make much sense. These implicit casts are defined in cstring and IR::ID:

  • const char * -> cstring
  • std::string -> cstring
  • std::stringstream -> cstring
  • const char * -> IR::ID
  • cstring -> IR::ID
  • std::string -> IR::ID

So making cstring explicit is not enough, as there would still be chain of implicit casts const char * -> IR::ID -> cstring :-(.

I've also checked how the standard library does similar conversions and it seems to be similar case as you propose (with the additional problem that the implicit cast in stdlib can introduce dangling reference but cstring -> string_view can't). The std library has:

  • explicit cast std::string_view -> std::string (constructor)
  • implicit cast std::string -> std::string_view (operator)

@asl
Copy link
Contributor Author

asl commented May 29, 2024

@vlstill Yeah. That's lots of work. I already started changes here and there. Some observations:

  • There are lots of cstring's created from string literals. Here _cs is a rescue. Bonus points: this makes the stuff faster as we do not need to copy the string, we just take the address of the literal and put into the map.
  • Many places receives cstring as an argument. They should really take string_view. I'm doing these changes on the way.
  • cstring itself deserves some love. This includes e.g. cstring::startsWith / cstring::endsWith. Currently they take cstring as an argument, essentially caching them. There is certainly no need for this
  • cstring concatenations. Lots of cases. Should definitely require refactoring. I'm putting FIXMEs for now

Maybe something else will occur on my way :)

@fruffy fruffy added the compiler-performance Topics on improving the performance of the compiler core. label May 29, 2024
@asl
Copy link
Contributor Author

asl commented Jun 13, 2024

So, I just moved a downstream backend over the changes in #4694

It was not that painful. Important story that it uncovered lots of rusty cruft accumulated over the time:

  • cstrings created for temporaries
  • cstring literals where just normal string literals would suffice
  • unnecessary to / from cstring conversion
  • lots of (not so) creative ways to create strings here and there

I would say it was definitely worth the cleanup :)

@asl
Copy link
Contributor Author

asl commented Jul 17, 2024

Some story from dowstream code. It had code like this:

void foo(cstring a, cstring b, ...);

void bar() {
  ..
  std::string s = a->toString().c_str();
  if (rare_condition)
    s += prefix;
  foo(s, s);
}

bar() is quite hot function for some reason. As std::string => cstring conversion is still implicit we ended with:

  • std::string allocation (from cstring)
  • two std::string => cstring conversions (that involved map lookup) for no particular case

So, just changing the code to:

void foo(cstring a, cstring b, ...);

void bar() {
  ..
  cstring s = a->toString();
  if (rare_condition)
    s += prefix;
  foo(s, s);
}

yielded a noticeable speedup as no memory allocations and map lookups were performed on most probable hot path. Even on slow path we only pay single price for cstring construction.

This overhead went unnoticed due to implicit conversions...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-performance Topics on improving the performance of the compiler core. enhancement This topic discusses an improvement to existing compiler code. question This is a topic requesting clarification.
Projects
None yet
Development

No branches or pull requests

3 participants