Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider exposing serialization in public headers #828

Open
aarongreig opened this issue Aug 24, 2023 · 7 comments
Open

Consider exposing serialization in public headers #828

aarongreig opened this issue Aug 24, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@aarongreig
Copy link
Contributor

Being able to serialize API objects and functions is something many users of UR are going to want to do, and as it stands they have to do it themselves. We do have pretty much all the enum serialization etc you'd ever need in ur_params.hpp but it isn't a public interface. We should consider moving it out of common into include to allow deduplication of this kind of code

@pbalcer
Copy link
Contributor

pbalcer commented Aug 28, 2023

We need to consider a few things before we make this a public interface:

  1. Revise names. This was initially called ur_params because it was meant for printing out _params_t structures, which is really only useful in tracing. Now that we want to make it more broadly applicable, it might be a good idea to come up with a more descriptive name. stringify/print/display?
    We are also calling some functions in the header serialize.... This can be misleading, since a) we don't have a matching deserialize and b) for the most part, it wouldn't even be possible to deserialize what this is outputting since we don't adhere to any specific format (and the process is lossy anyway).
  2. This header is huge (~16k lines at the moment). We should consider splitting it into smaller but still usable pieces.
  3. The operator<< implementations are outside the namespace. This is for convenience because the types they are implemented for are also outside the namespace, so the compiler will not find the operator implementations automatically. If this is meant to be public, I think I'd rather have the operators inside of the namespace and require bringing it into the scope or just fully qualifying the calls.
  4. Should we have a C API for this?

Luckily, the current implementation is header-only (I was wrong earlier), so that's one less thing to worry about.

@lukaszstolarczuk lukaszstolarczuk added the enhancement New feature or request label Aug 28, 2023
@kbenzie
Copy link
Contributor

kbenzie commented Aug 28, 2023

For points 3/4 I when working on urinfo the need to create a std::stringstream to stringify things is not ideal. I came here to suggest a std::to_string type interface but it probably makes sense to have this available as a C API and build C++ wrappers on top of that.

@pbalcer
Copy link
Contributor

pbalcer commented Aug 28, 2023

I came here to suggest a std::to_string type interface

That would imply a memory allocation per displayed type. When implementing the logger, we wanted to implement a poor's man std::format using that same code, allowing you to do auto ur_type_str = format("{}", some_ur_type);. I'm not sure what happened to that. @PatKamin ?

@kbenzie
Copy link
Contributor

kbenzie commented Aug 28, 2023

I'm not tied to the std::to_string interface but it would potentially be fewer allocations than interacting with the std::stringstream then converting to std::string before then manipuating that string.

That would imply a memory allocation per displayed type.

If it were a C API would there ever be a need to return anything other than a const char* pointing to constant data in the executable?

@pbalcer
Copy link
Contributor

pbalcer commented Aug 28, 2023

If it were a C API would there ever be a need to return anything other than a const char* pointing to constant data in the executable?

Yes, only enum string representation is const. Most everything else needs to be dynamically constructed (maybe for flags we could precompute all possible combinations, but that's probably too much :-)). I think it would be better to be consistent. Ultimately, I don't think allocating these string is that much of a problem for user-facing APIs. I was mostly worried about blowout of allocations when stringifying complex, possibly nested, structures. But that's only a problem if we were to use something like to_string internally in the implementation (e.g., stringifying a single larger _params_t could require dozen or more tiny allocations. Every array element might also need to be an allocation with this scheme), but that's easily avoided.

@PatKamin PatKamin self-assigned this Sep 5, 2023
@PatKamin
Copy link
Contributor

Part of this header could be still in common, unexposed to the user. All serialize... and is_handle<> functions are kind of helper functions for stream operators and as such they could be still internal to the project. This is leaving operators overload part to be exposed in include dir and in the API.

Now, if this were to be a C API, and I think it has to be, to be conformant with the rest of the UR API, the new set of functions would require to have either precompiled messages or buffers big enough to store possible strings. The helper functions will be rewritten with char arrays as buffers, replacing current ostream usage. All the reduced memory footprint from having heavy ostream usage would be lost.

Maybe the right way would be to expose current operators overload alongside the new C API set of functions and making C and C++ API mutually exclusive during compilation? This would make stream operators not merely a wrapper for a C API, they would still present a more memory-efficient way of printing objects (but when linked as a C++ code only) comparing to char array usage.

@pbalcer
Copy link
Contributor

pbalcer commented Sep 14, 2023

the new set of functions would require to have either precompiled messages or buffers big enough to store possible strings

I don't think it's possible to have all the strings precompiled. Unless we just do enums. As for having a preallocated buffer, that could work, but would probably have to be per-thread and be dynamically resizable. Users would have to be aware that calling a stringify function would overwrite or even reallocate that buffer - that I don't like. Just returning an allocation would be simpler (worst case scenario it would just be an easily detectable leak).

Maybe the right way would be to expose current operators overload alongside the new C API set of functions and making C and C++ API mutually exclusive during compilation?

How would you enforce that? With some macros? And why? Even if the two APIs are different, it shouldn't be a problem to have both of them in a program at the same time.

This would make stream operators not merely a wrapper for a C API, they would still present a more memory-efficient way of printing objects (but when linked as a C++ code only) comparing to char array usage.

I do think we need a way to implement printing complex structs in a way that doesn't involve doing dozens small short-lived allocations every time. But we can do that entirely on the implementation side (which could leverage the existing C++ code).
However, if that doesn't work for some reason and we need to be able to avoid the allocation in the C API, one approach could be to require the user to provide a write callback that would be then used to print out the string. I/O libraries use this approach often (e.g., libuv) to avoid allocations inside the library. To make it simpler, we could provide some default implementations for the callback (to a fd, to a new allocation and so on).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants