Another idea about type support #774
The type of the keys of the objects would, I think, be determined by a type in the object implementation. I could see a case for having one trait for the value string type, and another one for the serialization type. This might require conversion functions between the two types.
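A hypothetical sketch of such a two-string-type traits bundle with conversion functions (all names here are invented for illustration; this is not the library's actual design):

```cpp
#include <string>

// Hypothetical traits bundle: one type for stored string values,
// another for the serialized output, plus conversions between them.
struct my_string_traits {
    using value_string_type  = std::wstring;  // keys/values held in the document
    using output_string_type = std::string;   // type produced by serialization

    // Conversion used when serializing a stored string (naive, ASCII-only).
    static output_string_type to_output(const value_string_type& s) {
        return output_string_type(s.begin(), s.end());
    }

    // Conversion used when parsing serialized input into a stored string.
    static value_string_type from_output(const output_string_type& s) {
        return value_string_type(s.begin(), s.end());
    }
};
```

A real version would need proper transcoding (e.g. UTF-8 to UTF-16/32) instead of the element-by-element copy shown here.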
I've just seen Vinnie Falco's CppCon talk, and I agree that defining and using concepts is the way to go (the part about solving the allocator-awareness problem was really interesting).
Thanks, I'll have a look.
@theodelrieu Thanks for the reference! Though I did not get all the details, this seems to be exactly the thing we need!!! I also realized @vinniefalco commented on some issues here before. :)
Thanks for the mention! I don't think the feature is necessary. I believe users want performance and ease of use. They want to customize the allocator, they want fast parsing, they want efficient manipulation of the contents of a JSON container. I don't think it is important to let users control the type of integer for representing numbers. Furthermore, by allowing customization points, the implementation becomes highly constrained. In fact we need the opposite: we need to refactor the interface so that the implementation is even less constrained. Every JSON library that I have seen uses separate classes to represent each of the JSON values. These classes are first-class types. In this library you can see them with names such as …
Another problem with storing an allocator in every JSON value is that of storage. A stateful allocator is going to require at least 4 or 8 bytes depending on architecture. This will quickly add up for even modestly sized JSON documents.

If we can't use first-class types, then how will we construct and return values? I think it is worth investigating the use of iterators. A JSON document can have a single associated allocator and expose its elements through proxy objects. I have to stress that the returned proxy would not be copyable, or else we are back to the problem of having to store numerous copies of an allocator.

I also suspect that in the "ideal" JSON library, the data structure produced by parsing should be different from the data structure produced by assembling a JSON object procedurally from code. It seems there could be performance gains from simply retaining the input buffer on a parse and setting up views over that data. For things like arrays and objects which require some extra data, an associated allocator may be invoked. To support parsing of multiple discontiguous input strings, a parser can handle the case where a string is split across buffers by performing a small allocation to linearize the string. I believe this hybrid approach could produce very nice results. But if we use a different representation for parsing than for editing, how would that work? There could be a conversion between the two representations.

I have not explored these ideas very much, but if you think they are interesting I would be happy to schedule a Google Hangout where we can get together and talk about them, and do some exploratory coding up of declarations in a shared code document (like http://codeshare.io/). What do you think?
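A rough sketch of the non-copyable-proxy idea described above (every name below is invented for illustration; this is not a proposed API):

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical document that owns the single allocator; individual
// elements do not carry one themselves.
class json_document {
public:
    // Non-copyable proxy handed out on element access; it refers back to
    // the document instead of storing its own allocator.
    class proxy {
    public:
        proxy(json_document& doc, std::size_t index) : doc_(doc), index_(index) {}
        proxy(const proxy&) = delete;             // deliberately non-copyable
        proxy& operator=(const proxy&) = delete;
        proxy(proxy&&) = default;                 // movable is enough

        const std::string& value() const { return doc_.values_[index_]; }

    private:
        json_document& doc_;
        std::size_t index_;
    };

    void push_back(std::string v) { values_.push_back(std::move(v)); }
    proxy at(std::size_t i) { return proxy(*this, i); }
    std::size_t size() const { return values_.size(); }

private:
    std::vector<std::string> values_;  // one container, one allocator
};
```

The key point is that copying a proxy is a compile error, so allocator state is never duplicated per element.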
This library currently only requires C++11, not C++17, so using `std::string_view` is not an option.
I really need to see Vinnie's talk. I was in the SG14 meeting, so I couldn't see it live.
Yes, we could have an equivalent class for compilers that don't support C++17, but that doesn't eliminate the other part of the problem: the fact that it doesn't actually provide any storage.
The storage is provided by the JSON container.
Exactly: there may be some cases where it's possible, but in others it would require separate storage for the string, so you're back to having `std::string` or equivalent for that storage. So now you have some arbitrary storage, and a `string_view` into it.
Thanks for the quick response! I agree that ease of use is the main reason people use this library. Secondary reasons are performance (and allocators, of course, #766 #25), but a lot of people also ask how they can control various details of the library's behavior, such as the number serialization format (#777). I also heard requests for specific number formats (#799) and for maintaining the insertion order in objects (#727). The parser strategy (pull vs. push) is a frequently discussed issue as well.

With the sketched approach, I wanted to decouple the value types from the rest of the library. @theodelrieu made a similar proposal to split the actual JSON-specific code from the underlying values. I hoped that by specifying the interfaces, the whole library would become cleaner. At the same time, I also fear that this would not only mean a complete rewrite, but also make things "configurable" in such a clumsy way that hardly anyone would use the feature... Your description of allocator usage shows me that the described approach would be less helpful than I thought - but I have little to no knowledge of allocators in the first place.

I have not fully understood what your iterator-based approach looks like, so I really second the idea of getting together in a Hangout.

On the idea of having a view on parsed data: this sounds interesting, but I am not sure whether we should prioritize it. I am also not sure whether it is possible, because strings may need to be processed to cope with "\uxxxx" literals.

In any case, I would love to progress the whole type idea, because it has been a hack from the start and it would feel good to have solved this in a nice way. However, I would like to maintain support for C++11 compilers (which still seems to be an issue for many systems), and I would also like to keep the library's interface as "STL-like" as it is right now.
Yes, and also incremental parsing (does this library support that?). I have not yet found a JSON library that allows pieces of input to be supplied in separate calls. This would be very useful for implementing a Beast Body container.
Right, so a bit of background. I approach library development from an interface perspective. It seems to me that the interface is the most important aspect of a library, as implementation can always be changed later in a manner that is transparent to users. I'm not suggesting that having a view on parsed data is a priority for implementation, but rather I am saying that a good design will allow for this use-case. It should not be the only parser implementation - a good design will allow multiple parser implementations to present a unified interface for the caller to view the results, while each implementation can have radically different techniques. For example, if the user cares about …
To be honest I don't understand it fully either :) I just have a collection of ideas gained from working with JSON in a few projects. They need to be refined to be usable. I think that with your considerable experience (I didn't even know about …) they can be refined.
I totally agree. It is far too early for anyone to be dropping support for C++11 (Beast requires only C++11 and will likely remain that way for years). A JSON library could provide its own `string_view`-like class for C++11.
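A minimal sketch of what such a C++11 view class could look like (the name `string_view_lite` and the member selection are my own; a real substitute would mirror far more of `std::string_view`):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Minimal C++11 stand-in for std::string_view: a non-owning pointer + length.
class string_view_lite {
public:
    string_view_lite(const char* s, std::size_t n) : data_(s), size_(n) {}
    string_view_lite(const char* s) : data_(s), size_(std::strlen(s)) {}
    string_view_lite(const std::string& s) : data_(s.data()), size_(s.size()) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    char operator[](std::size_t i) const { return data_[i]; }

    // Materialize owned storage only when the caller actually needs it.
    std::string to_string() const { return std::string(data_, size_); }

private:
    const char* data_;  // not owned: the viewed buffer must outlive the view
    std::size_t size_;
};
```

This also illustrates the storage objection above: the view itself owns nothing, so something else (the retained input buffer, or an `std::string`) must keep the characters alive.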
Consider joining the cpplang Slack: https://cpplang.now.sh/
Thanks for your input! I'll gladly join the Hangouts :)
FYI, there is also a Slack workspace for this project: https://nlohmannjson.slack.com/. I am on both.
I need an invite to the JSON slack (vinnie.falco@gmail.com)
Invite sent.
I had a brief chat with @vinniefalco on Wednesday and he explained to me the issues he sees in the current library implementation. I'll try to summarize them as I understood them (please correct me if I'm wrong): …
As for a Hangout: how about 8 p.m. CET (http://everytimezone.com/#2017-10-30,1860,cn3) on Monday, October 30?
Unfortunately I'll only be available from November 3rd; if you don't mind postponing the meeting, I'd really like to participate :)
I am OK with waiting until November 3.
@nlohmann Also, there is no single implementation which is optimal for all use-cases. The simplest example is "parse to read-only view" versus "edit a new json document". As another example, consider an "insert-only" container.
Sorry, I won't make it tonight.
Hi, let's post our availability on Slack and try to meet next week ;)
I just wanted to throw in my $0.02 here. It's not possible to be optimal for all use cases, but one valuable technique is for your API to have layers: there are lower layers that are still public. These layers make fewer assumptions and are less convenient for common use cases, but they are very flexible and "zero cost". IMHO, for a JSON library there should always be a lowest-level public API which applies a visitor to a stream of data assumed to be in JSON format (or, I guess, multiple functions, one for each fundamental kind of data stream you can deal with).
The exact interface provided by the visitor and how it receives callbacks is obviously something that requires some thought. But it is not really that complicated. The library should actually call this function for all of its high-level use cases (like creating json objects), so you have that as a sanity check.

This very cleanly separates parsing logic from data structure logic. The link between them is of course an appropriate visitor that simply populates a recursive data structure as it receives callbacks. Fun fact: passing a trivial visitor to the parser essentially gives you a validator for free.

Once you make this separation, it becomes easier to inject customization points, because it's clear where they belong. You can also progressively layer things: you can create visitors that make certain assumptions, but allow certain customizations as well, to simplify semi-common use cases like changing the canonical data structures the input is parsed into. You can also write a visitor that provides the input iterator type interface.

To tie in with some of @vinniefalco's ideas: if you want to ignore certain types of tokens and quickly parse others, you may not need a data structure at all, just a very simple visitor that has null implementations for some callbacks and immediately handles others on the spot.

The only thing this doesn't give you, I suppose, is the case where you really want to manually control the progress of the parser as it steps through. AFAIK, in order to do this you have to step away from a true recursive descent algorithm and actually keep an explicit stack (which may or may not be slower; it would definitely require some thought). I think, though, that the approach I'm suggesting will cover most use cases. It's basically inversion of control: the difference between a for loop and std::for_each. The main thing you get when you can manually control stepping is the ability to exit parsing early. You can still do this with an exception, but if you want to exit early as an optimization, then an exception will probably be too slow.
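To make the layering concrete, here is one possible shape for such a visitor interface (the names `json_visitor`, `counting_visitor`, and `emit_sample_events` are all illustrative; the event function standing in for the parser simply replays the token stream of a small input):

```cpp
#include <string>

// Hypothetical lowest-level API: the parser drives a visitor via callbacks.
// Default-empty callbacks mean a trivial visitor acts as a pure validator.
struct json_visitor {
    virtual ~json_visitor() = default;
    virtual void on_null() {}
    virtual void on_bool(bool) {}
    virtual void on_number(double) {}
    virtual void on_string(const std::string&) {}
    virtual void on_begin_object() {}
    virtual void on_key(const std::string&) {}
    virtual void on_end_object() {}
    virtual void on_begin_array() {}
    virtual void on_end_array() {}
};

// A layered visitor: instead of building a data structure, it just counts
// events, standing in for "populate a recursive data structure".
struct counting_visitor : json_visitor {
    int events = 0;
    void on_bool(bool) override { ++events; }
    void on_number(double) override { ++events; }
    void on_string(const std::string&) override { ++events; }
    void on_begin_array() override { ++events; }
    void on_end_array() override { ++events; }
};

// Stand-in for a parser's event stream on the input [1, "x", true].
void emit_sample_events(json_visitor& v) {
    v.on_begin_array();
    v.on_number(1);
    v.on_string("x");
    v.on_bool(true);
    v.on_end_array();
}
```

A data-structure-building visitor would override the same callbacks to push onto a stack of partially built containers; the parser itself never needs to know.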
@nlohmann Any update on using wstrings? I am working on a project where this library would be the perfect fit, but the whole codebase is wstring...
The problem
The library currently offers some template arguments to configure the types used for numbers, strings, booleans, objects, and arrays. However, this type support is very limited, and even simple things like using `std::unordered_map` for objects or a `std::vector` with a user-defined allocator for arrays do not work. Even worse, there are tons of `std::string` usages in the code even though we actually have a type `basic_json::string_t` which is a typedef for the passed string type (see #766).

With issue #456 came the idea to replace the long template parameter list with a policy class. I like the idea, but I am afraid that this change does not improve things much with respect to the mentioned problems.
Another solution
One possible solution for a true type configuration could be the following: we define interfaces for each type that must be implemented by concrete implementations to be used in the `basic_json` class, which would be refactored to only rely on this interface.

For instance, we could have a class hierarchy …, and finally … to implement the Boolean value. We could then have a traits class like …, and finally a value like …
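The code blocks from the original post were lost in extraction; a hypothetical reconstruction of the kind of hierarchy, traits class, and value type described (every name below is invented and only the Boolean branch is shown) might be:

```cpp
#include <string>

// Hypothetical interface every Boolean implementation would satisfy.
struct boolean_interface {
    virtual ~boolean_interface() = default;
    virtual bool get() const = 0;
    virtual std::string dump() const = 0;  // serialization hook
};

// Concrete implementation wrapping plain bool.
class default_boolean : public boolean_interface {
public:
    explicit default_boolean(bool b) : value_(b) {}
    bool get() const override { return value_; }
    std::string dump() const override { return value_ ? "true" : "false"; }

private:
    bool value_;
};

// Traits class bundling the chosen implementations.
struct default_value_traits {
    using boolean_t = default_boolean;
    // number_t, string_t, array_t, object_t would follow the same pattern.
};

// The value class relies only on the traits, not on concrete types.
template <class Traits>
class json_value {
public:
    explicit json_value(bool b) : boolean_(b) {}
    std::string dump() const { return boolean_.dump(); }

private:
    typename Traits::boolean_t boolean_;
};
```

Swapping in a different Boolean representation would then mean writing another `boolean_interface` implementation and a traits class that names it.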
We would provide the default implementations wrapping `bool`, `int64_t`, `uint64_t`, `std::string`, `std::vector`, and `std::map` as `default_value_traits`. If someone likes a different type, one just needs to implement the respective interface, add the implementation to a different traits class, and pass it to `json_value`.

My questions
- What do you think about this approach?
- What types should we make configurable? I currently think of …
- Should the keys of objects be of the same string type? What about strings used during serialization?
- What kind of constructors should we require? I sketched that Booleans should be constructible from `bool`. But what else? Should we require strings to be constructible from `const char*`? Or also `std::string`?
- How much overhead do we need to pay with this approach?
- How much behavior should we require? For instance, should a type bring its own `dump()` function? Should a number type bring its `from_string(const char*)` function for parsing?

This is a rough idea right now, but I would really appreciate feedback. I know there is a lot of work to be done, but I really want to get rid of all the limitations I described above.
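As an illustration of the last question, a number type that brings its own `dump()` and `from_string()` hooks might look like this (purely hypothetical; the fixed-point representation is just an example of a format the library itself could not serialize correctly without such hooks):

```cpp
#include <cstdlib>
#include <string>

// Hypothetical fixed-point number type that carries its own serialization
// (dump) and parsing (from_string) so the library never has to know how
// it is represented. Stored as value * 100.
class fixed_point_number {
public:
    explicit fixed_point_number(long long hundredths) : hundredths_(hundredths) {}

    // Serialization hook: always two decimal places.
    // (Sign handling for values in (-1, 0) is omitted for brevity.)
    std::string dump() const {
        std::string s = std::to_string(hundredths_ / 100);
        long long frac = hundredths_ % 100;
        if (frac < 0) frac = -frac;
        s += '.';
        s += static_cast<char>('0' + frac / 10);
        s += static_cast<char>('0' + frac % 10);
        return s;
    }

    // Parsing hook a parser could call on the raw token text.
    static fixed_point_number from_string(const char* s) {
        double d = std::strtod(s, nullptr);
        return fixed_point_number(
            static_cast<long long>(d * 100 + (d < 0 ? -0.5 : 0.5)));
    }

    long long raw() const { return hundredths_; }

private:
    long long hundredths_;
};
```

With such hooks, the overhead question above becomes concrete: the library would dispatch serialization and parsing through the type instead of hard-coding `snprintf`/`strtod` on `double`.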