-
-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for iterator-range parsing #290
Comments
I think the current implementation would not benefit from an iterator-range access: Inside the parser, I employ a lexer which relies on access to the input via a pointer of type To process a template<typename T>
static basic_json parse(const T& s, const parser_callback_t cb = nullptr) where What do you think? |
So only contiguous containers of 1 byte integrals are supported I believe? (for now at least) Maybe it's better to use methods defined in the Container Concept, and use the It could look like this: template<typename ByteContainer>
static basic_json parse(const ByteContainer& buf, const parser_callback_t cb = nullptr)
{
// ensure std::begin(ByteContainer) is valid + retrieve iterator type
using Traits = std::iterator_traits<decltype(std::begin(std::declval<ByteContainer>()))>;
static_assert(sizeof(Traits::value_type) == 1, "");
// There is no new tag for ContiguousIterator in C++17, won't need to change
static_assert(std::is_same_v<Traits::iterator_category, std::random_access_category_tag>, "");
// quite ugly..., but will work with containers, C arrays
do_lexing(reinterpret_cast<const lexer_char_t*>(&(*std::begin(buf)));
} I think it's reasonable to assume that the container is invalid if iterator_traits cannot be defined with it. What's your opinion ? |
By the way, we can check for contiguity with the Iterator overload too, which removes some Maybe we could add both ? |
This looks good! So far, parsing a string or an input stream could rely on an end of string or end of file mark. For this overload, one would need to pass a length to the parser - I guess Shall I open a feature branch for this? |
Sure! You want the user to give the length to parse? |
If I do not have a prototype in mind. I would just implement and integrate your proposal into a feature branch so we can find out whether we oversaw a dependency. |
Oh ok, I misunderstood what you wanted. |
I started a feature branch and started with a constructor for the It currently only checks if the iterators are random-access. This is, however, not sufficient to guarantee contiguous storage, and I have not found a way to enforce this in C++11. However, first tests (see https://github.com/nlohmann/json/blob/feature/iterator_range_parsing/test/src/unit-class_parser.cpp#L748) show that parsing works with
The next step would be to add overloads for the Any comments? |
This is not a static property, but a runtime property, unfortunately. For example, std::deque can be either contiguous or not depending on the size of the deque. http://stackoverflow.com/questions/35004633/checking-if-a-sequence-container-is-contiguous-in-memory |
@gregmarr Thanks for clarifying. So I think the only way to cope with this is to make clear in the documentation that the behavior is undefined for iterator ranges with non-contiguous storage. Or any better idea? |
Probably. You could probably have an assert that you get the same address for begin iterator plus range length, and the end iterator. |
I added an assertion to do the check from the Stack Overflow article: int i = 0;
assert(std::accumulate(first, last, true, [&i, &first](bool res, decltype(*first) val)
{
return res and (val == *(std::next(std::addressof(*first), i++)));
})); (Any idea how to get the |
(One way would be to pass |
I added a first draft of the deserialization function. Feedback is greatly appreciated. In particular, it would be nice to maybe weaken the preconditions. |
The 1 byte precondition can be enforced at compile time. static_assert(sizeof(std::iterator_traits<IteratorType>::value_type) == 1, ""); I don't know if asserting for non-empty range is a good choice, maybe returning a 'null' value would be better? |
@theodelrieu Right, checking the size can be done at compile time, I will change this. For an empty range, I would rather throw the same assertion as the parser would, but this would require creating some fake data - returning |
I saw you throw some exceptions to indicate parse errors, we might reuse one of those exceptions in this case? |
I see if I can find a nice way to throw a parser error exception for an empty range. Apart of the assertions, would this fix your original need? |
Yes! By the way, you were talking earlier about enhancing the template<typename T>
static basic_json parse(const T& s, const parser_callback_t cb = nullptr) Do you have any plan to do so? It would be great when one wants to parse the whole buffer. I can create a new issue for that though. |
What do you mean? An overload for a type |
I'm talking about your first comment:
|
@theodelrieu I just committed some more changes. There are now the following overloads for the
The last two items allowed to clean up the interface: Note that compared to your initial proposal, I now check whether the
I was skeptical at first, but now I think this made a great addition to the parser. Please let me know if anything is missing - otherwise, I would close this issue and add the feature with the 2.0.3 release. |
Great! I just have a few questions left:
|
Without adding I shall try the other two proposals. |
@gregmarr I saw this as well, but did not want to change anything tonight, because Valgrind and Clang's Address Sanitizers did not complain. And would |
I'm assuming this is using the char * overload.
It's an integral type and its size is 1. |
With Xcode Version 8.0 beta 2 (8S162m), uint8_t v[] = {'t', 'r', 'u', 'e'};
json::parse(v); uses the static basic_json parse(const ContiguousContainer& c,
const parser_callback_t cb = nullptr) overload. Maybe MSVC chooses differently. I shall try tomorrow. Good night! |
I tried on VS2015, I think this is a MSVC bug. Here is a small test to reproduce template <typename ContiguousContainer>
void f(ContiguousContainer &&) {
std::cout << "first\n";
}
template <typename CharT>
void f(CharT const*) {
std::cout << "second\n";
}
int main() {
auto t = std::vector<int>{1,2,3};
uint8_t tab[] = {'t', 'r', 'u', 'e'};
char *test;
static_assert(!std::is_pointer<decltype(tab)>::value, "");
f(tab);
f(t);
f(test);
return 0; Here the first overload binds to everything and should always be taken, but MSVC considers the second to be a best match. Here is a test with GCC. I don't know if there is a workaround, but if it's a confirmed bug, we should report it to Microsoft. |
I believe that MSVC is correct here. The second template is more specialized that the first, so it's selected. http://en.cppreference.com/w/cpp/language/function_template
|
I truly believe this is a MSVC bug: Here we pass a I cannot find the exact quote on cppreference about |
That's an interesting question. Template overload resolution is indeed a bit of black magic. From the same page as above: P is the function parameter type (ContiguousContainer&&, CharT const *) Before deduction begins, the following adjustments to P and A are made:
For the CharT const * overload, #1 applies, and for the ContiguousContainer && overload, #4 applies. ContiguousContainer is then deduced as uint8_t[4]& The deduced template functions are then (after reference collapsing) The latter still looks to be more specialized, as it is const qualified. However, there are also rules about what happens when considering the functions in a different order would give different results. Maybe that's involved here, or perhaps the lack of two-phase lookup in templates in VC. |
As a matter of fact, I committed a version where the two functions in question are in a different order: no change... |
Indeed it is tricky, I think we need a language lawyer on this one, we can post an example code on StackOverflow? For now, we can put back the C array overload, to see if it helps? |
@theodelrieu It would be nice to have more opinions on this, so I'm fine with any code on Stack Overflow. Meanwhile, I tried the following: I changed the template<typename CharPT, typename std::enable_if<
std::is_pointer<CharPT>::value and
std::is_integral<typename std::remove_pointer<CharPT>::type>::value and
sizeof(std::remove_pointer<CharPT>) == 1, int>::type = 0>
static basic_json parse(const CharPT s,
const parser_callback_t cb = nullptr) Calling this function with
|
Adding the C array overload (and removing the experiment from my previous comment) template<class T, std::size_t N>
static basic_json parse(T (&array)[N],
const parser_callback_t cb = nullptr) makes it worse:
|
I made another experiment, but now run in another strange MSVC compiler error (https://ci.appveyor.com/project/nlohmann/json/build/1168):
After googling a bit, this may be a known MSVC issue... |
I am working on a workaround right now, I think we can help MSVC a little |
My workaround compiles. The following code is a POC, I hope we can tweak it to make it cleaner. // Trick struct to help MSVC choose the good overload
template <bool B>
struct IsPointer{};
// Impl method for contiguous containers
template<typename ContiguousContainer,
typename std::enable_if<
std::is_base_of<
std::random_access_iterator_tag,
typename std::iterator_traits<decltype(std::begin(std::declval<ContiguousContainer>()))>::iterator_category>::value
, int>::type = 0>
static basic_json parse_impl(const ContiguousContainer& c, IsPointer<false>, const parser_callback_t cb)
{
return parse(std::begin(c), std::end(c), cb);
}
// IsPointer<true> prevents the implicit array-to-pointer conversion
template<typename CharT>
static basic_json parse_impl(const CharT* c, IsPointer<true>, const parser_callback_t cb)
{
return parser(reinterpret_cast<const char*>(c), cb).parse();
}
// We have to have a parse_impl for streams too. If we keep the parse(istream&)
// and parse(istream&&) overloads, calling parse with a std::stringstream
// will chose the parse(T&&) one
template <typename Stream, typename = typename std::enable_if<std::is_base_of<std::istream,
typename std::remove_reference<Stream>::type>::value, char>::type>
static basic_json parse_impl(Stream& s, IsPointer<false>, const parser_callback_t cb)
{
return parser(s, cb).parse();
}
// Only 'parse' overload with two arguments, (we keep the iterator-range one)
template<class T>
static basic_json parse(T&& c, const parser_callback_t cb = nullptr)
{
return parse_impl(std::forward<T>(c),
IsPointer<std::is_pointer<typename std::remove_reference<T>::type>::value>{},
cb);
} The IsPointer trick is ugly, especially since we have to give it to every parse_impl overload. |
Do I understand correctly that you need the |
In EDIT: It also seems MSVC performs the array-to-pointer conversion before resolving the enable_if. |
Just a question: Why do you need to wrap the result of |
You mean something like : The template <bool IsPointer, typename ContiguousContainer>
basic_json parse_impl(ContiguousContainer&&); But you cannot partially specialize methods, so I need to wrap the bool in an empty struct. |
I found a better way to fix the issue, your experiment failed to compile because of the last template line: template<typename CharPT, typename std::enable_if<
std::is_pointer<CharPT>::value and
std::is_integral<typename std::remove_pointer<CharPT>::type>::value and
sizeof(std::remove_pointer<CharPT>) == 1, int>::type = 0>
static basic_json parse(const CharPT s,
const parser_callback_t cb = nullptr)
{
return parser(reinterpret_cast<const char*>(s), cb).parse();
} Changing to |
@theodelrieu I comitted your proposed change (6e6e1c9) and all tests succeed. |
I rebuilt using ninja, and launched ninja test. Well, I have no remarks left, how about you? |
I think the branch can be merged, unless you have any further remarks. |
I'm fine for merging it! |
Commit 776880b merged the feature branch. This ticket can be closed and will be part of the 2.0.3 release. Thanks everybody for the constructive discussions! |
Hi, it would be handy to add an overload to the
parse
method.In my code, I deal a lot with
vector<uint8_t>
when communicating with my server.However I sometimes have to send JSON, and I must convert the buffer to a
std::string
, which is unfortunate.I know I could typedef basic_json and use
std::basic_string<uint8_t>
asStringType
.However, other parts of the code uses the
std::string
version, and I don't want to patch the whole code because of that.We could add this overload to
parse
, which will behave likestd::string
iterator constructor:What are your thoughts on that ?
If you agree with the idea, I can take care of the PR.
The text was updated successfully, but these errors were encountered: