Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

newparser: post-merge fixes #431

Merged
merged 7 commits into from
May 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 27 additions & 4 deletions changelog/current.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,36 @@
**All the changes described come from a single PR: [#PR414](https://github.com/biojppm/rapidyaml/pull/414).**
Most of the changes are from the giant Parser refactor described below. Before getting to that, a couple of other minor changes first.


### Fixes

- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - Emitter: prevent stack overflows when emitting malicious trees by providing a max tree depth for the emit visitor. This was done by adding an `EmitOptions` structure as an argument both to the emitter and to the emit functions, which is then forwarded to the emitter. This `EmitOptions` structure has a max tree depth setting with a default value of 64.
- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - Fix `_RYML_CB_ALLOC()` using `(T)` in parenthesis, making the macro unusable.


### New features

- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - append-emitting to existing containers in the `emitrs_` functions, suggested in [#345](https://github.com/biojppm/rapidyaml/issues/345). This was achieved by adding a `bool append=false` as the last parameter of these functions.
- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - add depth query methods:
```cpp
Tree::depth_asc(id_type) const; // O(log(num_tree_nodes)) get the depth of a node ascending (ie, from root to node)
Tree::depth_desc(id_type) const; // O(num_tree_nodes) get the depth of a node descending (ie, from node to deep-most leaf node)
ConstNodeRef::depth_asc() const; // likewise
ConstNodeRef::depth_desc() const;
NodeRef::depth_asc() const;
NodeRef::depth_desc() const;
```


------
All other changes come from [#PR414](https://github.com/biojppm/rapidyaml/pull/414).

### Parser refactor

The parser was completely refactored ([#PR414](https://github.com/biojppm/rapidyaml/pull/414)). This was a large and hard job carried out over several months, and the result is:
The parser was completely refactored ([#PR414](https://github.com/biojppm/rapidyaml/pull/414)). This was a large and hard job carried out over several months, but it brings important improvements.

- A new event-based parser engine is now in place, enabling the improvements described below. This engine uses a templated event handler, where each event is a function call, which spares branches on the event handler. The parsing code was fully rewritten, and is now much more simple (albeit longer), and much easier to work with and fix.
- The new parser is an event-based parser, based on an event dispatcher engine. This engine is templated on event handler, where each event is a function call, which spares branches on the event handler. The parsing code was fully rewritten, and is now much more simple (albeit longer), and much easier to work with and fix.
- YAML standard-conformance was improved significantly. Along with many smaller fixes and additions, (too many to list here), the main changes are the following:
- The parser engine can now successfully parse container keys, emitting all the events in the correct , **but** as before, the ryml tree cannot accomodate these (and this constraint is no longer enforced by the parser, but instead by `EventHandlerTree`). For an example of a handler which can accomodate key containers, see the one which is used for the test suite at `test/test_suite/test_suite_event_handler.hpp`
- The parser engine can now successfully parse container keys, emitting all the events in correctly, **but** as before, the ryml tree cannot accomodate these (and this constraint is no longer enforced by the parser, but instead by `EventHandlerTree`). For an example of a handler which can accomodate key containers, see the one which is used for the test suite at `test/test_suite/test_suite_event_handler.hpp`
- Anchor keys can now be terminated with colon (eg, `&anchor: key: val`), as dictated by the standard.
- The parser engine can now be used to create native trees in other programming languages, or in cases where the user *must* have container keys.
- Performance of both parsing and emitting improved significantly; see some figures below.
Expand Down
2 changes: 1 addition & 1 deletion src/c4/yml/common.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -468,7 +468,7 @@ do \
} \
} while(0)
#define _RYML_CB_ALLOC_HINT(cb, T, num, hint) (T*) (cb).m_allocate((num) * sizeof(T), (hint), (cb).m_user_data)
#define _RYML_CB_ALLOC(cb, T, num) _RYML_CB_ALLOC_HINT((cb), (T), (num), nullptr)
#define _RYML_CB_ALLOC(cb, T, num) _RYML_CB_ALLOC_HINT((cb), T, (num), nullptr)
#define _RYML_CB_FREE(cb, buf, T, num) \
do { \
(cb).m_free((buf), (num) * sizeof(T), (cb).m_user_data); \
Expand Down
61 changes: 29 additions & 32 deletions src/c4/yml/emit.def.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,28 +29,13 @@
if(type == EMIT_YAML)
_emit_yaml(id);
else if(type == EMIT_JSON)
_do_visit_json(id);
_do_visit_json(id, 0);
else
_RYML_CB_ERR(m_tree->callbacks(), "unknown emit type");
m_tree = nullptr;
return this->Writer::_get(error_on_excess);
}

template<class Writer>
substr Emitter<Writer>::emit_as(EmitType_e type, Tree const& t, bool error_on_excess)
{
if(t.empty())
return {};
return this->emit_as(type, t, t.root_id(), error_on_excess);
}

template<class Writer>
substr Emitter<Writer>::emit_as(EmitType_e type, ConstNodeRef const& n, bool error_on_excess)
{
_RYML_CB_CHECK(n.tree()->callbacks(), n.readable());
return this->emit_as(type, *n.tree(), n.id(), error_on_excess);
}


//-----------------------------------------------------------------------------

Expand Down Expand Up @@ -82,7 +67,7 @@
this->Writer::_do_write(":\n");
++ilevel;
}
_do_visit_block_container(id, ilevel, ilevel);
_do_visit_block_container(id, 0, ilevel, ilevel);
return;
}
}
Expand Down Expand Up @@ -256,13 +241,15 @@
}

template<class Writer>
void Emitter<Writer>::_do_visit_flow_sl(id_type node, id_type ilevel)
void Emitter<Writer>::_do_visit_flow_sl(id_type node, id_type depth, id_type ilevel)
{
const bool prev_flow = m_flow;
m_flow = true;
RYML_ASSERT(!m_tree->is_stream(node));
RYML_ASSERT(m_tree->is_container(node) || m_tree->is_doc(node));
RYML_ASSERT(m_tree->is_root(node) || (m_tree->parent_is_map(node) || m_tree->parent_is_seq(node)));
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");

if(m_tree->is_doc(node))
{
Expand Down Expand Up @@ -345,7 +332,7 @@
else
{
// with single-line flow, we can never go back to block
_do_visit_flow_sl(child, ilevel + 1);
_do_visit_flow_sl(child, depth + 1, ilevel + 1);
}
}

Expand All @@ -363,19 +350,25 @@
C4_SUPPRESS_WARNING_MSVC_WITH_PUSH(4702) // unreachable error, triggered by flow_ml not implemented

template<class Writer>
void Emitter<Writer>::_do_visit_flow_ml(id_type id, id_type ilevel, id_type do_indent)
void Emitter<Writer>::_do_visit_flow_ml(id_type id, id_type depth, id_type ilevel, id_type do_indent)

Check warning on line 353 in src/c4/yml/emit.def.hpp

View check run for this annotation

Codecov / codecov/patch

src/c4/yml/emit.def.hpp#L353

Added line #L353 was not covered by tests
{
C4_UNUSED(id);
C4_UNUSED(depth);
C4_UNUSED(ilevel);
C4_UNUSED(do_indent);
c4::yml::error("not implemented");

Check warning on line 359 in src/c4/yml/emit.def.hpp

View check run for this annotation

Codecov / codecov/patch

src/c4/yml/emit.def.hpp#L359

Added line #L359 was not covered by tests
#ifdef THIS_IS_A_WORK_IN_PROGRESS
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");
const bool prev_flow = m_flow;
m_flow = true;
c4::yml::error("not implemented");
// do it...
m_flow = prev_flow;
#endif
}

template<class Writer>
void Emitter<Writer>::_do_visit_block_container(id_type node, id_type level, bool do_indent)
void Emitter<Writer>::_do_visit_block_container(id_type node, id_type depth, id_type level, bool do_indent)
{
if(m_tree->is_seq(node))
{
Expand All @@ -397,19 +390,19 @@
{
_indent(level, do_indent);
this->Writer::_do_write("- ");
_do_visit_flow_sl(child, 0u);
_do_visit_flow_sl(child, depth+1, 0u);
this->Writer::_do_write('\n');
}
else if(ty.is_flow_ml())
{
_indent(level, do_indent);
this->Writer::_do_write("- ");
_do_visit_flow_ml(child, 0u, do_indent);
_do_visit_flow_ml(child, depth+1, 0u, do_indent);

Check warning on line 400 in src/c4/yml/emit.def.hpp

View check run for this annotation

Codecov / codecov/patch

src/c4/yml/emit.def.hpp#L400

Added line #L400 was not covered by tests
this->Writer::_do_write('\n');
}
else
{
_do_visit_block(child, level, do_indent); // same level
_do_visit_block(child, depth+1, level, do_indent); // same indentation level
}
}
do_indent = true;
Expand All @@ -436,18 +429,18 @@
if(ty.is_flow_sl())
{
_indent(level, do_indent);
_do_visit_flow_sl(ich, 0u);
_do_visit_flow_sl(ich, depth+1, 0u);
this->Writer::_do_write('\n');
}
else if(ty.is_flow_ml())
{
_indent(level, do_indent);
_do_visit_flow_ml(ich, 0u);
_do_visit_flow_ml(ich, depth+1, 0u);

Check warning on line 438 in src/c4/yml/emit.def.hpp

View check run for this annotation

Codecov / codecov/patch

src/c4/yml/emit.def.hpp#L438

Added line #L438 was not covered by tests
this->Writer::_do_write('\n');
}
else
{
_do_visit_block(ich, level, do_indent); // same level!
_do_visit_block(ich, depth+1, level, do_indent); // same level!
}
} // keyval vs container
do_indent = true;
Expand All @@ -456,11 +449,13 @@
}

template<class Writer>
void Emitter<Writer>::_do_visit_block(id_type node, id_type ilevel, id_type do_indent)
void Emitter<Writer>::_do_visit_block(id_type node, id_type depth, id_type ilevel, id_type do_indent)
{
RYML_ASSERT(!m_tree->is_stream(node));
RYML_ASSERT(m_tree->is_container(node) || m_tree->is_doc(node));
RYML_ASSERT(m_tree->is_root(node) || (m_tree->parent_is_map(node) || m_tree->parent_is_seq(node)));
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");
if(m_tree->is_doc(node))
{
_write_doc(node);
Expand Down Expand Up @@ -537,16 +532,18 @@
if(m_tree->is_root(node) || m_tree->is_doc(node))
next_level = ilevel; // do not indent at top level

_do_visit_block_container(node, next_level, do_indent);
_do_visit_block_container(node, depth, next_level, do_indent);
}

C4_SUPPRESS_WARNING_MSVC_POP


template<class Writer>
void Emitter<Writer>::_do_visit_json(id_type id)
void Emitter<Writer>::_do_visit_json(id_type id, id_type depth)
{
_RYML_CB_CHECK(m_tree->callbacks(), !m_tree->is_stream(id)); // JSON does not have streams
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");
if(m_tree->is_keyval(id))
{
_writek_json(id);
Expand Down Expand Up @@ -574,7 +571,7 @@
{
if(ich != m_tree->first_child(id))
this->Writer::_do_write(',');
_do_visit_json(ich);
_do_visit_json(ich, depth+1);
}

if(m_tree->is_seq(id))
Expand Down
Loading
Loading