-
Notifications
You must be signed in to change notification settings - Fork 77
Recover from custom tags in YAML #2266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* Avoid eagerly buffering the tag uri when we can scan the uri and check later. * Throw if a local tag prefix has a secondary/named handle.
* Add support for local tag prefixes.
* Prefer scanning a verbatim tag in its own scope. This ensures we never mix/taint its heuristics with that of a normal tag. * Always return whether a handle is named tag handle in `_scanTagHandle`. Named tags. * Ensure named tags never have empty suffixes. * Indicate whether a `TagToken` is verbatim.
* Require non-empty suffix for secondary tags. *
* A global tag uri prefix must with `tag:` if declared in verbatim. * Add more tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request significantly improves the YAML parser's robustness by adding support for custom tags, allowing it to recover and parse them as generic nodes instead of throwing errors. This aligns the package more closely with the YAML spec's recommendations for parsers that don't support external tag resolution. The changes are extensive and well-implemented, including refactoring parsing logic to use modern Dart features like pattern matching and records, which enhances readability. The addition of previously skipped YAML spec tests is a major plus, increasing confidence in the parser's correctness. I've identified one potential bug regarding tag directive parsing that might not strictly adhere to the spec, and a minor leftover TODO comment. Overall, this is a valuable enhancement to the yaml package.
pkgs/yaml/lib/src/scanner.dart
Outdated
| var prefix = _scanner.peekChar() == EXCLAMATION | ||
| ? _scanTagHandle(directive: true, isGlobalTagPrefix: true).tagHandle | ||
| : ''; | ||
|
|
||
| prefix += _scanTagUri(); // Readability's sake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation for parsing a tag directive's prefix seems to deviate from the YAML 1.2.2 spec. The spec for s-tag-prefix is c-tag-handle | ns-uri-char+, meaning it's either a tag handle or a URI prefix, but not both. The current code allows a tag handle to be followed by a URI, which could lead to incorrect parsing of malformed directives.
For example, a directive like %TAG !m! !my-tag:foo would be incorrectly parsed.
I suggest changing the logic to strictly follow the "either-or" nature of the spec.
String prefix;
if (_scanner.peekChar() == EXCLAMATION) {
prefix = _scanTagHandle(directive: true, isGlobalTagPrefix: true).tagHandle;
} else {
prefix = _scanTagUri();
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't throw but I get what you mean.
That was before I made _scanTagHandle return isNamed or throw for named handles when directive is true. I was throwing in the current scope but quickly realised that %TAG !m! !my-tag:foo%21 can be a false positive for the prefix.
By default, !my-tag:foo in %TAG !m! !my-tag:foo will be scanned till a non-uri-char is seen.
As long as isGlobalTagPrefix is true, we can never get anything other than a non-uri char back from _scanTagHandle. Calling _scanTagUri would just return an empty string. All in all...
Great catch but I think you mean:
final prefix = _scanner.peekChar() == EXCLAMATION
? _scanTagHandle(directive: true, isGlobalTagPrefix: true).tagHandle
: _scanTagUri();febb0d8 to
f055c28
Compare
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation. |
PR HealthLicense Headers ✔️
All source files should start with a license header. Unrelated files missing license headers
This check can be disabled by tagging the PR with
API leaks
|
| Package | Leaked API symbol | Leaking sources |
|---|---|---|
| yaml | ErrorListener | yaml/yaml.dart::loadYaml::errorListener yaml/yaml.dart::loadYamlNode::errorListener yaml/yaml.dart::loadYamlDocument::errorListener |
| yaml | ScalarEvent | yaml_node.dart::YamlScalar::internal::scalar |
| yaml | EventType | event.dart::Event::type event.dart::EventType::streamStart event.dart::EventType::streamEnd event.dart::EventType::documentStart event.dart::EventType::documentEnd event.dart::EventType::alias event.dart::EventType::scalar event.dart::EventType::sequenceStart event.dart::EventType::sequenceEnd event.dart::EventType::mappingStart event.dart::EventType::mappingEnd event.dart::EventType::values event.dart::Event::new::type event.dart::ScalarEvent::type |
This check can be disabled by tagging the PR with skip-leaking-check.
Breaking changes ✔️
| Package | Change | Current Version | New Version | Needed Version | Looking good? |
|---|---|---|---|---|---|
| yaml | None | 3.1.3 | 3.1.3 | 3.1.3 | ✔️ |
This check can be disabled by tagging the PR with skip-breaking-check.
Coverage ⚠️
| File | Coverage |
|---|---|
| pkgs/yaml/lib/src/loader.dart | 💔 93 % ⬇️ 0 % |
| pkgs/yaml/lib/src/parser.dart | 💚 94 % ⬆️ 0 % |
| pkgs/yaml/lib/src/scanner.dart | 💔 94 % ⬇️ 1 % |
| pkgs/yaml/lib/src/token.dart | 💚 48 % |
| pkgs/yaml/lib/src/utils.dart | 💚 100 % |
This check for test coverage is informational (issues shown here will not fail the PR).
This check can be disabled by tagging the PR with skip-coverage-check.
Changelog Entry ❗
| Package | Changed Files |
|---|---|
| package:yaml | pkgs/yaml/lib/src/loader.dart pkgs/yaml/lib/src/parser.dart pkgs/yaml/lib/src/scanner.dart pkgs/yaml/lib/src/token.dart pkgs/yaml/lib/src/utils.dart |
Changes to files need to be accounted for in their respective changelogs.
This check can be disabled by tagging the PR with skip-changelog-check.
|
Thanks for contributing. We might want to reconsider if we really want to support tags in yaml. It is a lot of complexity. Is there a good use case? Is there demand for it? I'd rather not ever support instantiation of custom dart objects during parsing. Does tag support make sense without that? Maybe we'd rather roll back the existing half-baked support that exists in the code base. |
|
cc @jonasfj |
I am not suggesting a use case with this contribution.
Custom tags should just be ignored if a parser has no intention to support them (which is what |
This PR helps
package:yamlrecover in the face of custom tags. The YAML spec recommends this for a parser that doesn't not provide a way to externally resolve them. See more here:package:yamlis a staple of the community. No change here breaks existing functionality. Instead, this gives the package a thick skin to power through any yaml string it encounters.TL;DR
Fixes how tags are parsed.
It seems existing code knows it can handle named tags and verbatim tags but trips itself while scanning the tag and eventually tries to trick the parser into resolving a malformed (schema or verbatim) tag. A (custom) tag is never ignored unless it is a schema tags (
!!resolving to prefixtag:yaml.org,2002:).Changes
Any non-schema tag (not
!!resolving to prefixtag:yaml.org,2002:) is represented partially or completely based on its kind.Improves support for named tag handles in both tag shorthands and global tags.
TODOin some directive tests that needed an assert.Contribution guidelines:
dart format.Many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.
Note: The Dart team is trialing Gemini Code Assist. Don't take its comments as final Dart team feedback. Use the suggestions if they're helpful; otherwise, wait for a human reviewer.