Skip to content

Conversation

@kekavc24
Copy link
Contributor

@kekavc24 kekavc24 commented Dec 3, 2025

This PR helps package:yaml recover in the face of custom tags. The YAML spec recommends this for a parser that doesn't not provide a way to externally resolve them. See more here:

package:yaml is a staple of the community. No change here breaks existing functionality. Instead, this gives the package a thick skin to power through any yaml string it encounters.

TL;DR

Fixes how tags are parsed.

It seems existing code knows it can handle named tags and verbatim tags but trips itself while scanning the tag and eventually tries to trick the parser into resolving a malformed (schema or verbatim) tag. A (custom) tag is never ignored unless it is a schema tags (!! resolving to prefix tag:yaml.org,2002:).

Changes

  1. Any non-schema tag (not !! resolving to prefix tag:yaml.org,2002:) is represented partially or completely based on its kind.

    • Scalars are defaulted to strings.
    • Collections (map or sequence) are constructed by default.
  2. Improves support for named tag handles in both tag shorthands and global tags.

%TAG !named! tag:hello
---
- !named!scalar 3.10
- !named!mapping { key: value }
- !named!sequence [ value ]
// Parsed as 
[ '3.10', {'key': 'value'}, ['value']]
  1. Improves support for verbatim tags.
- !<!i-feel-verbatim> 3.10
- !<tag:yeah-you-are> { key: value }
- !<tag:so-verbatim> [ ]
// Parsed as 
[ '3.10', {'key': 'value'}, []]
  1. Added the remaining YAML spec tests (and a few more) for global tag directives, named tag handles and verbatim tags that were explicitly skipped.
  2. Fixed a TODO in some directive tests that needed an assert.

  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

Note: The Dart team is trialing Gemini Code Assist. Don't take its comments as final Dart team feedback. Use the suggestions if they're helpful; otherwise, wait for a human reviewer.

* Avoid eagerly buffering the tag uri when we can scan the uri and check later.
* Throw if a local tag prefix has a secondary/named handle.
* Add support for local tag prefixes.
* Prefer scanning a verbatim tag in its own scope. This ensures we never mix/taint its heuristics with that of a normal tag.
* Always return whether a handle is named tag handle in `_scanTagHandle`. Named tags.
* Ensure named tags never have empty suffixes.
* Indicate whether a `TagToken` is verbatim.
* Require non-empty suffix for secondary tags.
*
* A global tag uri prefix must with `tag:` if declared in verbatim.
* Add more tests.
@kekavc24 kekavc24 requested a review from a team as a code owner December 3, 2025 03:43
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly improves the YAML parser's robustness by adding support for custom tags, allowing it to recover and parse them as generic nodes instead of throwing errors. This aligns the package more closely with the YAML spec's recommendations for parsers that don't support external tag resolution. The changes are extensive and well-implemented, including refactoring parsing logic to use modern Dart features like pattern matching and records, which enhances readability. The addition of previously skipped YAML spec tests is a major plus, increasing confidence in the parser's correctness. I've identified one potential bug regarding tag directive parsing that might not strictly adhere to the spec, and a minor leftover TODO comment. Overall, this is a valuable enhancement to the yaml package.

Comment on lines 940 to 944
var prefix = _scanner.peekChar() == EXCLAMATION
? _scanTagHandle(directive: true, isGlobalTagPrefix: true).tagHandle
: '';

prefix += _scanTagUri(); // Readability's sake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The implementation for parsing a tag directive's prefix seems to deviate from the YAML 1.2.2 spec. The spec for s-tag-prefix is c-tag-handle | ns-uri-char+, meaning it's either a tag handle or a URI prefix, but not both. The current code allows a tag handle to be followed by a URI, which could lead to incorrect parsing of malformed directives.

For example, a directive like %TAG !m! !my-tag:foo would be incorrectly parsed.

I suggest changing the logic to strictly follow the "either-or" nature of the spec.

    String prefix;
    if (_scanner.peekChar() == EXCLAMATION) {
      prefix = _scanTagHandle(directive: true, isGlobalTagPrefix: true).tagHandle;
    } else {
      prefix = _scanTagUri();
    }

Copy link
Contributor Author

@kekavc24 kekavc24 Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't throw but I get what you mean.

That was before I made _scanTagHandle return isNamed or throw for named handles when directive is true. I was throwing in the current scope but quickly realised that %TAG !m! !my-tag:foo%21 can be a false positive for the prefix.

By default, !my-tag:foo in %TAG !m! !my-tag:foo will be scanned till a non-uri-char is seen.

As long as isGlobalTagPrefix is true, we can never get anything other than a non-uri char back from _scanTagHandle. Calling _scanTagUri would just return an empty string. All in all...

Great catch but I think you mean:

    final prefix = _scanner.peekChar() == EXCLAMATION
        ? _scanTagHandle(directive: true, isGlobalTagPrefix: true).tagHandle
        : _scanTagUri();

@kekavc24 kekavc24 force-pushed the partial-custom-tag-support branch from febb0d8 to f055c28 Compare December 3, 2025 12:32
@github-actions
Copy link

github-actions bot commented Dec 4, 2025

Package publishing

Package Version Status Publish tag (post-merge)
package:bazel_worker 1.1.5 already published at pub.dev
package:benchmark_harness 2.4.0 already published at pub.dev
package:boolean_selector 2.1.2 already published at pub.dev
package:browser_launcher 1.1.3 already published at pub.dev
package:cli_config 0.2.1-wip WIP (no publish necessary)
package:cli_util 0.5.0-wip WIP (no publish necessary)
package:clock 1.1.3-wip WIP (no publish necessary)
package:code_builder 4.11.1-wip WIP (no publish necessary)
package:coverage 1.15.0 already published at pub.dev
package:csslib 1.0.2 already published at pub.dev
package:extension_discovery 2.1.0 already published at pub.dev
package:file 7.0.2-wip WIP (no publish necessary)
package:file_testing 3.1.0-wip WIP (no publish necessary)
package:glob 2.1.3 already published at pub.dev
package:graphs 2.3.3-wip WIP (no publish necessary)
package:html 0.15.7-wip WIP (no publish necessary)
package:io 1.1.0-wip WIP (no publish necessary)
package:json_rpc_2 4.1.0-wip WIP (no publish necessary)
package:markdown 7.3.1-wip WIP (no publish necessary)
package:mime 2.1.0-wip WIP (no publish necessary)
package:oauth2 2.0.5 already published at pub.dev
package:package_config 2.3.0-wip WIP (no publish necessary)
package:pool 1.5.2 already published at pub.dev
package:process 5.0.5 already published at pub.dev
package:pub_semver 2.2.0 already published at pub.dev
package:pubspec_parse 1.5.1-wip WIP (no publish necessary)
package:source_map_stack_trace 2.1.3-wip WIP (no publish necessary)
package:source_maps 0.10.14-wip WIP (no publish necessary)
package:source_span 1.10.1 already published at pub.dev
package:sse 4.1.8 already published at pub.dev
package:stack_trace 1.12.2-wip (error) pubspec version (1.12.2-wip) and changelog (1.12.2-dev) don't agree
package:stream_channel 2.1.4 already published at pub.dev
package:stream_transform 2.1.2-wip WIP (no publish necessary)
package:string_scanner 1.4.1 already published at pub.dev
package:term_glyph 1.2.3-wip WIP (no publish necessary)
package:test_reflective_loader 0.5.0 ready to publish test_reflective_loader-v0.5.0
package:timing 1.0.2 already published at pub.dev
package:unified_analytics 8.0.9 ready to publish unified_analytics-v8.0.9
package:watcher 1.1.5-wip WIP (no publish necessary)
package:yaml 3.1.3 already published at pub.dev
package:yaml_edit 2.2.3 already published at pub.dev

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

@github-actions
Copy link

github-actions bot commented Dec 4, 2025

PR Health

License Headers ✔️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
no missing headers

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart
pkgs/boolean_selector/example/example.dart
pkgs/clock/lib/clock.dart
pkgs/clock/lib/src/clock.dart
pkgs/clock/lib/src/default.dart
pkgs/clock/lib/src/stopwatch.dart
pkgs/clock/lib/src/utils.dart
pkgs/clock/test/clock_test.dart
pkgs/clock/test/default_test.dart
pkgs/clock/test/stopwatch_test.dart
pkgs/clock/test/utils.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/html/example/main.dart
pkgs/html/lib/dom.dart
pkgs/html/lib/dom_parsing.dart
pkgs/html/lib/html_escape.dart
pkgs/html/lib/parser.dart
pkgs/html/lib/src/constants.dart
pkgs/html/lib/src/encoding_parser.dart
pkgs/html/lib/src/html_input_stream.dart
pkgs/html/lib/src/list_proxy.dart
pkgs/html/lib/src/query_selector.dart
pkgs/html/lib/src/token.dart
pkgs/html/lib/src/tokenizer.dart
pkgs/html/lib/src/treebuilder.dart
pkgs/html/lib/src/utils.dart
pkgs/html/test/dom_test.dart
pkgs/html/test/parser_feature_test.dart
pkgs/html/test/parser_test.dart
pkgs/html/test/query_selector_test.dart
pkgs/html/test/selectors/level1_baseline_test.dart
pkgs/html/test/selectors/level1_lib.dart
pkgs/html/test/selectors/selectors.dart
pkgs/html/test/support.dart
pkgs/html/test/tokenizer_test.dart
pkgs/html/test/trie_test.dart
pkgs/html/tool/generate_trie.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/stack_trace/example/example.dart
pkgs/watcher/test/custom_watcher_factory_test.dart
pkgs/yaml_edit/example/example.dart

This check can be disabled by tagging the PR with skip-license-check.

API leaks ⚠️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbol Leaking sources
yaml ErrorListener yaml/yaml.dart::loadYaml::errorListener
yaml/yaml.dart::loadYamlNode::errorListener
yaml/yaml.dart::loadYamlDocument::errorListener
yaml ScalarEvent yaml_node.dart::YamlScalar::internal::scalar
yaml EventType event.dart::Event::type
event.dart::EventType::streamStart
event.dart::EventType::streamEnd
event.dart::EventType::documentStart
event.dart::EventType::documentEnd
event.dart::EventType::alias
event.dart::EventType::scalar
event.dart::EventType::sequenceStart
event.dart::EventType::sequenceEnd
event.dart::EventType::mappingStart
event.dart::EventType::mappingEnd
event.dart::EventType::values
event.dart::Event::new::type
event.dart::ScalarEvent::type

This check can be disabled by tagging the PR with skip-leaking-check.

Breaking changes ✔️
Package Change Current Version New Version Needed Version Looking good?
yaml None 3.1.3 3.1.3 3.1.3 ✔️

This check can be disabled by tagging the PR with skip-breaking-check.

Coverage ⚠️
File Coverage
pkgs/yaml/lib/src/loader.dart 💔 93 % ⬇️ 0 %
pkgs/yaml/lib/src/parser.dart 💚 94 % ⬆️ 0 %
pkgs/yaml/lib/src/scanner.dart 💔 94 % ⬇️ 1 %
pkgs/yaml/lib/src/token.dart 💚 48 %
pkgs/yaml/lib/src/utils.dart 💚 100 %

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

Changelog Entry
Package Changed Files
package:yaml pkgs/yaml/lib/src/loader.dart
pkgs/yaml/lib/src/parser.dart
pkgs/yaml/lib/src/scanner.dart
pkgs/yaml/lib/src/token.dart
pkgs/yaml/lib/src/utils.dart

Changes to files need to be accounted for in their respective changelogs.

This check can be disabled by tagging the PR with skip-changelog-check.

@sigurdm
Copy link
Contributor

sigurdm commented Dec 4, 2025

Thanks for contributing.

We might want to reconsider if we really want to support tags in yaml. It is a lot of complexity.

Is there a good use case? Is there demand for it?

I'd rather not ever support instantiation of custom dart objects during parsing. Does tag support make sense without that?

Maybe we'd rather roll back the existing half-baked support that exists in the code base.

@sigurdm
Copy link
Contributor

sigurdm commented Dec 4, 2025

cc @jonasfj

@kekavc24
Copy link
Contributor Author

kekavc24 commented Dec 4, 2025

Is there a good use case? Is there demand for it?

I am not suggesting a use case with this contribution. package:yaml should just work as the spec suggests and not care about tags if it doesn't want to handle them.


I'd rather not ever support instantiation of custom dart objects during parsing. Does tag support make sense without that?

Custom tags should just be ignored if a parser has no intention to support them (which is what package:yaml is trying to do with existing code). Always ignore and deem them application-specific instead of throwing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants