Skip to content

[PoC] Imprement yaml parser written by pure ruby instead of Psych#9352

Open
hsbt wants to merge 29 commits intomasterfrom
pure-psych
Open

[PoC] Imprement yaml parser written by pure ruby instead of Psych#9352
hsbt wants to merge 29 commits intomasterfrom
pure-psych

Conversation

@hsbt
Copy link
Member

@hsbt hsbt commented Feb 25, 2026

What was the end-user or developer problem that led to this PR?

We would like to allow users to freely specify the version, but the version of the libraries used internally like Psych cannot be specified.

Vendoring is workaround solution, But we can't vendor Psych because that is C extension.

What is your fix for the problem, implemented in this PR?

I tried to rewrite the minimum specification of YAML with Gemini.

Make sure the following tasks are checked

hsbt added 23 commits February 25, 2026 11:10
…ncy support, and improve parsing and indentation
…ON, and harden YAML serializer parsing to improve safety and correctness
…logic, replace DisallowedClass with ArgumentError, prevent stalled parsing by shifting unchanged lines, and drop aliases check in safe_yaml
…ts in YAMLSerializer for improved performance and readability
…d YAML anchors and customized scalar quoting
Copilot AI review requested due to automatic review settings February 25, 2026 09:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces RubyGems' dependency on Psych (a C extension YAML library) with a pure Ruby YAML parser implementation to allow users to freely specify library versions without C extension constraints. The implementation provides a custom Gem::YAMLSerializer module that handles YAML serialization and deserialization for Gem specifications and related data structures.

Changes:

  • Implemented a new pure Ruby YAML parser in lib/rubygems/yaml_serializer.rb with custom dump and load methods
  • Removed dependency on Psych library and deleted lib/rubygems/psych_tree.rb
  • Updated all YAML serialization/deserialization call sites to use the new YAMLSerializer API

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
lib/rubygems/yaml_serializer.rb Complete rewrite from simple stub to full YAML parser with support for Gem objects, arrays, hashes, strings, and Ruby object tags
lib/rubygems.rb Changed to load yaml_serializer instead of psych and psych_tree
lib/rubygems/safe_yaml.rb Updated to use YAMLSerializer.load instead of Psych.safe_load
lib/rubygems/specification.rb Simplified to_yaml method to use YAMLSerializer.dump directly
lib/rubygems/package.rb Changed to use YAMLSerializer.dump for checksums
lib/rubygems/package/old.rb Changed exception handling from Psych::SyntaxError to StandardError
lib/rubygems/commands/specification_command.rb Updated to use YAMLSerializer.dump for spec output
lib/rubygems/config_file.rb Added defensive check for respond_to?(:empty?)
lib/rubygems/ext/cargo_builder.rb Correctly switched from YAML to JSON for parsing cargo metadata
test/rubygems/test_gem_safe_yaml.rb Added pend statements for Psych-specific tests (logic appears inverted)
test/rubygems/test_gem_package.rb Updated to use YAMLSerializer.dump in tests
test/rubygems/test_gem_commands_owner_command.rb Changed expected exception from Psych::DisallowedClass to ArgumentError
test/rubygems/helper.rb Updated load_yaml helper to use YAMLSerializer.load
lib/rubygems/psych_tree.rb Deleted (no longer needed)
Manifest.txt Removed psych_tree.rb entry

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 299 to 312
if obj["value"]
Gem::Platform.new(obj["value"])
else
Gem::Platform.new([obj["cpu"], obj["os"], obj["version"]])
end
elsif ["!ruby/object:Gem::Requirement", "!ruby/object:Gem::Version::Requirement"].include?(obj[:tag])
r = Gem::Requirement.allocate
reqs = convert_any(obj["requirements"] || obj["value"], permitted_symbols)
if reqs.is_a?(Array)
reqs.map! do |item|
if item.is_a?(Array) && item.size == 2
[item[0], item[1].is_a?(Gem::Version) ? item[1] : Gem::Version.new(item[1].to_s)]
else
item
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convert_any function handles Gem::Version, Gem::Platform, Gem::Requirement, and Gem::Dependency by reconstructing them from their serialized form, but it doesn't handle "Gem::Version::Requirement" consistently with other permitted classes. The code checks for both "!ruby/object:Gem::Requirement" and "!ruby/object:Gem::Version::Requirement" (line 299), but Gem::Version::Requirement is listed separately in PERMITTED_CLASSES. Verify that this is intentional and that the handling is correct for both cases.

Copilot uses AI. Check for mistakes.
res = parts.join
res << "\n" unless res.end_with?("\n")
res
elsif quote || obj.empty? || obj =~ /^[!*&:@%]/ || obj =~ /^-?\d+(\.\d+)?$/ || obj =~ /^[<>=-]/ ||
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition obj.empty? on line 79 checks for empty strings to force quoting, but this check comes after checking for newlines. However, the order of checks matters - an empty string won't match the newline check, so this is fine. But consider that the regex /^[!*&:@%]/ and other special character checks might not cover all YAML special cases. For example, strings starting with ? or - followed by space might also need quoting depending on context.

Suggested change
elsif quote || obj.empty? || obj =~ /^[!*&:@%]/ || obj =~ /^-?\d+(\.\d+)?$/ || obj =~ /^[<>=-]/ ||
elsif quote || obj.empty? || obj =~ /^[!*&:@%]/ || obj =~ /^-?\d+(\.\d+)?$/ || obj =~ /^[<>=-]/ ||
obj =~ /^\?/ || obj =~ /^-(\s|$)/ ||

Copilot uses AI. Check for mistakes.
$1.gsub(/''/, "'")
elsif val == "true"
true
elsif val == "false"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline array parsing logic val =~ /^\[(.*)\]$/ with val == "[]" ? [] : [$1.strip] is overly simplistic. It treats any non-empty bracketed content as a single-element array rather than properly parsing comma-separated values. For example, "[a, b, c]" would be parsed as a single-element array containing the string "a, b, c" instead of a three-element array. This doesn't match standard YAML inline array syntax.

Suggested change
elsif val == "false"
inner = $1.strip
return [] if inner.empty?
inner.split(/\s*,\s*/).reject(&:empty?).map { |element| unquote_simple(element) }

Copilot uses AI. Check for mistakes.
Comment on lines 203 to 208
elsif val.empty?
if lines.any? && (lines[0].lstrip.start_with?("- ") || lines[0].lstrip == "-") && lines[0][/^ */].size == indent
res[key] = parse_any(lines, indent, permitted_tags, aliases)
else
res[key] = parse_any(lines, indent + 1, permitted_tags, aliases)
end
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the hash parsing logic, when a value is empty and followed by an array, the code calls parse_any(lines, indent, ...) (line 205), but when followed by other content, it calls parse_any(lines, indent + 1, ...) (line 207). This inconsistency could lead to incorrect indentation handling. The logic should consistently determine the expected indentation level for nested content based on whether it's an array or hash, not based on arbitrary checks.

Copilot uses AI. Check for mistakes.
Comment on lines 1 to 382
# frozen_string_literal: true

module Gem
# A stub yaml serializer that can handle only hashes and strings (as of now).
module YAMLSerializer
module_function

def dump(hash)
yaml = String.new("---")
yaml << dump_hash(hash)
def dump(obj)
"---#{dump_obj(obj, 0)}"
end

def dump_hash(hash)
yaml = String.new("\n")
hash.each do |k, v|
yaml << k << ":"
if v.is_a?(Hash)
yaml << dump_hash(v).gsub(/^(?!$)/, " ") # indent all non-empty lines
elsif v.is_a?(Array) # Expected to be array of strings
if v.empty?
yaml << " []\n"
else
yaml << "\n- " << v.map {|s| s.to_s.gsub(/\s+/, " ").inspect }.join("\n- ") << "\n"
def dump_obj(obj, indent, quote: false)
case obj
when Gem::Specification
parts = [" !ruby/object:Gem::Specification\n"]
parts << "#{" " * indent}name:#{dump_obj(obj.name, indent + 2)}"
parts << "#{" " * indent}version:#{dump_obj(obj.version, indent + 2)}"
parts << "#{" " * indent}platform: #{obj.platform}\n"
if obj.platform.to_s != obj.original_platform.to_s
parts << "#{" " * indent}original_platform: #{obj.original_platform}\n"
end

attributes = Gem::Specification.attribute_names.map(&:to_s).sort - %w[name version platform]
attributes.each do |name|
val = obj.instance_variable_get("@#{name}")
next if val.nil?
parts << "#{" " * indent}#{name}:#{dump_obj(val, indent + 2)}"
end
res = parts.join
res << "\n" unless res.end_with?("\n")
res
when Gem::Version
" !ruby/object:Gem::Version\n#{" " * indent}version: #{dump_obj(obj.version.to_s, indent + 2).lstrip}"
when Gem::Platform
" !ruby/object:Gem::Platform\n#{" " * indent}cpu: #{obj.cpu.inspect}\n#{" " * indent}os: #{obj.os.inspect}\n#{" " * indent}version: #{obj.version.inspect}\n"
when Gem::Requirement
" !ruby/object:Gem::Requirement\n#{" " * indent}requirements:#{dump_obj(obj.requirements, indent + 2)}"
when Gem::Dependency
[
" !ruby/object:Gem::Dependency\n",
"#{" " * indent}name: #{dump_obj(obj.name, indent + 2).lstrip}",
"#{" " * indent}requirement:#{dump_obj(obj.requirement, indent + 2)}",
"#{" " * indent}type: #{dump_obj(obj.type, indent + 2).lstrip}",
"#{" " * indent}prerelease: #{dump_obj(obj.prerelease?, indent + 2).lstrip}",
"#{" " * indent}version_requirements:#{dump_obj(obj.requirement, indent + 2)}",
].join
when Hash
if obj.empty?
" {}\n"
else
parts = ["\n"]
obj.each do |k, v|
is_symbol = k.is_a?(Symbol) || (k.is_a?(String) && k.start_with?(":"))
key_str = k.is_a?(Symbol) ? k.inspect : k.to_s
parts << "#{" " * indent}#{key_str}:#{dump_obj(v, indent + 2, quote: is_symbol)}"
end
parts.join
end
when Array
if obj.empty?
" []\n"
else
yaml << " " << v.to_s.gsub(/\s+/, " ").inspect << "\n"
parts = ["\n"]
obj.each do |v|
parts << "#{" " * indent}-#{dump_obj(v, indent + 2)}"
end
parts.join
end
when Time
" #{obj.utc.strftime("%Y-%m-%d %H:%M:%S.%N Z")}\n"
when String
if obj.include?("\n")
parts = [obj.end_with?("\n") ? " |\n" : " |-\n"]
obj.each_line do |line|
parts << "#{" " * (indent + 2)}#{line}"
end
res = parts.join
res << "\n" unless res.end_with?("\n")
res
elsif quote || obj.empty? || obj =~ /^[!*&:@%]/ || obj =~ /^-?\d+(\.\d+)?$/ || obj =~ /^[<>=-]/ ||
obj == "true" || obj == "false" || obj == "nil" ||
obj.include?(":") || obj.include?("#") || obj.include?("[") || obj.include?("]") ||
obj.include?("{") || obj.include?("}") || obj.include?(",")
" #{obj.to_s.inspect}\n"
else
" #{obj}\n"
end
when Numeric, Symbol, TrueClass, FalseClass, nil
" #{obj.inspect}\n"
else
" #{obj.to_s.inspect}\n"
end
yaml
end

ARRAY_REGEX = /
^
(?:[ ]*-[ ]) # '- ' before array items
(['"]?) # optional opening quote
(.*) # value
\1 # matching closing quote
$
/xo

HASH_REGEX = /
^
([ ]*) # indentations
([^#]+) # key excludes comment char '#'
(?::(?=(?:\s|$))) # : (without the lookahead the #key includes this when : is present in value)
[ ]?
(['"]?) # optional opening quote
(.*) # value
\3 # matching closing quote
$
/xo

def load(str)
res = {}
stack = [res]
last_hash = nil
last_empty_key = nil
str.split(/\r?\n/) do |line|
if match = HASH_REGEX.match(line)
indent, key, quote, val = match.captures
val = strip_comment(val)
def load(str, permitted_classes: [], permitted_symbols: [], aliases: true)
return {} if str.nil? || str.empty?
lines = str.split(/\r?\n/)
if lines[0]&.start_with?("---")
if lines[0].strip == "---"
lines.shift
else
lines[0] = lines[0].sub(/^---\s*/, "")
end
end

permitted_tags = build_permitted_tags(permitted_classes)
data = nil
while lines.any?
before_count = lines.size
parsed = parse_any(lines, -1, permitted_tags, aliases)
if lines.size == before_count && lines.any?
lines.shift
end

depth = indent.size / 2
if quote.empty? && val.empty?
new_hash = {}
stack[depth][key] = new_hash
stack[depth + 1] = new_hash
last_empty_key = key
last_hash = stack[depth]
if data.is_a?(Hash) && parsed.is_a?(Hash)
data.merge!(parsed)
elsif data.nil?
data = parsed
end
end

if data.is_a?(Hash) && (data[:tag] == "!ruby/object:Gem::Specification" || data["tag"] == "!ruby/object:Gem::Specification")
convert_to_spec(data, permitted_symbols)
else
convert_any(data, permitted_symbols)
end
end

def parse_any(lines, base_indent, permitted_tags, aliases)
while lines.any? && (lines[0].strip.empty? || lines[0].lstrip.start_with?("#"))
lines.shift
end
return nil if lines.empty?

indent = lines[0][/^ */].size
return nil if indent < base_indent

line = lines[0]
if !aliases && line.lstrip.start_with?("*", "&")
raise ArgumentError, "YAML aliases are not allowed"
end
if line.lstrip.start_with?("- ") || line.lstrip == "-"
res = []
while lines.any? && lines[0][/^ */].size == indent && (lines[0].lstrip.start_with?("- ") || lines[0].lstrip == "-")
l = lines.shift
content = l.lstrip[1..-1].strip
if content.empty?
res << parse_any(lines, indent, permitted_tags, aliases)
elsif !aliases && content.start_with?("*", "&")
raise ArgumentError, "YAML aliases are not allowed"
elsif content.start_with?("!ruby/object:")
tag = content.strip
unless permitted_tags.include?(tag)
raise ArgumentError, "Disallowed class: #{tag}"
end
nested = parse_any(lines, indent, permitted_tags, aliases)
if nested.is_a?(Hash)
nested[:tag] = tag
res << nested
else
res << { :tag => tag, "value" => nested }
end
elsif content.start_with?("-")
lines.unshift(" " * (indent + 2) + content)
res << parse_any(lines, indent, permitted_tags, aliases)
elsif content =~ /^((?:[^#:]|:[^ ])+):(?:[ ]+(.*))?$/ && !content.start_with?("!ruby/object:")
lines.unshift(" " * (indent + 2) + content)
res << parse_any(lines, indent, permitted_tags, aliases)
elsif content.start_with?("|")
modifier = content[1..-1].to_s.strip
res << parse_block_scalar(lines, indent, modifier)
else
val = [] if val == "[]" # empty array
stack[depth][key] = val
str = unquote_simple(content)
while lines.any? && !lines[0].strip.empty? && lines[0][/^ */].size > indent
str << " " << lines.shift.strip
end
res << str
end
elsif match = ARRAY_REGEX.match(line)
_, val = match.captures
end
res
elsif line.lstrip =~ /^((?:[^#:]|:[^ ])+):(?:[ ]+(.*))?$/ && !line.lstrip.start_with?("!ruby/object:")
res = Hash.new
while lines.any? && lines[0][/^ */].size == indent && lines[0].lstrip =~ /^((?:[^#:]|:[^ ])+):(?:[ ]+(.*))?$/ && !lines[0].lstrip.start_with?("!ruby/object:")
l = lines.shift
l.lstrip =~ /^((?:[^#:]|:[^ ])+):(?:[ ]+(.*))?$/
key = $1.strip
val = $2.to_s.strip
val = strip_comment(val)

last_hash[last_empty_key] = [] unless last_hash[last_empty_key].is_a?(Array)
if val.start_with?("!ruby/object:")
tag = val.strip
unless permitted_tags.include?(tag)
raise ArgumentError, "Disallowed class: #{tag}"
end
nested = parse_any(lines, indent, permitted_tags, aliases)
if nested.is_a?(Hash)
nested[:tag] = tag
res[key] = nested
else
res[key] = { :tag => tag, "value" => nested }
end
elsif !aliases && val.start_with?("*", "&")
raise ArgumentError, "YAML aliases are not allowed"
elsif val.empty?
if lines.any? && (lines[0].lstrip.start_with?("- ") || lines[0].lstrip == "-") && lines[0][/^ */].size == indent
res[key] = parse_any(lines, indent, permitted_tags, aliases)
else
res[key] = parse_any(lines, indent + 1, permitted_tags, aliases)
end
elsif val == "[]"
res[key] = []
elsif val == "{}"
res[key] = {}
elsif val.start_with?("|")
modifier = val[1..-1].to_s.strip
res[key] = parse_block_scalar(lines, indent, modifier)
else
str = unquote_simple(val)
while lines.any? && !lines[0].strip.empty? && lines[0][/^ */].size > indent
str << " " << lines.shift.strip
end
res[key] = str
end
end
res
elsif line.lstrip.start_with?("!ruby/object:")
tag = lines.shift.lstrip.strip
unless permitted_tags.include?(tag)
raise ArgumentError, "Disallowed class: #{tag}"
end
nested = parse_any(lines, indent, permitted_tags, aliases)
if nested.is_a?(Hash)
nested[:tag] = tag
nested
else
{ :tag => tag, "value" => nested }
end
elsif line.lstrip.start_with?("|")
modifier = line.lstrip[1..-1].to_s.strip
lines.shift
parse_block_scalar(lines, indent, modifier)
else
str = unquote_simple(lines.shift.strip)
while lines.any? && !lines[0].strip.empty? && lines[0][/^ */].size > indent
str << " " << lines.shift.strip
end
str
end
end

last_hash[last_empty_key].push(val)
def parse_block_scalar(lines, base_indent, modifier)
parts = []
block_indent = nil
while lines.any?
if lines[0].strip.empty?
parts << "\n"
lines.shift
else
line_indent = lines[0][/^ */].size
break if line_indent <= base_indent
block_indent ||= line_indent
l = lines.shift
parts << l[block_indent..-1].to_s << "\n"
end
end
res = parts.join
res.chomp! if modifier == "-" && res.end_with?("\n")
res
end

def build_permitted_tags(permitted_classes)
Array(permitted_classes).map do |klass|
name = klass.is_a?(Module) ? klass.name : klass.to_s
"!ruby/object:#{name}"
end
end

def convert_to_spec(hash, permitted_symbols)
spec = Gem::Specification.allocate
return spec unless hash.is_a?(Hash)

converted_hash = {}
hash.each {|k, v| converted_hash[k] = convert_any(v, permitted_symbols) }

# Ensure specification_version is an Integer
if converted_hash["specification_version"] && !converted_hash["specification_version"].is_a?(Integer)
converted_hash["specification_version"] = converted_hash["specification_version"].to_i
end

spec.yaml_initialize("!ruby/object:Gem::Specification", converted_hash)
spec
end

def convert_any(obj, permitted_symbols)
if obj.is_a?(Hash)
if obj[:tag] == "!ruby/object:Gem::Version"
ver = obj["version"] || obj["value"]
Gem::Version.new(ver.to_s)
elsif obj[:tag] == "!ruby/object:Gem::Platform"
if obj["value"]
Gem::Platform.new(obj["value"])
else
Gem::Platform.new([obj["cpu"], obj["os"], obj["version"]])
end
elsif ["!ruby/object:Gem::Requirement", "!ruby/object:Gem::Version::Requirement"].include?(obj[:tag])
r = Gem::Requirement.allocate
reqs = convert_any(obj["requirements"] || obj["value"], permitted_symbols)
if reqs.is_a?(Array)
reqs.map! do |item|
if item.is_a?(Array) && item.size == 2
[item[0], item[1].is_a?(Gem::Version) ? item[1] : Gem::Version.new(item[1].to_s)]
else
item
end
end
end
r.instance_variable_set(:@requirements, reqs)
r
elsif obj[:tag] == "!ruby/object:Gem::Dependency"
d = Gem::Dependency.allocate
d.instance_variable_set(:@name, obj["name"])
d.instance_variable_set(:@requirement, convert_any(obj["requirement"], permitted_symbols))

type = obj["type"]
if type
type = type.to_s.sub(/^:/, "").to_sym
else
type = :runtime
end
if permitted_symbols.any? && !permitted_symbols.include?(type.to_s)
raise ArgumentError, "Disallowed symbol: #{type.inspect}"
end
d.instance_variable_set(:@type, type)

d.instance_variable_set(:@prerelease, ["true", true].include?(obj["prerelease"]))
d.instance_variable_set(:@version_requirements, d.instance_variable_get(:@requirement))
d
else
res = Hash.new
obj.each do |k, v|
next if k == :tag
res[k.to_s] = convert_any(v, permitted_symbols)
end
res
end
elsif obj.is_a?(Array)
obj.map {|i| convert_any(i, permitted_symbols) }
else
obj
end
end

def strip_comment(val)
if val.include?("#") && !val.start_with?("#")
if val.include?("#") && !val.lstrip.start_with?("#")
val.split("#", 2).first.strip
else
val
end
end

class << self
private :dump_hash
def unquote_simple(val)
if val =~ /^"(.*)"$/
$1.gsub(/\\"/, '"').gsub(/\\n/, "\n").gsub(/\\r/, "\r").gsub(/\\t/, "\t").gsub(/\\\\/, "\\")
elsif val =~ /^'(.*)'$/
$1.gsub(/''/, "'")
elsif val == "true"
true
elsif val == "false"
false
elsif val == "nil"
nil
elsif val =~ /^\[(.*)\]$/
val == "[]" ? [] : [$1.strip]
elsif /^\d{4}-\d{2}-\d{2}/.match?(val)
require "time"
begin
Time.parse(val)
rescue ArgumentError
val
end
elsif /^-?\d+$/.match?(val)
val.to_i
else
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a complete rewrite of the YAML parsing and serialization logic, replacing the battle-tested Psych library with a custom implementation. However, there appear to be no dedicated tests for the YAMLSerializer module itself. The existing tests (like test_to_yaml in test_gem_specification.rb) may provide some coverage, but a parser of this complexity requires comprehensive unit tests covering edge cases, error handling, various YAML constructs, and round-trip serialization. Consider adding a dedicated test file test_gem_yaml_serializer.rb with extensive test coverage.

Copilot uses AI. Check for mistakes.
parts.join
end
when Time
" #{obj.utc.strftime("%Y-%m-%d %H:%M:%S.%N Z")}\n"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Time dumping format uses %Y-%m-%d %H:%M:%S.%N Z which outputs nanoseconds with 9 digits. However, the parsing code in unquote_simple uses a simple regex /^\d{4}-\d{2}-\d{2}/ and Time.parse(val) which may not correctly handle the nanosecond precision. This could lead to loss of precision in round-trip serialization/deserialization. Consider using a more consistent format or ensuring the parser can handle the full precision.

Suggested change
" #{obj.utc.strftime("%Y-%m-%d %H:%M:%S.%N Z")}\n"
utc = obj.utc
" #{utc.strftime("%Y-%m-%d %H:%M:%S")}.#{format("%06d", utc.usec)} Z\n"

Copilot uses AI. Check for mistakes.
Comment on lines 288 to 345

spec.yaml_initialize("!ruby/object:Gem::Specification", converted_hash)
spec
end

def convert_any(obj, permitted_symbols)
if obj.is_a?(Hash)
if obj[:tag] == "!ruby/object:Gem::Version"
ver = obj["version"] || obj["value"]
Gem::Version.new(ver.to_s)
elsif obj[:tag] == "!ruby/object:Gem::Platform"
if obj["value"]
Gem::Platform.new(obj["value"])
else
Gem::Platform.new([obj["cpu"], obj["os"], obj["version"]])
end
elsif ["!ruby/object:Gem::Requirement", "!ruby/object:Gem::Version::Requirement"].include?(obj[:tag])
r = Gem::Requirement.allocate
reqs = convert_any(obj["requirements"] || obj["value"], permitted_symbols)
if reqs.is_a?(Array)
reqs.map! do |item|
if item.is_a?(Array) && item.size == 2
[item[0], item[1].is_a?(Gem::Version) ? item[1] : Gem::Version.new(item[1].to_s)]
else
item
end
end
end
r.instance_variable_set(:@requirements, reqs)
r
elsif obj[:tag] == "!ruby/object:Gem::Dependency"
d = Gem::Dependency.allocate
d.instance_variable_set(:@name, obj["name"])
d.instance_variable_set(:@requirement, convert_any(obj["requirement"], permitted_symbols))

type = obj["type"]
if type
type = type.to_s.sub(/^:/, "").to_sym
else
type = :runtime
end
if permitted_symbols.any? && !permitted_symbols.include?(type.to_s)
raise ArgumentError, "Disallowed symbol: #{type.inspect}"
end
d.instance_variable_set(:@type, type)

d.instance_variable_set(:@prerelease, ["true", true].include?(obj["prerelease"]))
d.instance_variable_set(:@version_requirements, d.instance_variable_get(:@requirement))
d
else
res = Hash.new
obj.each do |k, v|
next if k == :tag
res[k.to_s] = convert_any(v, permitted_symbols)
end
res
end
elsif obj.is_a?(Array)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbol is listed in PERMITTED_CLASSES but there's no corresponding handler in convert_any for "!ruby/object:Symbol" tags. If a YAML document contains an explicitly tagged Symbol object, it will pass the security check (lines 227-228, 191-192, 152-153) but won't be properly converted to a Symbol object - it will be returned as a hash with a :tag key. Consider either removing Symbol from PERMITTED_CLASSES (since symbols are typically parsed directly from :symbol syntax) or adding proper Symbol conversion logic.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +15
pend "Psych is not loaded" if defined?(Gem::YAMLSerializer)
assert_predicate Gem::SafeYAML, :aliases_enabled?
assert_equal({ "a" => "a", "b" => "a" }, Gem::SafeYAML.safe_load("a: &a a\nb: *a\n"))
end

def test_aliases_disabled
pend "Psych is not loaded" if defined?(Gem::YAMLSerializer)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for this pend statement is inverted. The test should be skipped when YAMLSerializer IS loaded (since this PR replaces Psych with YAMLSerializer). Currently, the test is only skipped when YAMLSerializer is defined, but since it's now always loaded via Gem.load_yaml, this test will always be skipped. The condition should be checking if Psych is NOT loaded instead, or the entire pend line should be removed since YAMLSerializer is now the default.

Suggested change
pend "Psych is not loaded" if defined?(Gem::YAMLSerializer)
assert_predicate Gem::SafeYAML, :aliases_enabled?
assert_equal({ "a" => "a", "b" => "a" }, Gem::SafeYAML.safe_load("a: &a a\nb: *a\n"))
end
def test_aliases_disabled
pend "Psych is not loaded" if defined?(Gem::YAMLSerializer)
pend "Psych is not loaded" unless defined?(Psych)
assert_predicate Gem::SafeYAML, :aliases_enabled?
assert_equal({ "a" => "a", "b" => "a" }, Gem::SafeYAML.safe_load("a: &a a\nb: *a\n"))
end
def test_aliases_disabled
pend "Psych is not loaded" unless defined?(Psych)

Copilot uses AI. Check for mistakes.
while lines.any?
before_count = lines.size
parsed = parse_any(lines, -1, permitted_tags, aliases)
if lines.size == before_count && lines.any?
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The infinite loop protection mechanism that shifts a line when no progress is made could silently swallow parse errors. If parse_any fails to consume any lines (returning nil or empty result), the loop shifts a line and continues, potentially losing data. Consider adding error logging or raising an exception when this fallback is triggered, or at minimum document why silent skipping is acceptable.

Suggested change
if lines.size == before_count && lines.any?
if lines.size == before_count && lines.any?
warn "Gem::YAMLSerializer.load: skipping unparseable line #{lines.first.inspect} because it could not be parsed" if defined?(warn)

Copilot uses AI. Check for mistakes.
Comment on lines 250 to 267
def parse_block_scalar(lines, base_indent, modifier)
parts = []
block_indent = nil
while lines.any?
if lines[0].strip.empty?
parts << "\n"
lines.shift
else
line_indent = lines[0][/^ */].size
break if line_indent <= base_indent
block_indent ||= line_indent
l = lines.shift
parts << l[block_indent..-1].to_s << "\n"
end
end
res = parts.join
res.chomp! if modifier == "-" && res.end_with?("\n")
res
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parse_block_scalar function only handles the "-" modifier (chomping) but YAML block scalars support other modifiers like "+" (keeping final newlines) and numeric indentation indicators (e.g., "|2"). The current implementation will treat "|+" or "|2" as unrecognized modifiers and won't apply the correct semantics, potentially leading to incorrect parsing of multiline strings. Consider implementing full block scalar modifier support or documenting the limitations.

Copilot uses AI. Check for mistakes.
hsbt added 2 commits February 25, 2026 18:25
…n and add SafeYAML tests for specification_version, disallowed classes/symbols, and aliases
…cation_version conversion to numeric strings; update tests to assert error messages and add real gemspec parsing test
…turn {} unless content is a Hash to avoid unsafe loads and ensure a Hash result
lines[0] = line
end

result = nil
…r, ensure requirements/rdoc_options become arrays and tolerate malformed versions, and add tests covering anchors and parsing fixes
…ons and other array fields are correctly typed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants