Description
YAML scanner has problems with unusual mapping keys
Trans reported:
Found an issue with the YAML syntax highlighting in CodeRay. Try it on the following example.
When it hits mri <1.9 it starts reverse highlighting almost every other line.
- package: system_timer
engine : mri <1.9
The problem is that the scanner doesn't allow white space before the colon. That's an easy fix; however, the set of characters allowed in keys is limited and non spec-conform.
See http://yaml.org/spec/1.2/spec.html#id2788859 for further reading.
A patch for the 1.0 trunk:
Index: /Users/murphy/ruby/coderay/lib/coderay/scanners/yaml.rb
===================================================================
--- /Users/murphy/ruby/coderay/lib/coderay/scanners/yaml.rb (revision 580)
+++ /Users/murphy/ruby/coderay/lib/coderay/scanners/yaml.rb (working copy)
@@ -75,20 +75,18 @@
when match = scan(/[,{}\[\]]/)
encoder.text_token match, :operator
next
- when state == :initial && match = scan(/[\w.() ]*\S(?=: |:$)/)
+ when state == :initial && match = scan(/[\w.() ]*\S(?= *:(?: |$))/)
encoder.text_token match, :key
key_indent = column(pos - match.size - 1)
- # encoder.text_token key_indent.inspect, :debug
state = :colon
next
- when match = scan(/(?:"[^"\n]*"|'[^'\n]*')(?=: |:$)/)
+ when match = scan(/(?:"[^"\n]*"|'[^'\n]*')(?= *:(?: |$))/)
encoder.begin_group :key
encoder.text_token match[0,1], :delimiter
encoder.text_token match[1..-2], :content
encoder.text_token match[-1,1], :delimiter
encoder.end_group :key
key_indent = column(pos - match.size - 1)
- # encoder.text_token key_indent.inspect, :debug
state = :colon
next
when match = scan(/(![\w\/]+)(:([\w:]+))?/)
From Redmine: http://odd-eyed-code.org/issues/231
YAML scanner doesn't recognize false and true
It should also recognize -.Inf and such. Some readable part of the spec is about the Core Schema which Ruby seems to use.
Needs more investigation.
Index: lib/coderay/scanners/yaml.rb
===================================================================
--- lib/coderay/scanners/yaml.rb (revision 742)
+++ lib/coderay/scanners/yaml.rb (working copy)
@@ -11,6 +11,11 @@
KINDS_NOT_LOC = :all
+ CONSTANTS = %w[ true True TRUE false False FALSE null Null NULL ] # :nodoc:
+
+ IDENT_KIND = WordList.new(nil).
+ add(CONSTANTS, :pre_constant)
+
protected
def scan_tokens encoder, options
@@ -59,7 +64,11 @@
encoder.text_token matched, :content if scan(/(?:\n+ {#{string_indent + 1}}.*)+/)
encoder.end_group :string
next
- when match = scan(/(?![!"*&]).+?(?=$|\s+#)/)
+ when match = scan(/(?![!*&\[\],{}]|- ).+?(?=$|\s+#)/)
+ if kind = IDENT_KIND[match]
+ encoder.text_token match, kind
+ next
+ end
encoder.begin_group :string
encoder.text_token match, :content
string_indent = key_indent || column(pos - match.size - 1)
@@ -116,6 +125,9 @@
when match = scan(/:\w+/)
encoder.text_token match, :symbol
next
+ when match = scan(/~|\.(?:inf|Inf|INF|nan|NaN|NAN)/)
+ encoder.text_token match, :pre_constant
+ next
when match = scan(/[^:\s]+(:(?! |$)[^:\s]*)* .*/)
encoder.text_token match, :error
next
From Redmine: http://odd-eyed-code.org/issues/234
YAML scanner doesn't recognize - -
required_ruby_version: !ruby/object:Gem::Requirement
requirements:
- - ">="
- !ruby/object:Gem::Version
version: 1.8.2
version:
The third line is currently interpreted as
operator(- )string(- ">=")
, which is wrong. It should beoperator(- )operator(- )string(">=")
.
From Redmine: http://odd-eyed-code.org/issues/237
The yaml.multiline example is broken
The YAML scanner doesn't seem to tokenize multiline strings correctly.
From Redmine: http://odd-eyed-code.org/issues/238
YAML scanner doesn't recognize arrays
...like [1, 2, 3].
From Redmine: http://odd-eyed-code.org/issues/239