Skip to content

YAML scanner improvements #84

Open
@nathany

Description

@nathany

YAML scanner has problems with unusual mapping keys

Trans reported:

Found an issue with the YAML syntax highlighting in CodeRay. Try it on the following example.
When it hits mri <1.9 it starts reverse highlighting almost every other line.

  - package: system_timer
    engine : mri <1.9

The problem is that the scanner doesn't allow white space before the colon. That's an easy fix; however, the set of characters allowed in keys is limited and non spec-conform.

See http://yaml.org/spec/1.2/spec.html#id2788859 for further reading.

A patch for the 1.0 trunk:

Index: /Users/murphy/ruby/coderay/lib/coderay/scanners/yaml.rb
===================================================================
--- /Users/murphy/ruby/coderay/lib/coderay/scanners/yaml.rb (revision 580)
+++ /Users/murphy/ruby/coderay/lib/coderay/scanners/yaml.rb (working copy)
@@ -75,20 +75,18 @@
           when match = scan(/[,{}\[\]]/)
             encoder.text_token match, :operator
             next
-          when state == :initial && match = scan(/[\w.() ]*\S(?=: |:$)/)
+          when state == :initial && match = scan(/[\w.() ]*\S(?= *:(?: |$))/)
             encoder.text_token match, :key
             key_indent = column(pos - match.size - 1)
-            # encoder.text_token key_indent.inspect, :debug
             state = :colon
             next
-          when match = scan(/(?:"[^"\n]*"|'[^'\n]*')(?=: |:$)/)
+          when match = scan(/(?:"[^"\n]*"|'[^'\n]*')(?= *:(?: |$))/)
             encoder.begin_group :key
             encoder.text_token match[0,1], :delimiter
             encoder.text_token match[1..-2], :content
             encoder.text_token match[-1,1], :delimiter
             encoder.end_group :key
             key_indent = column(pos - match.size - 1)
-            # encoder.text_token key_indent.inspect, :debug
             state = :colon
             next
           when match = scan(/(![\w\/]+)(:([\w:]+))?/)

From Redmine: http://odd-eyed-code.org/issues/231


YAML scanner doesn't recognize false and true

It should also recognize -.Inf and such. Some readable part of the spec is about the Core Schema which Ruby seems to use.

Needs more investigation.

Index: lib/coderay/scanners/yaml.rb
===================================================================
--- lib/coderay/scanners/yaml.rb    (revision 742)
+++ lib/coderay/scanners/yaml.rb    (working copy)
@@ -11,6 +11,11 @@

     KINDS_NOT_LOC = :all

+    CONSTANTS = %w[ true True TRUE false False FALSE null Null NULL ]  # :nodoc:
+    
+    IDENT_KIND = WordList.new(nil).
+      add(CONSTANTS, :pre_constant)
+    
   protected

     def scan_tokens encoder, options
@@ -59,7 +64,11 @@
             encoder.text_token matched, :content if scan(/(?:\n+ {#{string_indent + 1}}.*)+/)
             encoder.end_group :string
             next
-          when match = scan(/(?![!"*&]).+?(?=$|\s+#)/)
+          when match = scan(/(?![!*&\[\],{}]|- ).+?(?=$|\s+#)/)
+            if kind = IDENT_KIND[match]
+              encoder.text_token match, kind
+              next
+            end
             encoder.begin_group :string
             encoder.text_token match, :content
             string_indent = key_indent || column(pos - match.size - 1)
@@ -116,6 +125,9 @@
           when match = scan(/:\w+/)
             encoder.text_token match, :symbol
             next
+          when match = scan(/~|\.(?:inf|Inf|INF|nan|NaN|NAN)/)
+            encoder.text_token match, :pre_constant
+            next
           when match = scan(/[^:\s]+(:(?! |$)[^:\s]*)* .*/)
             encoder.text_token match, :error
             next

From Redmine: http://odd-eyed-code.org/issues/234


YAML scanner doesn't recognize - -

required_ruby_version: !ruby/object:Gem::Requirement 
  requirements: 
  - - ">=" 
    - !ruby/object:Gem::Version 
      version: 1.8.2
  version: 

The third line is currently interpreted as operator(- )string(- ">="), which is wrong. It should be operator(- )operator(- )string(">=").

From Redmine: http://odd-eyed-code.org/issues/237


The yaml.multiline example is broken

The YAML scanner doesn't seem to tokenize multiline strings correctly.

From Redmine: http://odd-eyed-code.org/issues/238


YAML scanner doesn't recognize arrays

...like [1, 2, 3].

From Redmine: http://odd-eyed-code.org/issues/239

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions