Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 0 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,6 @@ sudo: false
matrix:
include:
- go: 1.x
- go: 1.11.x
- go: 1.12.x
- go: 1.13.x
- go: 1.x
arch: amd64
Expand Down
2 changes: 2 additions & 0 deletions BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ go_library(
"util.go",
"value.go",
"vm.go",
"yaml.go",
],
importpath = "github.com/google/go-jsonnet",
visibility = ["//visibility:public"],
Expand All @@ -31,6 +32,7 @@ go_library(
"//internal/errors:go_default_library",
"//internal/parser:go_default_library",
"//internal/program:go_default_library",
"@io_k8s_sigs_yaml//:go_default_library",
],
)

Expand Down
11 changes: 8 additions & 3 deletions bazel/deps.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,6 @@ def jsonnet_go_dependencies():
sum = "h1:2E4SXV/wtOkTonXsotYi4li6zVWxYlZuYNCXe9XRJyk=",
version = "v1.4.0",
)

go_repository(
name = "in_gopkg_check_v1",
importpath = "gopkg.in/check.v1",
Expand All @@ -90,8 +89,14 @@ def jsonnet_go_dependencies():
go_repository(
name = "in_gopkg_yaml_v2",
importpath = "gopkg.in/yaml.v2",
sum = "h1:/eiJrUcujPVeJ3xlSWaiNi3uSVmDGBK1pDHUHAnao1I=",
version = "v2.2.4",
sum = "h1:VUgggvou5XRW9mHwD/yXxIYSMtY0zoKQf/v226p2nyo=",
version = "v2.2.7",
)
go_repository(
name = "io_k8s_sigs_yaml",
importpath = "sigs.k8s.io/yaml",
sum = "h1:4A07+ZFc2wgJwo8YNlQpr1rVlgUDlxXHhPJciaPY5gs=",
version = "v1.1.0",
)
go_repository(
name = "org_golang_x_sys",
Expand Down
30 changes: 30 additions & 0 deletions builtins.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import (
"encoding/hex"
"encoding/json"
"fmt"
"io"
"math"
"reflect"
"sort"
Expand Down Expand Up @@ -1194,6 +1195,34 @@ func builtinParseJSON(i *interpreter, str value) (value, error) {
return jsonToValue(i, parsedJSON)
}

func builtinParseYAML(i *interpreter, str value) (value, error) {
sval, err := i.getString(str)
if err != nil {
return nil, err
}
s := sval.getGoString()

isYamlStream := strings.Contains(s, "---")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can get bitten by some boundary case here. Can it be a part of a multi-line comment? How other implementations handle that?

Copy link
Contributor Author

@groodt groodt Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shotgun parsing is a bit gross, I agree. Anyone adding those characters to a YAML document will be in for a difficult time if they aren't escaping it thoroughly. The .Contains check can probably be made more explicit to check for start of line or end of line characters if that makes you feel more comfortable?

Alternatively, we can adjust the user facing API. The only reason for "sniffing" whether a document is a "yaml stream" or not is to be perhaps more user-friendly by converting a single YAML document to an object {} and a stream of YAML documents into an array []. An alternative user API could be to require a user to choose. e.g. parseYaml that returns a single object {} and parseYamlStream which returns an array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think you could achieve the same effect in a cleaner way just by checking len(elems). The elems will only have multiple elements when there are multiple documents. If it's an array, it will still be just one document. Is that correct?

An alternative user API could be to require a user to choose. e.g. parseYaml that returns a single object {} and parseYamlStream which returns an array.

That could work. It's more complicated, but cleaner in a way. I don't have a strong opinion. We can also add parseYamlStream later (which forces returning of an array, even if there is only one document).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think you could achieve the same effect in a cleaner way just by checking len(elems).

How do we differentiate between a single YAML document with a scalar list vs a YAML stream of documents?

YAML, like JSON, can have a top level sequence or array. Some examples.

// Single doc, scalar array at root
std.parseYaml(
|||
  - {a: 1, b: 2}
  - {a: 3, b: 4}
  - {a: 5, b: 6}
|||
)

[
   {
      "a": 1,
      "b": 2
   },
   {
      "a": 3,
      "b": 4
   },
   {
      "a": 5,
      "b": 6
   }
]
// Mutli doc YAML stream with document start separators.
std.parseYaml(
  |||
   ---
   {a: 1, b: 2}
   ---
   {a: 3, b: 4}
   ---
   {a: 5, b: 6}
  |||
)

[
   {
      "a": 1,
      "b": 2
   },
   {
      "a": 3,
      "b": 4
   },
   {
      "a": 5,
      "b": 6
   }
]
// Single doc, scalar array at root. Indentation instead of {
std.parseYaml(
|||
  - a: 1
    b: 2
  - a: 3
    b: 4
  - a: 5
    b: 6
|||
)

[
   {
      "a": 1,
      "b": 2
   },
   {
      "a": 3,
      "b": 4
   },
   {
      "a": 5,
      "b": 6
   }
]

Since there can be scalar arrays at the root, checking the len(elems) isn't enough because it isn't possible to tell the difference between a yaml stream with 1 document len(elems) == 1 and a single document with a top-level scalar array of length 1.

The approach used in the current implementation assumes that if you specify a YAML stream using --- you know you will receive an array of 1 or more documents. If you do not specify a YAML stream, you will get whatever YAML object you specified, either a Map or a Seq of length, 0, 1 or more elements.

YAML is complicated! :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there can be scalar arrays at the root, checking the len(elems) isn't enough because it isn't possible to tell the difference between a yaml stream with 1 document len(elems) == 1 and a single document with a top-level scalar array of length 1.

Please correct me if I misunderstand something. I assumed that d.Decode(&elem) reads one document. So len(elems) will be the number of documents in a stream. If it's only one document (even if it's an array) it would read the whole thing. Is that correct?

Is your point that:

---
foo: bar

and

foo: bar

both produce {foo: "bar"}, but the user might have expected the array in the first case? Hnnnn... I think it's a valid concern, because when processing streams, it makes it hard to handle special case of 1-element stream.

Sniffing for --- still seems pretty bad (even if we made sure that it's a line on its own). First, I'm afraid of special cases. Second, IIUC formally it's not a magic symbol for streams, but an explicit start of the document.

Perhaps an explicit parseYamlStream which always returns the array is the way out of this.

@sparkprime @sh0rez Any thoughts about handling YAML streams?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both produce {foo: "bar"}, but the user might have expected the array in the first case? Hnnnn... I think it's a valid concern, because when processing streams, it makes it hard to handle special case of 1-element stream.

Yes, that's my concern. You are correct in that it attempts to read a single document, but should we treat a 1-element stream differently or not is the question? Should the output of the following 2 examples be different or the same?

std.parseYaml(
  |||
   ---
   foo: bar
  |||
)
std.parseYaml(
|||
   foo: bar
|||
)

What about this one?

std.parseYaml(
  |||
   ---
   foo: bar
   ---
   wibble: wobble
  |||
)
std.parseYaml(
|||
  - foo: bar
  - wibble: wobble
|||
)

I think I can be convinced either way to be honest.


elems := []interface{}{}
d := NewYAMLToJSONDecoder(strings.NewReader(s))
for {
var elem interface{}
if err := d.Decode(&elem); err != nil {
if err == io.EOF {
break
}
return nil, i.Error(fmt.Sprintf("failed to parse YAML: %v", err.Error()))
}
elems = append(elems, elem)
}

if isYamlStream {
return jsonToValue(i, elems)
}
return jsonToValue(i, elems[0])
}

func jsonEncode(v interface{}) (string, error) {
buf := new(bytes.Buffer)
enc := json.NewEncoder(buf)
Expand Down Expand Up @@ -1620,6 +1649,7 @@ var funcBuiltins = buildBuiltinMap([]builtin{
&unaryBuiltin{name: "base64Decode", function: builtinBase64Decode, params: ast.Identifiers{"str"}},
&unaryBuiltin{name: "base64DecodeBytes", function: builtinBase64DecodeBytes, params: ast.Identifiers{"str"}},
&unaryBuiltin{name: "parseJson", function: builtinParseJSON, params: ast.Identifiers{"str"}},
&unaryBuiltin{name: "parseYaml", function: builtinParseYAML, params: ast.Identifiers{"str"}},
&generalBuiltin{name: "manifestJsonEx", function: builtinManifestJSONEx, params: []generalBuiltinParameter{{name: "value"}, {name: "indent"},
{name: "newline", defaultValue: &valueFlatString{value: []rune("\n")}},
{name: "key_val_sep", defaultValue: &valueFlatString{value: []rune(": ")}}}},
Expand Down
2 changes: 1 addition & 1 deletion cpp-jsonnet
Submodule cpp-jsonnet updated 308 files
2 changes: 2 additions & 0 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,6 @@ go 1.13
require (
github.com/fatih/color v1.10.0
github.com/sergi/go-diff v1.1.0
gopkg.in/yaml.v2 v2.2.7 // indirect
sigs.k8s.io/yaml v1.1.0
)
4 changes: 4 additions & 0 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,7 @@ gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.4 h1:/eiJrUcujPVeJ3xlSWaiNi3uSVmDGBK1pDHUHAnao1I=
gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.7 h1:VUgggvou5XRW9mHwD/yXxIYSMtY0zoKQf/v226p2nyo=
gopkg.in/yaml.v2 v2.2.7/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
sigs.k8s.io/yaml v1.1.0 h1:4A07+ZFc2wgJwo8YNlQpr1rVlgUDlxXHhPJciaPY5gs=
sigs.k8s.io/yaml v1.1.0/go.mod h1:UJmg0vDUVViEyp3mgSv9WPwZCDxu4rQW1olrI1uml+o=
1 change: 1 addition & 0 deletions linter/internal/types/stdlib.go
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ func prepareStdlib(g *typeGraph) {
"parseOctal": g.newSimpleFuncType(numberType, "str"),
"parseHex": g.newSimpleFuncType(numberType, "str"),
"parseJson": g.newSimpleFuncType(jsonType, "str"),
"parseYaml": g.newSimpleFuncType(jsonType, "str"),
"encodeUTF8": g.newSimpleFuncType(arrayOfNumber, "str"),
"decodeUTF8": g.newSimpleFuncType(stringType, "arr"),

Expand Down
10 changes: 10 additions & 0 deletions testdata/parseYaml.golden
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"aaa": { },
"foo": "bar",
"xxx": [
42,
"asdf",
{ }
],
"ąę": "ćż"
}
11 changes: 11 additions & 0 deletions testdata/parseYaml.jsonnet
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
std.parseYaml(
|||
foo: bar
aaa: {}
ąę: ćż
xxx:
- 42
- asdf
- {}
|||
)
Empty file.
139 changes: 139 additions & 0 deletions yaml.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
/*
Copyright 2019 Google Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package jsonnet

import (
"bufio"
"bytes"
"io"
"strings"
"unicode"

"sigs.k8s.io/yaml"
)

const separator = "---"

// YAMLToJSONDecoder decodes YAML documents from an io.Reader by
// separating individual documents. It first converts the YAML
// body to JSON, then unmarshals the JSON.
type YAMLToJSONDecoder struct {
reader Reader
}

// NewYAMLToJSONDecoder decodes YAML documents from the provided
// stream in chunks by converting each document (as defined by
// the YAML spec) into its own chunk, converting it to JSON via
// yaml.YAMLToJSON, and then passing it to json.Decoder.
func NewYAMLToJSONDecoder(r io.Reader) *YAMLToJSONDecoder {
reader := bufio.NewReader(r)
return &YAMLToJSONDecoder{
reader: NewYAMLReader(reader),
}
}

// Decode reads a YAML document as JSON from the stream or returns
// an error. The decoding rules match json.Unmarshal, not
// yaml.Unmarshal.
func (d *YAMLToJSONDecoder) Decode(into interface{}) error {
bytes, err := d.reader.Read()
if err != nil && err != io.EOF {
return err
}

if len(bytes) != 0 {
err := yaml.Unmarshal(bytes, into)
if err != nil {
return err
}
}
return err
}

// Reader reads bytes
type Reader interface {
Read() ([]byte, error)
}

// YAMLReader reads YAML
type YAMLReader struct {
reader Reader
}

// NewYAMLReader creates a new YAMLReader
func NewYAMLReader(r *bufio.Reader) *YAMLReader {
return &YAMLReader{
reader: &LineReader{reader: r},
}
}

// Read returns a full YAML document.
func (r *YAMLReader) Read() ([]byte, error) {
var buffer bytes.Buffer
for {
line, err := r.reader.Read()
if err != nil && err != io.EOF {
return nil, err
}

sep := len([]byte(separator))
if i := bytes.Index(line, []byte(separator)); i == 0 {
// We have a potential document terminator
i += sep
after := line[i:]
if len(strings.TrimRightFunc(string(after), unicode.IsSpace)) == 0 {
if buffer.Len() != 0 {
return buffer.Bytes(), nil
}
if err == io.EOF {
return nil, err
}
}
}
if err == io.EOF {
if buffer.Len() != 0 {
// If we're at EOF, we have a final, non-terminated line. Return it.
return buffer.Bytes(), nil
}
return nil, err
}
buffer.Write(line)
}
}

// LineReader reads single lines.
type LineReader struct {
reader *bufio.Reader
}

// Read returns a single line (with '\n' ended) from the underlying reader.
// An error is returned iff there is an error with the underlying reader.
func (r *LineReader) Read() ([]byte, error) {
var (
isPrefix bool = true
err error = nil
line []byte
buffer bytes.Buffer
)

for isPrefix && err == nil {
line, isPrefix, err = r.reader.ReadLine()
buffer.Write(line)
}
buffer.WriteByte('\n')
return buffer.Bytes(), err
}