Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Duplicate key support coming? #64

Open
God-damnit-all opened this issue Aug 17, 2020 · 10 comments
Open

[Question] Duplicate key support coming? #64

God-damnit-all opened this issue Aug 17, 2020 · 10 comments

Comments

@God-damnit-all
Copy link

What I'm working with tends to generate duplicate keys sometimes. It would be awesome if duplicate keys could somehow be referred to like an array, "Key"[0] and "Key"[1], for instance.

The readme says that duplicate keys will throw an exception 'for the time being', but I'm not sure what's implied to be temporary, the incompatibility with duplicate keys, or the thrown exception.

@alexanderameye
Copy link

+1 for this

@gabriel-samfira
Copy link
Member

Hi folks!

I will try to allocate some time next week. In all likelihood, any implementation will probably override previous keys. Would that be enough?

@God-damnit-all
Copy link
Author

That would defeat the purpose in my opinion.

You could use a custom class. One option, which is MIT licensed, is this C# generic implementation of the NameValueCollection class that allows for keys and values to be of any type: https://www.codeproject.com/Articles/5323395/A-Generic-Form-of-the-NameValueCollection

Since you need to register in order to download, here's the file: NameValueCollection.zip

@God-damnit-all
Copy link
Author

Ah, scratch that.

If the specified key already exists in the target NameValueCollection instance, the specified value is added to the existing comma-separated list of values in the form "value1,value2,value3". The values are associated with the same key in the target NameValueCollection instance.

I came across the Lexicon class, but the code is pretty dated (.NET 3.5) and there's no license specified.

@God-damnit-all
Copy link
Author

Actually, it seems like an array of dicts can do the job:

$test = @( @{foo='bar'}, @{foo=$true} )
$test | Out-String

Key Value
--- -----
foo bar
foo bar

$test.foo

bar
True

@gabriel-samfira
Copy link
Member

@ImportTaste We try to adhere to the yaml spec as much as possible. Sadly, merging was never really added to any spec version. There was an attempt to add it in yaml spec version 1.1, but was deprecated shortly after that. So in a sense, merging was released already deprecated.

The proposal for merging yaml is available on the yaml.org website and describes how merging is expected to work. From what I understand, a key will be added to a yaml when merging, only if that key does not already exist in the current yaml. This makes sense in a way, as you have a number of different yaml, each declaring keys which can hold a certain type. If we mutate a scalar into a sequence just because we have a duplicate key when we merge, this would break expectations in terms of what type we expect to find in that key. Anyone that checks that draft and tries to work with the merging parser should get the behavior described there.

Coincidentally, the python yaml parser obeys the same draft:

>>> import yaml
>>> data = """
... ---
... default: &default
...   value1: 1
...   value2: 2
... 
... hoge:
...   <<: *default
...   # this is a duplicate
...   value1: 44
...   value3: 3
... """
>>> converted = yaml.safe_load(data)
>>> print(yaml.dump(converted, default_flow_style=False))
default:
  value1: 1
  value2: 2
hoge:
  value1: 44
  value2: 2
  value3: 3

Which makes sense.

This seems like custom behavior which would probably not be well suited for a general purpose yaml parser.

@God-damnit-all
Copy link
Author

God-damnit-all commented May 28, 2023

Should the values of the keys be merged, though? Some yaml configs are formatted that way for a reason, and doing an array of dicts would maintain the order they're in. I imagine you could use [Linq.Lookup[string,object]] for a merge method.

@gabriel-samfira
Copy link
Member

Should the values of the keys be merged, though?

Merge keys do not care much about values. The draft specifies that if a key already exists in a yaml, it's pretty much left alone. Merging another yaml with the same key, regardless of value, has no effect on the value already contained in an existing key.

The draft available here: https://yaml.org/type/merge.html

Has an example at the end of the document exemplifying how merging 2 yaml that are mappings containing the same key, works. In short, the first key to merge is the key that will remain at the end.

To clarify a bit more, if we create a more complex example with multiple nested mappings, the same rules would apply, as merge keys deals primarily with keys not with values.

Example in above mentioned draft

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r: 10
  label: center/big

- # Merge one map
  << : *CENTER
  r: 10
  label: center/big

- # Merge multiple maps
  << : [ *CENTER, *BIG ]
  label: center/big

- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

Now, if we were to expand this to add some nested mappings it would look something like this:

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: {world: {hello: there, goodbye: now}} }
- &SMALL { r: {test: override} }

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r:
    world:
      hello: there
      goodbye: now
  label: center/big

- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

Some parsers may add an anchor and reference that in later mappings, but they would be equivalent. The idea is that even if there is no conflict between the keys set as a value for r in the *BIG and *SMALL mappings, we still end up with only one r key. The value held by the key we're left with is the first one to merge or exist in our yaml. Values do not merge. Ie: we won't have:

world:
  goodbye: now
  hello: there
test: override

as the value of r.

It's not that it's technically impossible to do. It's just different from what the spec describes. If we're to add something, we should interoperate with other parsers as best we can and this means adhering to whatever spec or draft is implemented by everyone else. I realize this module isn't perfect and that some things do diverge, but we should avoid deviating even more.

Something like the above situation can be solved a different way:

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG_VALUES
  world:
    hello: there
    goodbye: now
- &SMALL_VALUES
  test:
    override: true
- &BIG
  r:
    << : *BIG_VALUES
  onlyInBIG: test
- &SMALL
  r:
    << : *SMALL_VALUES
  onlyInSMALL: smallTest

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r:
    world:
      hello: there
      goodbye: now
    test:
      override: true
  onlyInBIG: test
  onlyInSMALL: smallTest
  label: center/big

- # Combine values
  << : [ *BIG, *SMALL ]
  r:
    << : [ *BIG_VALUES, *SMALL_VALUES ]
  x: 1
  label: center/big

I realize this is not ideal, but it's how the spec describes this feature should work. There are mentions in various places that "something better" should be implemented in a future yaml spec, possibly 1.3. When that happens, I would be more than happy to add that to this module.

Until then, I can attempt to have this module not error out when duplicate keys are encountered and have it work as described in the current spec.

@God-damnit-all
Copy link
Author

I personally feel like acting in pure accordance to the spec kind of misses the point on why people use yaml in the first place, though.

@gabriel-samfira
Copy link
Member

While applications can choose to implement custom behavior that works for them, general purpose modules need to do their best to interoperate with other parsers. Without a standard to follow, this would not be possible. We'd end up with a yaml that is valid in one parser, but completely illegible by other parsers or at the very best you get different behaviors between parsers. This leads to fragmentation and is undesirable for both application developers as well as module maintainers.

Non standard behavior should be implemented in the application that needs it, with the assumption that the format the application chooses to implement will not necessarily be readable (at least not in the same way) by other languages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants