[Question] Duplicate key support coming? #64

God-damnit-all · 2020-08-17T16:02:11Z

What I'm working with tends to generate duplicate keys sometimes. It would be awesome if duplicate keys could somehow be referred to like an array, "Key"[0] and "Key"[1], for instance.

The readme says that duplicate keys will throw an exception 'for the time being', but I'm not sure what's implied to be temporary, the incompatibility with duplicate keys, or the thrown exception.

alexanderameye · 2023-05-25T07:35:38Z

+1 for this

gabriel-samfira · 2023-05-28T11:13:41Z

Hi folks!

I will try to allocate some time next week. In all likelihood, any implementation will probably override previous keys. Would that be enough?

God-damnit-all · 2023-05-28T15:42:22Z

That would defeat the purpose in my opinion.

You could use a custom class. One option, which is MIT licensed, is this C# generic implementation of the NameValueCollection class that allows for keys and values to be of any type: https://www.codeproject.com/Articles/5323395/A-Generic-Form-of-the-NameValueCollection

Since you need to register in order to download, here's the file: NameValueCollection.zip

God-damnit-all · 2023-05-28T15:47:02Z

Ah, scratch that.

If the specified key already exists in the target NameValueCollection instance, the specified value is added to the existing comma-separated list of values in the form "value1,value2,value3". The values are associated with the same key in the target NameValueCollection instance.

I came across the Lexicon class, but the code is pretty dated (.NET 3.5) and there's no license specified.

God-damnit-all · 2023-05-28T16:10:25Z

Actually, it seems like an array of dicts can do the job:

$test = @( @{foo='bar'}, @{foo=$true} )
$test | Out-String

Key Value
--- -----
foo bar
foo bar

$test.foo

bar
True

gabriel-samfira · 2023-05-28T17:39:59Z

@ImportTaste We try to adhere to the yaml spec as much as possible. Sadly, merging was never really added to any spec version. There was an attempt to add it in yaml spec version 1.1, but was deprecated shortly after that. So in a sense, merging was released already deprecated.

The proposal for merging yaml is available on the yaml.org website and describes how merging is expected to work. From what I understand, a key will be added to a yaml when merging, only if that key does not already exist in the current yaml. This makes sense in a way, as you have a number of different yaml, each declaring keys which can hold a certain type. If we mutate a scalar into a sequence just because we have a duplicate key when we merge, this would break expectations in terms of what type we expect to find in that key. Anyone that checks that draft and tries to work with the merging parser should get the behavior described there.

Coincidentally, the python yaml parser obeys the same draft:

>>> import yaml
>>> data = """
... ---
... default: &default
...   value1: 1
...   value2: 2
... 
... hoge:
...   <<: *default
...   # this is a duplicate
...   value1: 44
...   value3: 3
... """
>>> converted = yaml.safe_load(data)
>>> print(yaml.dump(converted, default_flow_style=False))
default:
  value1: 1
  value2: 2
hoge:
  value1: 44
  value2: 2
  value3: 3

Which makes sense.

This seems like custom behavior which would probably not be well suited for a general purpose yaml parser.

God-damnit-all · 2023-05-28T19:10:56Z

Should the values of the keys be merged, though? Some yaml configs are formatted that way for a reason, and doing an array of dicts would maintain the order they're in. I imagine you could use [Linq.Lookup[string,object]] for a merge method.

gabriel-samfira · 2023-05-29T07:58:55Z

Should the values of the keys be merged, though?

Merge keys do not care much about values. The draft specifies that if a key already exists in a yaml, it's pretty much left alone. Merging another yaml with the same key, regardless of value, has no effect on the value already contained in an existing key.

The draft available here: https://yaml.org/type/merge.html

Has an example at the end of the document exemplifying how merging 2 yaml that are mappings containing the same key, works. In short, the first key to merge is the key that will remain at the end.

To clarify a bit more, if we create a more complex example with multiple nested mappings, the same rules would apply, as merge keys deals primarily with keys not with values.

Example in above mentioned draft

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r: 10
  label: center/big

- # Merge one map
  << : *CENTER
  r: 10
  label: center/big

- # Merge multiple maps
  << : [ *CENTER, *BIG ]
  label: center/big

- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

Now, if we were to expand this to add some nested mappings it would look something like this:

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: {world: {hello: there, goodbye: now}} }
- &SMALL { r: {test: override} }

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r:
    world:
      hello: there
      goodbye: now
  label: center/big

- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

Some parsers may add an anchor and reference that in later mappings, but they would be equivalent. The idea is that even if there is no conflict between the keys set as a value for r in the *BIG and *SMALL mappings, we still end up with only one r key. The value held by the key we're left with is the first one to merge or exist in our yaml. Values do not merge. Ie: we won't have:

world:
  goodbye: now
  hello: there
test: override

as the value of r.

It's not that it's technically impossible to do. It's just different from what the spec describes. If we're to add something, we should interoperate with other parsers as best we can and this means adhering to whatever spec or draft is implemented by everyone else. I realize this module isn't perfect and that some things do diverge, but we should avoid deviating even more.

Something like the above situation can be solved a different way:

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG_VALUES
  world:
    hello: there
    goodbye: now
- &SMALL_VALUES
  test:
    override: true
- &BIG
  r:
    << : *BIG_VALUES
  onlyInBIG: test
- &SMALL
  r:
    << : *SMALL_VALUES
  onlyInSMALL: smallTest

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r:
    world:
      hello: there
      goodbye: now
    test:
      override: true
  onlyInBIG: test
  onlyInSMALL: smallTest
  label: center/big

- # Combine values
  << : [ *BIG, *SMALL ]
  r:
    << : [ *BIG_VALUES, *SMALL_VALUES ]
  x: 1
  label: center/big

I realize this is not ideal, but it's how the spec describes this feature should work. There are mentions in various places that "something better" should be implemented in a future yaml spec, possibly 1.3. When that happens, I would be more than happy to add that to this module.

Until then, I can attempt to have this module not error out when duplicate keys are encountered and have it work as described in the current spec.

God-damnit-all · 2023-05-29T18:15:52Z

I personally feel like acting in pure accordance to the spec kind of misses the point on why people use yaml in the first place, though.

gabriel-samfira · 2023-05-29T19:06:05Z

While applications can choose to implement custom behavior that works for them, general purpose modules need to do their best to interoperate with other parsers. Without a standard to follow, this would not be possible. We'd end up with a yaml that is valid in one parser, but completely illegible by other parsers or at the very best you get different behaviors between parsers. This leads to fragmentation and is undesirable for both application developers as well as module maintainers.

Non standard behavior should be implemented in the application that needs it, with the assumption that the format the application chooses to implement will not necessarily be readable (at least not in the same way) by other languages.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] Duplicate key support coming? #64

[Question] Duplicate key support coming? #64

God-damnit-all commented Aug 17, 2020

alexanderameye commented May 25, 2023

gabriel-samfira commented May 28, 2023

God-damnit-all commented May 28, 2023

God-damnit-all commented May 28, 2023

God-damnit-all commented May 28, 2023

gabriel-samfira commented May 28, 2023

God-damnit-all commented May 28, 2023 •

edited

Loading

gabriel-samfira commented May 29, 2023

God-damnit-all commented May 29, 2023

gabriel-samfira commented May 29, 2023

[Question] Duplicate key support coming? #64

[Question] Duplicate key support coming? #64

Comments

God-damnit-all commented Aug 17, 2020

alexanderameye commented May 25, 2023

gabriel-samfira commented May 28, 2023

God-damnit-all commented May 28, 2023

God-damnit-all commented May 28, 2023

God-damnit-all commented May 28, 2023

gabriel-samfira commented May 28, 2023

God-damnit-all commented May 28, 2023 • edited Loading

gabriel-samfira commented May 29, 2023

God-damnit-all commented May 29, 2023

gabriel-samfira commented May 29, 2023

God-damnit-all commented May 28, 2023 •

edited

Loading