JSON Mapping of Morphir tagged unions #135

AttilaMihaly · 2023-01-30T13:29:04Z

AttilaMihaly
Jan 30, 2023
Maintainer

In one of our recent community calls there was a request to review how we map Morphir types to JSON. I'll lay out the current and proposed serialization and open it up for discussion.

Current Approach

Morphir defines a JSON mapping standard that we use consistently across all of our tooling (the IR, tests, backends, ...). Most of the mapping is straightforward and follows what most people would do if they manually mapped their types to JSON. The one exception is tagged unions (referred to as custom types in Elm, sum types in other contexts). Given the following sample type and values:

type Foo
    = Bar Int Bool
    | Baz String

sample1 = Bar 42 True

sample2 = Baz "Hello"

The JSON serialization would be:

["Bar", 42, true]

["Baz", "Hello"]

It's simply an array where the first element is the name of the constructor (the tag) and the rest are the arguments.
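For illustration, a minimal Python sketch of this array encoding. The helper names `encode_array` and `decode_array` are hypothetical and not part of any Morphir tooling; a value is modelled simply as a constructor name plus a list of arguments.

```python
import json

# Hypothetical helpers: a value is modelled as (constructor_name, args).
def encode_array(tag, args):
    """Encode a constructor as [tag, arg1, arg2, ...]."""
    return [tag, *args]

def decode_array(data):
    """Split [tag, arg1, ...] back into (tag, args)."""
    tag, *args = data
    return tag, args

sample1 = encode_array("Bar", [42, True])
sample2 = encode_array("Baz", ["Hello"])

print(json.dumps(sample1))  # ["Bar", 42, true]
print(json.dumps(sample2))  # ["Baz", "Hello"]
```

Decoding only needs to read the first element to know which constructor the remaining elements belong to.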

Proposed Approach

The mapping above aligns nicely with most FP languages but a more frequently used representation of the same construct is through inheritance (I'll use Scala syntax for verbosity):

sealed trait Foo
case class Bar(meaningOfLife: Int, isThatRight: Boolean) extends Foo
case class Baz(sayHi: String) extends Foo

Which is usually mapped to JSON as:

{ "kind": "Bar", "meaningOfLife": 42, "isThatRight": true }

{ "kind": "Baz", "sayHi": "Hello" }

This is much closer to what most people in mainstream languages would choose when mapping an "or" relationship to JSON.
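A rough Python sketch of the discriminator-field encoding, assuming "kind" as the field name as in the samples above. `encode_object` and `decode_object` are hypothetical names used only for illustration.

```python
import json

DISCRIMINATOR = "kind"  # assumed field name; it is not standardized

def encode_object(tag, fields):
    """Encode a constructor as a flat object with a discriminator field."""
    if DISCRIMINATOR in fields:
        raise ValueError(f"field name clashes with discriminator: {DISCRIMINATOR}")
    return {DISCRIMINATOR: tag, **fields}

def decode_object(data):
    """Separate the discriminator from the constructor's own fields."""
    fields = dict(data)
    tag = fields.pop(DISCRIMINATOR)
    return tag, fields

bar = encode_object("Bar", {"meaningOfLife": 42, "isThatRight": True})
print(json.dumps(bar))
# {"kind": "Bar", "meaningOfLife": 42, "isThatRight": true}
```

Because the discriminator shares a namespace with the constructor's fields, the sketch rejects a field that is itself called "kind".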

Comparison

I'll attempt to collect pros and cons but please feel free to extend in the comments as I'm sure I'll miss some. Let's just list out pros and cons for the proposed solution to make things simpler. Just flip them to get the pros/cons for the current solution.

Pros

- More readable for a human
- Has native support in many existing JSON tools
  - The caveat here is that you still need to pick a name for the "kind" field since that's not standard

Cons

- JSON size is much larger due to field names
- Read performance is impacted by the need to look up the "kind" field before you can process the rest of the object. This can be an issue because fields can appear in any order in an object.
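To make the size con concrete, here is a small Python sketch that measures the compact serialization of sample1 in both formats. It covers only this one sample, so treat the result as an indication rather than a benchmark; `compact_size` is a hypothetical helper.

```python
import json

array_form = ["Bar", 42, True]
object_form = {"kind": "Bar", "meaningOfLife": 42, "isThatRight": True}

def compact_size(value):
    """Byte length of the JSON text without optional whitespace."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))

array_size = compact_size(array_form)
object_size = compact_size(object_form)
print(array_size, object_size)
```

The gap grows with the length of the field names, since each name is repeated in every serialized value.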

What do you think?

stephengoldbaum · 2023-01-30T21:03:27Z

stephengoldbaum
Jan 30, 2023
Maintainer

In order to properly weigh the pros and cons, we need the proposed alternate format. That will dictate what other tools would be able to work with it, which would give us a better perspective on the benefits.

1 reply

AttilaMihaly Jan 31, 2023
Maintainer Author

I included sample snippets for the proposed format. Did you mean something more specific?

{ "kind": "Bar", "meaningOfLife": 42, "isThatRight": true }

{ "kind": "Baz", "sayHi": "Hello" }

DamianReeves · 2023-01-31T17:15:26Z

DamianReeves
Jan 31, 2023
Maintainer

I've been thinking through alternatives and the cons listed here. One possible alternative to both solutions above is the following:

For the same structure as above:

sealed trait Foo
case class Bar(meaningOfLife: Int, isThatRight: Boolean) extends Foo
case class Baz(sayHi: String) extends Foo

Encode it as the following JSON:

["Bar", { "meaningOfLife": 42, "isThatRight": true }]

["Baz", { "sayHi": "Hello" }]

A con of this approach is that it leaves ambiguity when encoding certain classes of custom types in Elm, such as a constructor whose single argument is a record (Biz below):

type Foo
    = Bar Int Bool
    | Baz String
    | Biz { label: String, count: Int }

sample1 = Bar 42 True

sample2 = Baz "Hello"

sample3 = Biz {label = "widget", count = 99 }
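One way to read the ambiguity, sketched in Python: if sample3 is encoded as a tag followed by an object, a decoder cannot tell from the JSON alone whether the object holds the constructor's named arguments or is itself a single record argument. Both reader functions below are hypothetical and exist only to show the two interpretations.

```python
# The same payload under two hypothetical readings.
payload = ["Biz", {"label": "widget", "count": 99}]

def read_as_named_args(data):
    """Treat the object as the constructor's named arguments."""
    tag, obj = data
    return tag, list(obj.values())

def read_as_single_record(data):
    """Treat the object as one record-typed argument."""
    tag, obj = data
    return tag, [obj]

print(read_as_named_args(payload))     # ('Biz', ['widget', 99])
print(read_as_single_record(payload))  # ('Biz', [{'label': 'widget', 'count': 99}])
```

The first reading yields two arguments and the second yields one, so the decoder needs the type definition or an extra convention to choose.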

3 replies

stephengoldbaum Jan 31, 2023
Maintainer

Interesting. We'll need to pick a concrete specification to adequately weigh the trade-offs.

AttilaMihaly Nov 23, 2023
Maintainer Author

I don't think we should care too much about how it maps to Elm. In Morphir every constructor argument has a name, so we could simply say that `Bar 42 True` in Elm maps to `["Bar", { "arg1": 42, "arg2": true }]`, which removes the ambiguity.
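A minimal Python sketch of that naming rule, assuming the "arg" prefix from the example above. `encode_named` is a hypothetical helper, and the exact names Morphir would assign are not specified in this thread.

```python
import json

def encode_named(tag, args, prefix="arg"):
    """Encode positional args as [tag, {"arg1": ..., "arg2": ...}]."""
    named = {f"{prefix}{i}": arg for i, arg in enumerate(args, start=1)}
    return [tag, named]

bar = encode_named("Bar", [42, True])
biz = encode_named("Biz", [{"label": "widget", "count": 99}])

print(json.dumps(bar))  # ["Bar", {"arg1": 42, "arg2": true}]
print(json.dumps(biz))  # ["Biz", {"arg1": {"label": "widget", "count": 99}}]
```

With every argument named, a record argument is always nested one level down, so it can no longer be confused with the argument object itself.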

DamianReeves Nov 28, 2023
Maintainer

I missed this comment in my discussion below. If from the Morphir perspective we always have a name for the argument, then I think that "the better approach" is worth ratifying.

I will say that we need to document, specify, and/or recommend a mechanism for naming in scenarios where no name is provided, as is the case for Elm (and this is possible in F# as well, where union case field labels are optional).

DamianReeves · 2023-01-31T19:09:01Z

DamianReeves
Jan 31, 2023
Maintainer

I am also exploring whether I can come up with a way to represent the Scala AST in a data structure/model that aligns more closely with the Elm representation. This is possible in F# for example but not by default in Scala.

If there is success there, then I may push less on this initiative.

0 replies

AttilaMihaly · 2023-11-23T09:57:37Z

AttilaMihaly
Nov 23, 2023
Maintainer Author

I think it would be important to make a decision here that we can stick to for the foreseeable future. The fundamental difficulty is that there is no standard solution for encoding tagged unions/inheritance in JSON.

The de facto standard

The most frequently used approach is to map every constructor as a JSON object and use a discriminator field to identify which constructor it is. I laid this out in the original proposal:

{ "$type": "Bar", "meaningOfLife": 42, "isThatRight": true }

{ "$type": "Baz", "sayHi": "Hello" }

One issue with this is that you need to pick a discriminator field name. Sometimes, especially in OOP languages, constructors are represented as subclasses so the JSON serialization tools tend to insert a $type field as the discriminator. In other cases, the discriminator is domain specific (such as 'status') or borrowed from type-theory (such as kind, tag, $tag).

So picking one standard field name that will work everywhere out-of-the-box is difficult. At the same time, most serialization libraries allow you to customize the field name, so if we pick a name that doesn't align with a library's default, it can usually be configured.

Another issue with this encoding is performance and modularity of serialization code. Since the discriminator field is part of the same object that contains the specific fields for each constructor, and fields don't have a guaranteed ordering, it's possible that you need to backtrack after finding the discriminator field to read the specific constructor fields.
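The ordering problem can be sketched in Python by walking an object's fields in document order via the standard `object_pairs_hook` parameter of `json.loads`. The function `decode_pairs` is hypothetical; it counts how many fields arrive before the discriminator and therefore have to be held back until the constructor is known.

```python
import json

def decode_pairs(pairs, discriminator="$type"):
    """Walk fields in document order, counting those seen before the discriminator."""
    tag = None
    buffered = 0
    fields = {}
    for key, value in pairs:
        if key == discriminator:
            tag = value
        else:
            if tag is None:
                buffered += 1  # constructor not known yet when this field arrives
            fields[key] = value
    if tag is None:
        raise ValueError("missing discriminator")
    return tag, fields, buffered

first = json.loads(
    '{"$type": "Bar", "meaningOfLife": 42, "isThatRight": true}',
    object_pairs_hook=decode_pairs,
)
last = json.loads(
    '{"meaningOfLife": 42, "isThatRight": true, "$type": "Bar"}',
    object_pairs_hook=decode_pairs,
)

print(first[2], last[2])  # 0 2
```

Both inputs are the same value as far as JSON is concerned, yet the second one forces the decoder to hold two fields before it can tell which constructor they belong to.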

A better approach

Some serialization libraries (ZIO JSON for example) decided to go with a serialization format that addresses both of the issues mentioned above:

{ "Bar": { "meaningOfLife": 42, "isThatRight": true } }

{ "Baz": { "sayHi": "Hello" } }

There is no performance penalty here since there's always only one field at the top which is always the discriminator field. Also, we don't need to think about field names because the constructor name is the field name.
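A minimal Python sketch of this single-key encoding. The names `encode_wrapped` and `decode_wrapped` are hypothetical and this is not ZIO JSON's API, only an illustration of the shape of the data.

```python
import json

def encode_wrapped(tag, fields):
    """Encode a constructor as {"Tag": {...fields...}}."""
    return {tag: fields}

def decode_wrapped(data):
    """Read the single top-level key as the constructor name."""
    if len(data) != 1:
        raise ValueError("expected exactly one constructor key")
    [(tag, fields)] = data.items()
    return tag, fields

baz = encode_wrapped("Baz", {"sayHi": "Hello"})
print(json.dumps(baz))  # {"Baz": {"sayHi": "Hello"}}
print(decode_wrapped(json.loads('{"Bar": {"meaningOfLife": 42, "isThatRight": true}}')))
# ('Bar', {'meaningOfLife': 42, 'isThatRight': True})
```

The constructor name is known before any of its fields are read, and a field named like a discriminator cannot clash because the two live at different levels.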

From a purely technical perspective this is a clearly better approach, but most serialization libraries won't support it out-of-the-box.

This is the fundamental dilemma. At least in my head. Any thoughts?

4 replies

AttilaMihaly Nov 23, 2023
Maintainer Author

We could also decide to not make a decision on this and support both. The only thing we would need to do is to include the discriminator field name in the type definition to make the standard approach work. There's no easy way to do this in Elm but we could do it as a decoration and go with a default if the decoration is not present.
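A sketch of what supporting both could look like, in Python. Everything here is an assumption made for illustration: the `style` parameter, the shape of the `decoration` dictionary, and the "$type" default are hypothetical and were not decided in this thread.

```python
import json

DEFAULT_DISCRIMINATOR = "$type"  # hypothetical default, not decided in this thread

def encode(tag, fields, style, decoration=None):
    """Encode with either style.

    style: "wrapped" for {"Bar": {...}}, "flat" for {"<discriminator>": "Bar", ...}.
    decoration: hypothetical per-type metadata that may carry a "discriminator" name.
    """
    if style == "wrapped":
        return {tag: fields}
    if style == "flat":
        name = (decoration or {}).get("discriminator", DEFAULT_DISCRIMINATOR)
        return {name: tag, **fields}
    raise ValueError(f"unknown style: {style}")

fields = {"sayHi": "Hello"}
print(json.dumps(encode("Baz", fields, "wrapped")))
# {"Baz": {"sayHi": "Hello"}}
print(json.dumps(encode("Baz", fields, "flat")))
# {"$type": "Baz", "sayHi": "Hello"}
print(json.dumps(encode("Baz", fields, "flat", {"discriminator": "kind"})))
# {"kind": "Baz", "sayHi": "Hello"}
```

Only the flat style needs the decoration, which matches the point that the wrapped style requires no field name to be chosen.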

AttilaMihaly Nov 25, 2023
Maintainer Author

@DamianReeves, @stephengoldbaum, what do you think?

DamianReeves Nov 28, 2023
Maintainer

I like the better approach as well.

One possible thing we could consider is to add more flexibility at the expense of added implementation overhead.

If we allowed the encoder/decoder to support both of the following, but chose one as "canonical" or "canonized", we might get the best user experience:

Object encoding:

{ "Bar": { "meaningOfLife": 42, "isThatRight": true } }

{ "Baz": { "sayHi": "Hello" } }

Positional/Array encoding:

{ "Bar": [42, true] }

{ "Baz": ["Hello"] }

In general JSON libraries tend to be able to make decisions on the type of JSON element in an efficient manner, so we could in the case of a language like Elm allow for defaulting to the positional syntax.
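A rough Python sketch of a decoder that accepts both forms by checking the type of the JSON element under the constructor key. `decode_constructor` is a hypothetical name, and the "named"/"positional" labels are only there to show which branch was taken.

```python
import json

def decode_constructor(data):
    """Accept {"Tag": {...}} or {"Tag": [...]} and report which form was used."""
    if len(data) != 1:
        raise ValueError("expected exactly one constructor key")
    [(tag, payload)] = data.items()
    if isinstance(payload, dict):
        return tag, "named", payload
    if isinstance(payload, list):
        return tag, "positional", payload
    raise TypeError("constructor payload must be an object or an array")

print(decode_constructor(json.loads('{"Bar": {"meaningOfLife": 42, "isThatRight": true}}')))
# ('Bar', 'named', {'meaningOfLife': 42, 'isThatRight': True})
print(decode_constructor(json.loads('{"Bar": [42, true]}')))
# ('Bar', 'positional', [42, True])
```

The check is a single type test on an already-parsed element, so supporting both forms adds little cost on the read side.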

We could then have a standard conversion that says something like the above can be converted to:

{ "Bar": { "_1": 42, "_2": true } }

{ "Baz": { "_1": "Hello" } }

Or perhaps:

{ "Bar": { "$1": 42, "$2": true } }

{ "Baz": { "$1": "Hello" } }

The exact conversion is not as important as having the conversion clearly defined, much like how JSON-LD clearly defines the algorithms for expansion, compaction, and flattening.
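As a sketch of what such a defined conversion might look like, the Python function below rewrites the positional form into the object form with generated names. `to_named` is hypothetical, and the prefix is a parameter precisely because the thread leaves the choice between "_" and "$" open.

```python
def to_named(data, prefix="_"):
    """Convert {"Tag": [a, b]} into {"Tag": {"_1": a, "_2": b}}; leave objects as-is."""
    if len(data) != 1:
        raise ValueError("expected exactly one constructor key")
    [(tag, payload)] = data.items()
    if isinstance(payload, list):
        payload = {f"{prefix}{i}": value for i, value in enumerate(payload, start=1)}
    return {tag: payload}

print(to_named({"Bar": [42, True]}))           # {'Bar': {'_1': 42, '_2': True}}
print(to_named({"Baz": ["Hello"]}, prefix="$"))  # {'Baz': {'$1': 'Hello'}}
```

Input that is already in object form passes through unchanged, so applying the conversion twice gives the same result as applying it once.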

That being said, I am also fine with always using objects, as long as we are explicit about how we derive names for a language like Elm, and for languages like F# or OCaml where naming the fields is optional.

DamianReeves Nov 28, 2023
Maintainer

Please note: if this added discussion is deemed to be just extra noise, or not worth it, I am fine with the decision to use the "better approach" encoding. It puts us in a much better place than we've been in, IMHO.

stephengoldbaum · 2023-11-25T22:23:21Z

stephengoldbaum
Nov 25, 2023
Maintainer

I like the approach. It gets my vote.

1 reply

AttilaMihaly Nov 27, 2023
Maintainer Author

Which one? The one I titled "A better approach"?

stephengoldbaum · 2023-11-27T21:54:18Z

stephengoldbaum
Nov 27, 2023
Maintainer

Yes, the better one :)

0 replies

DamianReeves · 2023-11-28T11:46:31Z

DamianReeves
Nov 28, 2023
Maintainer

I am also in favor of "A better approach".

0 replies
