Root "data" or "datasets" #508

Open
manzt opened this issue Sep 14, 2021 · 7 comments
Labels
D? Difficulty not sure enhancement New feature or request P? Priority needs to be decided

Comments

@manzt
Member

manzt commented Sep 14, 2021

Beyond the reactive use-case, it would be interesting to allow for data to be referenced by ID from some root source.

{
    "data": {
        "my-id": { "url": ..., ... }
    },
    "tracks": [
        { "data": "my-id", ... },
        { "data": "my-id", ... }
    ]
}

This would reduce a lot of code duplication, and in gos we could "lift" data definitions to the root of the chart. We could then encode a dataframe as json and embed the data in the chart once.

gos.Track(df.gos.json())
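
To make the proposal concrete, here is a minimal sketch of how ID references could be resolved against a root "data" map before rendering. The types and the `resolveDataRefs` name are hypothetical and heavily simplified; they are not part of the Gosling schema or its compiler.

```typescript
// Hypothetical, simplified types; NOT the actual Gosling schema.
type DataDef = { type: string; url?: string; values?: unknown[] };
type TrackSpec = { data: string | DataDef; [key: string]: unknown };
type RootSpec = { data?: Record<string, DataDef>; tracks: TrackSpec[] };
type ResolvedTrack = { data: DataDef; [key: string]: unknown };

// Replace every string ID in `track.data` with the definition stored at the root.
function resolveDataRefs(spec: RootSpec): ResolvedTrack[] {
  const datasets: Record<string, DataDef> = spec.data ?? {};
  return spec.tracks.map((track) => {
    // Inline definitions pass through unchanged.
    if (typeof track.data !== "string") return { ...track, data: track.data };
    // Only accept IDs that were actually declared at the root.
    if (!Object.prototype.hasOwnProperty.call(datasets, track.data)) {
      throw new Error(`Unknown data id: "${track.data}"`);
    }
    return { ...track, data: datasets[track.data] };
  });
}

const spec: RootSpec = {
  data: { "my-id": { type: "csv", url: "https://example.com/data.csv" } },
  tracks: [
    { data: "my-id", mark: "bar" },
    { data: "my-id", mark: "line" },
  ],
};
const resolved = resolveDataRefs(spec);
// Both tracks point at the same definition object.
console.log(resolved[0].data === resolved[1].data); // true
```

Because resolution copies a reference rather than the definition, the data is stated once in the spec and shared by every track that names it.
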
@sehilyi
Member

sehilyi commented Sep 17, 2021

I see the need for supporting data reference, but I am also a bit concerned about using IDs since keeping track of multiple IDs in the grammar can make writing/maintaining specs more complex.

If we want to support this only for overlaid tracks, which I think would be the major use case, would it address the issue if one just defines the data in the parent and overrides it in child tracks where needed:

{
    "alignment": "overlay",
    "data": {
        "type": "json", "values": { ... }
    },
    "tracks": [
        { ... }, { ... } // use the data defined by the parent
    ]
}

This will not allow defining multiple datasets in the parent, but since I expect a single dataset to be used in overlaid tracks in most cases, it might be okay to support only a single dataset?
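
The parent/child alternative can be sketched in a few lines: each child keeps its own "data" if it defines one and otherwise falls back to the parent's. The types and the `inheritParentData` helper are illustrative assumptions, not Gosling's implementation.

```typescript
// Hypothetical, simplified types; NOT the actual Gosling schema.
type DataDef = { type: string; url?: string; values?: unknown[] };
type ChildTrack = { data?: DataDef; [key: string]: unknown };
type OverlayParent = { alignment: "overlay"; data?: DataDef; tracks: ChildTrack[] };

// A child's own `data` wins; otherwise the parent's `data` is used.
function inheritParentData(parent: OverlayParent): ChildTrack[] {
  return parent.tracks.map((child) => ({
    ...child,
    data: child.data ?? parent.data,
  }));
}

const shared: DataDef = { type: "json", values: [{ x: 1 }, { x: 2 }] };
const override: DataDef = { type: "csv", url: "https://example.com/other.csv" };

const tracks = inheritParentData({
  alignment: "overlay",
  data: shared,
  tracks: [{ mark: "bar" }, { mark: "rule", data: override }],
});
console.log(tracks[0].data === shared); // true (inherited from the parent)
console.log(tracks[1].data === override); // true (overridden by the child)
```

As noted above, this covers one dataset per parent; a second dataset has to be restated in every child that overrides.
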

@manzt
Member Author

manzt commented May 19, 2022

> I see the need for supporting data reference, but I am also a bit concerned about using IDs since keeping track of multiple IDs in the grammar can make writing/maintaining specs more complex.

I think there is an argument to be made that root "datasets" allow for better re-use of Gosling specifications and easier maintenance. Users can replace a data definition in one place rather than needing to find and replace the same data definition throughout the specification (like using a variable in a programming language).

FWIW, Vega-Lite implements a top-level datasets property. From the docs:

Vega-Lite supports a top-level datasets property. This can be useful when the same data should be inlined in different places in the spec. Instead of setting values inline, specify datasets at the top level and then refer to the named datasource in the rest of the spec. datasets is a mapping from name to an inline dataset.

    "datasets": {
      "somedata": [1,2,3]
    },
    "data": {
      "name": "somedata"
    }

This would reduce the size of the specifications and provide the ability to (optionally) identify datasets by a unique key. We could use this identifier to build an API to update track data on demand (like the Vega View API).
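
One way an on-demand update API keyed by dataset name might look is sketched below. `DatasetRegistry` and its methods are hypothetical names invented for illustration; this is neither the Vega View API nor an existing Gosling API.

```typescript
// Hypothetical sketch; these names do not exist in Gosling or Vega.
type Row = Record<string, unknown>;
type Listener = (rows: Row[]) => void;

class DatasetRegistry {
  private datasets = new Map<string, Row[]>();
  private listeners = new Map<string, Set<Listener>>();

  constructor(initial: Record<string, Row[]> = {}) {
    for (const name of Object.keys(initial)) this.datasets.set(name, initial[name]);
  }

  // Look up a named dataset; unknown names are an error.
  get(name: string): Row[] {
    const rows = this.datasets.get(name);
    if (rows === undefined) throw new Error(`Unknown dataset: "${name}"`);
    return rows;
  }

  // Register a callback (e.g. a track re-render) and return an unsubscribe function.
  subscribe(name: string, listener: Listener): () => void {
    const set = this.listeners.get(name) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(name, set);
    return () => {
      set.delete(listener);
    };
  }

  // Replace a dataset's rows and notify everything that references it.
  update(name: string, rows: Row[]): void {
    this.get(name); // throws if the name was never declared
    this.datasets.set(name, rows);
    this.listeners.get(name)?.forEach((listener) => listener(rows));
  }
}

const registry = new DatasetRegistry({ somedata: [{ x: 1 }] });
const seen: number[] = [];
const unsubscribe = registry.subscribe("somedata", (rows) => seen.push(rows.length));
registry.update("somedata", [{ x: 1 }, { x: 2 }]);
unsubscribe();
registry.update("somedata", []);
console.log(seen); // [ 2 ]
```

The unique key is what makes this possible: every track that references "somedata" can be refreshed from a single `update` call.
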

@sehilyi
Member

sehilyi commented May 23, 2022

> I think there is an argument to be made that root "datasets" allow for better re-use of Gosling specifications and easier maintenance. Users can replace a data definition in one place rather than needing to find and replace the same data definition throughout the specification (like using a variable in a programming language).

Agreed.

Do you have a specific use case in mind for re-using json data? I can think of a use case that renders 1D or 2D annotations (i.e., rule marks using JSON data) in the same way across multiple tracks. Perhaps, there are more useful/frequent use cases than this.

I also think extending this reusability functionality to other data specs (reusing data def.) or even beyond the "data" specs (reusing track/view def.) would be an interesting and useful function. (For example, #88)

@sehilyi sehilyi added enhancement New feature or request P? Priority needs to be decided D? Difficulty not sure labels May 23, 2022
@ThHarbig
Collaborator

ThHarbig commented May 5, 2023

I ran into the same issue of having to include data sets multiple times.

For example, when having multiple stacked tracks of which two use data set A and two use data set B, I need to add the data to each track separately or define it in the view and overwrite it for one set of tracks.

Another way of reducing the number of data redefinitions could be grouping tracks or, more generally, being more flexible with nesting tracks (related to #884).

e.g.

"tracks": [
  {
    "data": { },
    "tracks": [ ]
  },
  {
    "data": { },
    "tracks": [ ]
  }
]

@manzt
Member Author

manzt commented May 6, 2023

> Another way of reducing the number of data redefinitions could be grouping tracks or in general being more flexible with nesting tracks (related to #884)

I'm open to this idea. One challenge with Gosling is that it adopted a very "flattened" spec (rather than nested fields), which in some cases removes boilerplate but in others makes things difficult. We've discussed adding an encoding field to group encodings in the past as well: gosling-lang/gos#34 (comment)

@sehilyi
Member

sehilyi commented May 7, 2023

I am open to making the track nesting more flexible, and it will be beneficial to define shared encodings across tracks as well (#88).

Current Grammar

Track nesting can happen only when overlaid tracks are stacked.

"tracks": [
   { "alignment": "overlay", "tracks": [/* multiple track defs to be overlaid */] },
   { "alignment": "overlay", "tracks": [/* multiple track defs to be overlaid */] },
   { /* track def */ },
]

However, the equivalent example with nested stacked tracks is not allowed. I think this restriction makes the spec less consistent and more complicated.

"tracks": [
   { "alignment": "stack", "tracks": [/* multiple track defs to be stacked */] },
   { "alignment": "stack", "tracks": [/* multiple track defs to be stacked */] },
   { /* track def */ },
]

This is due to the following schema, which we will need to update to provide more freedom:

export interface StackedTracks extends CommonViewDef, Partial<SingleTrack> {
    alignment?: 'stack';
    tracks: (PartialTrack | OverlaidTracks)[];
}
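
As a rough illustration of what a relaxed schema might allow, the sketch below uses simplified stand-in types in which a group may contain either tracks or further groups, and flattens the nesting so each track ends up with the nearest "data" defined above it. `TrackGroup` and `flattenTracks` are hypothetical, they do not mirror Gosling's `CommonViewDef`, `SingleTrack`, or `OverlaidTracks`, and the flattening ignores layout (stack vs. overlay) to focus on data inheritance only.

```typescript
// Simplified stand-ins; NOT Gosling's real type definitions.
type DataDef = { type: string; url?: string };
type Track = { data?: DataDef; mark?: string };
type TrackGroup = {
  alignment?: "stack" | "overlay";
  data?: DataDef;
  tracks: (Track | TrackGroup)[]; // nesting allowed for both alignments
};

function isGroup(node: Track | TrackGroup): node is TrackGroup {
  return Array.isArray((node as TrackGroup).tracks);
}

// Depth-first flattening: a track uses its own data, else the closest ancestor's.
function flattenTracks(node: Track | TrackGroup, inherited?: DataDef): Track[] {
  if (!isGroup(node)) return [{ ...node, data: node.data ?? inherited }];
  const data = node.data ?? inherited;
  return node.tracks.reduce<Track[]>(
    (acc, child) => acc.concat(flattenTracks(child, data)),
    [],
  );
}

const dataA: DataDef = { type: "csv", url: "https://example.com/a.csv" };
const dataB: DataDef = { type: "csv", url: "https://example.com/b.csv" };

// Two stacked groups, each defining its data once for two tracks.
const flat = flattenTracks({
  alignment: "stack",
  tracks: [
    { alignment: "stack", data: dataA, tracks: [{ mark: "bar" }, { mark: "line" }] },
    { alignment: "stack", data: dataB, tracks: [{ mark: "point" }, { mark: "rect" }] },
  ],
});
console.log(flat.length); // 4
console.log(flat[1].data === dataA, flat[2].data === dataB); // true true
```
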

Open Issue

  • In theory, we could provide much more flexibility by allowing tracks to be defined at more than two levels, but I think two levels (like the example above) already provide sufficient flexibility for defining shared encodings/data.
  • Ideally, tracks that share a data spec would also share a single data object to save memory, which we do not currently support.
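
The second point could be approached by interning structurally identical data specs so that equal definitions resolve to one shared object. The following is only a sketch under that assumption: `stableKey` and `internData` are hypothetical helpers, and comparing specs through a canonical string is a choice made here for illustration, not something Gosling does.

```typescript
// Hypothetical sketch; not how Gosling currently handles data specs.
type DataDef = { type: string; url?: string; values?: unknown[] };

// Canonical string for a JSON-like value, independent of object key order.
function stableKey(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableKey).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableKey(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

// Make tracks with structurally equal `data` point at one shared object.
function internData<T extends { data?: DataDef }>(tracks: T[]): T[] {
  const pool = new Map<string, DataDef>();
  return tracks.map((track) => {
    if (track.data === undefined) return track;
    const key = stableKey(track.data);
    const shared = pool.get(key) ?? track.data;
    pool.set(key, shared);
    return { ...track, data: shared };
  });
}

const interned = internData([
  { mark: "bar", data: { type: "csv", url: "https://example.com/a.csv" } },
  { mark: "line", data: { url: "https://example.com/a.csv", type: "csv" } },
]);
console.log(interned[0].data === interned[1].data); // true
```

Building the key walks the whole spec, so for large inline "values" arrays an explicit dataset ID would be cheaper than structural comparison.
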

@ThHarbig
Collaborator

ThHarbig commented May 8, 2023

I agree that going beyond two levels is probably not necessary. If we encounter more complicated setups, the suggested data ID approach might be more intuitive.


3 participants