Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some errors in layered or faceted specs raise the wrong error message #2915

Closed
joelostblom opened this issue Feb 22, 2023 · 10 comments · Fixed by #3009
Closed

Some errors in layered or faceted specs raise the wrong error message #2915

joelostblom opened this issue Feb 22, 2023 · 10 comments · Fixed by #3009
Assignees
Labels

Comments

@joelostblom
Copy link
Contributor

A spec that is not layered will return to correct error in this case:

alt.Chart().mark_point().encode(tooltip=[{'wrong'}])
SchemaValidationError: '[{'field': {'wrong'}}]' is an invalid value for `tooltip`:

[{'field': {'wrong'}}] is not of type 'object'
[{'field': {'wrong'}}] is not of type 'object'
{'wrong'} is not valid under any of the given schemas
[{'field': {'wrong'}}] is not of type 'null'

However when the same error occurs within a layered spec, then the return error points in the wrong direction:

alt.layer(
    alt.Chart().mark_point().encode(tooltip=[{'wrong'}])
)
SchemaValidationError: `LayerChart` has no parameter named 'mark'

Existing parameter names are:
layer        data          height    projection   usermeta   
autosize     datasets      name      resolve      view       
background   description   padding   title        width      
config       encoding      params    transform               

See the help for `LayerChart` to read the full description of these parameters

Same with a facet (but concatenation works fine)the:

alt.Chart(pd.DataFrame({'a': [1]})).mark_point().encode(tooltip=[{'wrong'}]).facet()
SchemaValidationError: `Spec` has no parameter named 'mark'

Existing parameter names are:
args   

See the help for `Spec` to read the full description of these parameters

I don't have time to investigate right now but this might be due to the specific error handling of parameters rather than the new we have retreating errors without deep validation. In other words, we might need a more specific check for when to include the suggested parameters in #2565.

@mattijn
Copy link
Contributor

mattijn commented Feb 23, 2023

The following spec:

import altair as alt
from vega_datasets import data

def make_example(selector):
    cars = data.cars.url

    return alt.Chart(cars).mark_rect().encode(
        x="Cylinders:O",
        y="Origin:N",
        color=alt.condition(selector, 'count()', alt.value('lightgray'))
    ).properties(
        width=300,
        height=180
    ).add_params(
        selector
    )
    
point_mouseover = alt.selection_interval(on='mouseover', toggle=False, empty=False, mark={'cursor':'pointer'})
make_example(point_mouseover)

Gives me the following error message:

File ~/mattijn/altair/altair/vegalite/v5/api.py:399, in selection(type, **kwds)
    396         param_kwds[kwd] = kwds.pop(kwd)
    398 if type == "interval":
--> 399     select = core.IntervalSelectionConfig(type=type, **kwds)
    400 elif type == "point":
    401     select = core.PointSelectionConfig(type=type, **kwds)
...
type        mark      translate   
clear       on        zoom        
encodings   resolve               

See the help for `IntervalSelectionConfig` to read the full description of these parameters

It seems I'm missing the first part.

@joelostblom
Copy link
Contributor Author

Hmm is that a VS code truncation @mattijn ? I see the full message in JupyterLab:

image

@mattijn
Copy link
Contributor

mattijn commented Feb 23, 2023

Hm, seems so indeed. Full error message in VSCode is:

Output exceeds the [size limit](command:workbench.action.openSettings?[). Open the full output data [in a text editor](command:workbench.action.openLargeOutput?48b105de-e52e-4ed1-9014-a01b04a44b6a)
---------------------------------------------------------------------------
SchemaValidationError                     Traceback (most recent call last)
/Users/mattijnvanhoek/binste/altair/Untitled.ipynb Cell 1 in <cell line: 18>()
      [5](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=4)     cars = data.cars.url
      [7](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=6)     return alt.Chart(cars).mark_rect().encode(
      [8](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=7)         x="Cylinders:O",
      [9](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=8)         y="Origin:N",
   (...)
     [15](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=14)         selector
     [16](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=15)     )
---> [18](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=17) point_mouseover = alt.selection_interval(on='mouseover', toggle=False, empty=False, mark={'cursor':'pointer'})
     [19](vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=18) make_example(point_mouseover)

File ~/binste/altair/altair/vegalite/v5/api.py:419, in selection_interval(**kwargs)
    416 @utils.use_signature(core.IntervalSelectionConfig)
    417 def selection_interval(**kwargs):
    418     """Create a selection parameter with type='interval'"""
--> 419     return selection(type="interval", **kwargs)

File ~/binste/altair/altair/vegalite/v5/api.py:400, in selection(type, **kwds)
    397         param_kwds[kwd] = kwds.pop(kwd)
    399 if type == "interval":
--> 400     select = core.IntervalSelectionConfig(type=type, **kwds)
    401 elif type == "point":
    402     select = core.PointSelectionConfig(type=type, **kwds)
...
type        mark      translate   
clear       on        zoom        
encodings   resolve               

See the help for `IntervalSelectionConfig` to read the full description of these parameters

And when I click the full error message it shows me:

{
	"name": "SchemaValidationError",
	"message": "`IntervalSelectionConfig` has no parameter named 'toggle'\n\nExisting parameter names are:\ntype        mark      translate   \nclear       on        zoom        \nencodings   resolve               \n\nSee the help for `IntervalSelectionConfig` to read the full description of these parameters",
	"stack": "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mSchemaValidationError\u001b[0m                     Traceback (most recent call last)\n\u001b[1;32m/Users/mattijnvanhoek/binste/altair/Untitled.ipynb Cell 1\u001b[0m in \u001b[0;36m<cell line: 18>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=4'>5</a>\u001b[0m     cars \u001b[39m=\u001b[39m data\u001b[39m.\u001b[39mcars\u001b[39m.\u001b[39murl\n\u001b[1;32m      <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=6'>7</a>\u001b[0m     \u001b[39mreturn\u001b[39;00m alt\u001b[39m.\u001b[39mChart(cars)\u001b[39m.\u001b[39mmark_rect()\u001b[39m.\u001b[39mencode(\n\u001b[1;32m      <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=7'>8</a>\u001b[0m         x\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mCylinders:O\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m      <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=8'>9</a>\u001b[0m         y\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mOrigin:N\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=14'>15</a>\u001b[0m         selector\n\u001b[1;32m     <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=15'>16</a>\u001b[0m     )\n\u001b[0;32m---> <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=17'>18</a>\u001b[0m point_mouseover \u001b[39m=\u001b[39m alt\u001b[39m.\u001b[39;49mselection_interval(on\u001b[39m=\u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39mmouseover\u001b[39;49m\u001b[39m'\u001b[39;49m, toggle\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m, empty\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m, mark\u001b[39m=\u001b[39;49m{\u001b[39m'\u001b[39;49m\u001b[39mcursor\u001b[39;49m\u001b[39m'\u001b[39;49m:\u001b[39m'\u001b[39;49m\u001b[39mpointer\u001b[39;49m\u001b[39m'\u001b[39;49m})\n\u001b[1;32m     <a href='vscode-notebook-cell:/Users/mattijnvanhoek/binste/altair/Untitled.ipynb#W0sZmlsZQ%3D%3D?line=18'>19</a>\u001b[0m make_example(point_mouseover)\n\nFile \u001b[0;32m~/binste/altair/altair/vegalite/v5/api.py:419\u001b[0m, in \u001b[0;36mselection_interval\u001b[0;34m(**kwargs)\u001b[0m\n\u001b[1;32m    416\u001b[0m \u001b[39m@utils\u001b[39m\u001b[39m.\u001b[39muse_signature(core\u001b[39m.\u001b[39mIntervalSelectionConfig)\n\u001b[1;32m    417\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mselection_interval\u001b[39m(\u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs):\n\u001b[1;32m    418\u001b[0m     \u001b[39m\"\"\"Create a selection parameter with type='interval'\"\"\"\u001b[39;00m\n\u001b[0;32m--> 419\u001b[0m     \u001b[39mreturn\u001b[39;00m selection(\u001b[39mtype\u001b[39;49m\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39minterval\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\nFile \u001b[0;32m~/binste/altair/altair/vegalite/v5/api.py:400\u001b[0m, in \u001b[0;36mselection\u001b[0;34m(type, **kwds)\u001b[0m\n\u001b[1;32m    397\u001b[0m         param_kwds[kwd] \u001b[39m=\u001b[39m kwds\u001b[39m.\u001b[39mpop(kwd)\n\u001b[1;32m    399\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mtype\u001b[39m \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39minterval\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[0;32m--> 400\u001b[0m     select \u001b[39m=\u001b[39m core\u001b[39m.\u001b[39;49mIntervalSelectionConfig(\u001b[39mtype\u001b[39;49m\u001b[39m=\u001b[39;49m\u001b[39mtype\u001b[39;49m, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwds)\n\u001b[1;32m    401\u001b[0m \u001b[39melif\u001b[39;00m \u001b[39mtype\u001b[39m \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mpoint\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m    402\u001b[0m     select \u001b[39m=\u001b[39m core\u001b[39m.\u001b[39mPointSelectionConfig(\u001b[39mtype\u001b[39m\u001b[39m=\u001b[39m\u001b[39mtype\u001b[39m, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwds)\n\nFile \u001b[0;32m~/binste/altair/altair/vegalite/v5/schema/core.py:6789\u001b[0m, in \u001b[0;36mIntervalSelectionConfig.__init__\u001b[0;34m(self, type, clear, encodings, mark, on, resolve, translate, zoom, **kwds)\u001b[0m\n\u001b[1;32m   6787\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m__init__\u001b[39m(\u001b[39mself\u001b[39m, \u001b[39mtype\u001b[39m\u001b[39m=\u001b[39mUndefined, clear\u001b[39m=\u001b[39mUndefined, encodings\u001b[39m=\u001b[39mUndefined, mark\u001b[39m=\u001b[39mUndefined,\n\u001b[1;32m   6788\u001b[0m              on\u001b[39m=\u001b[39mUndefined, resolve\u001b[39m=\u001b[39mUndefined, translate\u001b[39m=\u001b[39mUndefined, zoom\u001b[39m=\u001b[39mUndefined, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwds):\n\u001b[0;32m-> 6789\u001b[0m     \u001b[39msuper\u001b[39;49m(IntervalSelectionConfig, \u001b[39mself\u001b[39;49m)\u001b[39m.\u001b[39;49m\u001b[39m__init__\u001b[39;49m(\u001b[39mtype\u001b[39;49m\u001b[39m=\u001b[39;49m\u001b[39mtype\u001b[39;49m, clear\u001b[39m=\u001b[39;49mclear, encodings\u001b[39m=\u001b[39;49mencodings,\n\u001b[1;32m   6790\u001b[0m                                                   mark\u001b[39m=\u001b[39;49mmark, on\u001b[39m=\u001b[39;49mon, resolve\u001b[39m=\u001b[39;49mresolve,\n\u001b[1;32m   6791\u001b[0m                                                   translate\u001b[39m=\u001b[39;49mtranslate, zoom\u001b[39m=\u001b[39;49mzoom, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwds)\n\nFile \u001b[0;32m~/binste/altair/altair/utils/schemapi.py:334\u001b[0m, in \u001b[0;36mSchemaBase.__init__\u001b[0;34m(self, *args, **kwds)\u001b[0m\n\u001b[1;32m    331\u001b[0m \u001b[39mobject\u001b[39m\u001b[39m.\u001b[39m\u001b[39m__setattr__\u001b[39m(\u001b[39mself\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39m_kwds\u001b[39m\u001b[39m\"\u001b[39m, kwds)\n\u001b[1;32m    333\u001b[0m \u001b[39mif\u001b[39;00m DEBUG_MODE \u001b[39mand\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_class_is_valid_at_instantiation:\n\u001b[0;32m--> 334\u001b[0m     \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mto_dict(validate\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m)\n\nFile \u001b[0;32m~/binste/altair/altair/utils/schemapi.py:519\u001b[0m, in \u001b[0;36mSchemaBase.to_dict\u001b[0;34m(self, validate, ignore, context)\u001b[0m\n\u001b[1;32m    517\u001b[0m         \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mvalidate(result)\n\u001b[1;32m    518\u001b[0m     \u001b[39mexcept\u001b[39;00m jsonschema\u001b[39m.\u001b[39mValidationError \u001b[39mas\u001b[39;00m err:\n\u001b[0;32m--> 519\u001b[0m         \u001b[39mraise\u001b[39;00m SchemaValidationError(\u001b[39mself\u001b[39m, err)\n\u001b[1;32m    520\u001b[0m \u001b[39mreturn\u001b[39;00m result\n\n\u001b[0;31mSchemaValidationError\u001b[0m: `IntervalSelectionConfig` has no parameter named 'toggle'\n\nExisting parameter names are:\ntype        mark      translate   \nclear       on        zoom        \nencodings   resolve               \n\nSee the help for `IntervalSelectionConfig` to read the full description of these parameters"
}

Not really readable either.

Strange that it truncate the error message from the start and not the end, but keeps the traceback complete. Wish it reduced the traceback and showed the full error message instead..

@mattijn
Copy link
Contributor

mattijn commented Feb 23, 2023

Related: microsoft/vscode-jupyter#7096

@joelostblom
Copy link
Contributor Author

joelostblom commented Mar 20, 2023

I explored this a bit further and it does not seem to be caused by #2565, since removing the changes introduced in that PR yields the following error messages which are still unhelpful:

Layer:

SchemaValidationError: '{'data': {'name': 'empty'}, 'mark': {'type': 'point'}, 'encoding': {'tooltip': [{'field': {'wrong'}}]}}' is an invalid value for `0`:

Additional properties are not allowed ('mark' was unexpected)

Facet:

SchemaValidationError: '{'mark': {'type': 'point'}, 'encoding': {'tooltip': [{'field': {'wrong'}}]}}' is an invalid value for `spec`:

Additional properties are not allowed ('mark' was unexpected)

What seems to be happening is that the wrong error is detected in the error hierarchy introduced in #2842. Specifically the check if lowest_level.validator == "additionalProperties" in https://github.com/binste/altair/blob/d1729b9f809528d76348888f9145bee35d15d40d/altair/utils/schemapi.py#L99 might need to be changed for layered and faceted specs. When I try to remove this check, the correct error does indeed show up for both the layered and the faceted specs (second row of errors below):

Layer

SchemaValidationError: '{'data': {'name': 'empty'}, 'mark': {'type': 'point'}, 'encoding': {'tooltip': [{'field': {'wrong'}}]}}' is an invalid value for `0`:

Additional properties are not allowed ('mark' was unexpected)
[{'field': {'wrong'}}] is not valid under any of the given schemas
'layer' is a required property

Facet

SchemaValidationError: '{'mark': {'type': 'point'}, 'encoding': {'tooltip': [{'field': {'wrong'}}]}}' is an invalid value for `spec`:

Additional properties are not allowed ('mark' was unexpected)
[{'field': {'wrong'}}] is not valid under any of the given schemas
'layer' is a required property

We might still need to do some additional changes since the error above [{'field': {'wrong'}}] is not valid under any of the given schemas is not as informative as the full error message for the same issue in a non-layered/non-faceted spec as in the example in my first comment above:

SchemaValidationError: '[{'field': {'wrong'}}]' is an invalid value for `tooltip`:

[{'field': {'wrong'}}] is not of type 'object'
{'wrong'} is not valid under any of the given schemas
[{'field': {'wrong'}}] is not of type 'null'

cc @binste here since you have the most knowledge about the hierarchical error reporting and might have an idea what could be done here.

@binste
Copy link
Contributor

binste commented Mar 20, 2023

As a side note, I just discovered the Atlassian json schema viewer and I find it very useful to explore the VL schema, see the links below.

The layers property in TopLevelLayerSpec needs to match either LayerSpec or GenericUnitSpec<Encoding,AnyMark>. The Additional properties are not allowed ('mark' was unexpected) comes from LayerSpec, the more helpful error on the tooltip from the other spec. Both specifications are not valid. Here, it's sort of clear that the user wants to match the second spec but I don't think we can build a logic to guess this during validation so we should show both errors. When building it, I wasn't aware of this case, I only tested against examples such as:

import altair as alt
from vega_datasets import data

(
    alt.Chart(data.barley())
    .mark_bar()
    .encode(
        x=alt.X("variety", unknown=2),
        y=alt.Y("sum(yield)", stack="asdf"),
    )
)

Which gives the following errors at the lowest level:

Additional properties are not allowed ('unknown' was unexpected)
Additional properties are not allowed ('field', 'unknown' were unexpected)
Additional properties are not allowed ('field', 'type', 'unknown' were unexpected)
'value' is a required property

which led me to implement that it only selects the first error. So I agree with you @joelostblom that we need to change https://github.com/binste/altair/blob/d1729b9f809528d76348888f9145bee35d15d40d/altair/utils/schemapi.py#L99. We also need to change https://github.com/binste/altair/blob/d1729b9f809528d76348888f9145bee35d15d40d/altair/utils/schemapi.py#L272 as it again selects only the first error if it's an additionalProperties error as it ignores self._additional_errors. If I change L99 to never evaluate as true but keep L272 then I get:

SchemaValidationError: `LayerChart` has no parameter named 'mark'

Existing parameter names are:
layer        data          height    projection   usermeta   
autosize     datasets      name      resolve      view       
background   description   padding   title        width      
config       encoding      params    transform               

See the help for `LayerChart` to read the full description of these parameters

What we could try is to check if the errors are from validating the same "object"/"instance". This is information is available in the instance attribute of an error:

  • For the ('unknown' was unexpected) example further above, all 4 errors are coming from evaluating the same instance {'field': 'variety', 'type': 'nominal', 'unknown': 2} -> We could restrict ourselves to showing just the first error with the formatted table. This is the current implementation
  • For the tooltip example, the instances differ:
    • {'data': {'name': 'empty'}, 'mark': {'type': 'point'}, 'encoding': {'tooltip': [{'field': {'wrong'}}]}} <- comes from validating against LayerSpec
    • [{'field': {'wrong'}}]
    • -> Here maybe we don't want to show the table as the error might not be about that additional property but instead just the two error messages.

This would entail building in this logic in _get_most_relevant_errors. It should probably select one error per instance. When crafting the message in SchemaValidationError check if there are any self._additional_errors and if yes, don't show the table but show the messages (which are then one message per instance), else if its an additionalProperties error show the messages.

What do you think? I can submit a PR if you agree.


Update: Instead of .instance, we could group errors with .json_path which gives a JSONPath to instance as a string. Maybe a bit faster to compare. Values are then '$.layer[0]' and '$.layer[0].encoding.tooltip' for the tooltip example.

@binste
Copy link
Contributor

binste commented Mar 20, 2023

On that note, what do you think about start to assign issues so it's easier to keep track of who works on what?

@joelostblom
Copy link
Contributor Author

Sounds like a great idea to assign issues, just leave a comment on the ones you want to work on and we can assign you (and it probably makes sense to also increase you repo permissions at some point if you would be interested in that, but I don't think either Mattijn or I have the correct permissions for that at the moment).


Regarding the issue here, I think your solution sounds like a feasible one as long as the table is still shown in all/most of the cases when it is useful. At the same time, it also sounds intriguing to me to always validate against "the most useful spec". I saw that you wrote it will be hard to write a logic to identify which spec that actually is and it makes me wonder if we could have some logic along these lines (probably overly simplistic):

  1. Evaluate every individual spec in a layered spec against GenericUnitSpec.
  2. If no error is raised in the previous step, evaluate the entire spec against LayerSpec and GenericUnitSpec as per your approach.

(I am not sure how this would work for facets, since there are no individual specs to evaluate there.)

I also wonder if what some examples are when LayerSpec is the most useful one to validate against? Is that when data is specified at the top level or other layer-specific syntax? And do you have a good understand for why the specs can be evaluated against two schema instead of there being just one (I don't)?

And thanks for linking to that JSON schema viewer, very helpful indeed!

@binste
Copy link
Contributor

binste commented Mar 26, 2023

Sure I'd be open to that in case you have the relevant permissions. We can also have a call beforehand to get to know each other a bit better in case you'd prefer that before granting any rights.


Actually, the specs are already evaluated against just one schema. In the layer example above the Altair class is api.LayerChart which inherits from core.TopLevelLayerSpec which references the definition '#/definitions/TopLevelLayerSpec' in the VL schema. Therefore, a specification generated from api.LayerChart.to_dict is evaluated against the '#/definitions/TopLevelLayerSpec' definition in the VL schema. In that definition, there exists a layer property which is an array and each value in that array needs to match either (i.e. anyOf) LayerSpec or GenericUnitSpec so that the validation for the whole TopLevelLayerSpec passes. In Python, the layer attribute is api.LayerChart.layer. So the good thing is that we can stick to the current approach of validating specifications as it returns all “errors” where a specification does not match the schema. However, whenever a part of the specification can match one of multiple definitions, i.e. it’s of type anyOf, then we can’t know which definition the user wanted to achieve. Hope this makes it a bit clearer.


I think LayerSpec is used if one layer of the chart is again a layered chart itself. For example https://vega.github.io/vega-lite/examples/layer_bar_annotations.html is a chart with 2 layers which are again layered charts with each having 2 layers. Translated to Altair it looks like this (I might clean this up and add it to the gallery):

import altair as alt
import pandas as pd

source = pd.DataFrame([
          {"Day": 1, "Value": 54.8},
          {"Day": 2, "Value": 112.1},
          {"Day": 3, "Value": 63.6},
          {"Day": 4, "Value": 37.6},
          {"Day": 5, "Value": 79.7},
          {"Day": 6, "Value": 137.9},
          {"Day": 7, "Value": 120.1},
          {"Day": 8, "Value": 103.3},
          {"Day": 9, "Value": 394.8},
          {"Day": 10, "Value": 199.5},
          {"Day": 11, "Value": 72.3},
          {"Day": 12, "Value": 51.1},
          {"Day": 13, "Value": 112.0},
          {"Day": 14, "Value": 174.5},
          {"Day": 15, "Value": 130.5}
        ])

blue_bars = (
    alt.Chart(source)
    .encode(alt.X("Day:O").axis(labelAngle=0), alt.Y("Value:Q"))
    .mark_bar()
)
red_bars = (
    alt.Chart(source)
    .transform_filter(alt.datum.Value >= 300)
    .transform_calculate(as_="baseline", calculate="300")
    .encode(
        alt.X("Day:O").axis(labelAngle=0),
        alt.Y("baseline:Q").title("PM2.5 Value"),
        alt.Y2("Value:Q"),
        color=alt.value("#e45755"),
    )
    .mark_bar()
)

bars = blue_bars + red_bars

base = alt.Chart().encode(y=alt.datum(300))

rule = base.mark_rule()
text = base.mark_text(
    align="right", baseline="bottom", dx=-2, dy=-2, x="width", text="hazardous"
)

rule_text = rule + text

chart = bars + rule_text

chart.layer looks like this:

[alt.Chart(...), alt.Chart(...), alt.LayerChart(...)]

So it's not quite the same as the VL example but the last element is a LayerChart which should match LayerSpec. The same could also be written as chart_2 = blue_bars + red_bars + rule + text where chart_2.layer shows that it's now a layered chart with 4 layers: [alt.Chart(...), alt.Chart(...), alt.Chart(...), alt.Chart(...)]. So not sure when the first version is ever beneficial but it can happen and we should account for it in the error handling.

@joelostblom
Copy link
Contributor Author

...So the good thing is that we can stick to the current approach of validating specifications as it returns all “errors” where a specification does not match the schema. However, whenever a part of the specification can match one of multiple definitions, i.e. it’s of type anyOf, then we can’t know which definition the user wanted to achieve.

Ah ok that makes sense, thanks for explaining it in more detail!

I think LayerSpec is used if one layer of the chart is again a layered chart itself.

I see, when I first read was going to suggest that maybe we could flatten all nested layered charts, but it seems like the error are equally unhelpful for regular layered charts and nested ones.

Going back to the example we discussed above, it seems like the less specific error message is longer and contains the more specific error message:

# Less specific
'{'mark': {'type': 'point'}, 'encoding': {'tooltip': [{'field': {'wrong'}}]}}'

# More specific
[{'field': {'wrong'}}]

Do you think it would be safe to assume that if the shorter error message is contained within the longer one, then it would be more helpful to show the shorter message as it would point more directly towards the cause of the error? If this is true, maybe we could check this first, and then as a fallback for the cases when the shorter message is not contained with the longer message, follow the approach you suggested earlier?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants