@msdemlei
Collaborator

As proposed in the IVOA Note "Towards Blind Discovery 2" http://ivoa.net/documents/Notes/colstatnote/, this PR adds basic statistical information to VOSI tablesets. The proposed new mechanism can already be seen in action at the GAVO Data Centre, e.g., through its Registry interface at http://dc.g-vo.org/oai.xml.

It also updates samples/catalog.xml to exercise the stats (and fixes its nrows).
@mbtaylor
Member

Is there a reason why the field value for the <option> element is represented as its content rather than as an attribute? Since it's a single structureless token I feel like it would be cleaner, as well as more semantically transparent, to define it as an attribute. That would also align better with the VOTable OPTION element, which I think is partly where this comes from. And it would (not necessarily a good reason) make it easier for my code to parse.

To be clear, I'm suggesting

   <option freq="0.0247" value="K0"/>

instead of

   <option freq="0.0247">K0</option>

unless there's a good reason why not.
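
For illustration only, here is a minimal Python sketch of how a client might accept either of the two forms above. The helper name `option_value` is hypothetical, and namespaces are ignored; this is not taken from the PR or from any existing client.

```python
import xml.etree.ElementTree as ET


def option_value(elem):
    """Return the option's value.

    Prefers a 'value' attribute (the attribute form suggested above) and
    falls back to the element's text content (the form used in the PR).
    """
    value = elem.get("value")
    if value is None:
        value = (elem.text or "").strip()
    return value


# The two serialisations quoted in the comment above.
attr_form = ET.fromstring('<option freq="0.0247" value="K0"/>')
content_form = ET.fromstring('<option freq="0.0247">K0</option>')

print(option_value(attr_form), option_value(content_form))
```

Both forms cost about the same to parse with a tree-based parser, so the choice is mostly about consistency with the rest of the schema.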

@mbtaylor
Member

Oh, maybe that's a dumb comment. I suppose that, as it stands, it's aligned with the other stats children and with the style of the rest of the schema in general. So you can ignore what I just said unless for some reason you find yourself agreeing with it.

@msdemlei
Collaborator Author

msdemlei commented Jan 29, 2026 via email

@mbtaylor
Member

Overall, I think this looks good.

There are only a couple of things that I think should be clarified:

  1. It doesn't say anywhere whether the max/min/quantile etc. statistics are supposed to be reliable/exact, or whether it's OK to provide a best-effort value. I think the latter would be reasonable, but if so that should be made explicit.

  2. From the XSD and usage at GAVO DC I see that the freq attribute of option is optional, but from reading the text I had the idea that it was required. There should be a note that options can exist without associated freq values, and what that means.
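
To make point 2 concrete, here is a minimal Python sketch of a consumer that tolerates options without a freq attribute. The wrapper element name `stats`, the second option value `G2`, and the function name `read_options` are assumptions made for this example. Namespaces are ignored, and a missing freq is simply reported as `None`, since what it should mean is exactly the open question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one option with a freq attribute, one without.
SAMPLE = """
<stats>
  <option freq="0.0247">K0</option>
  <option>G2</option>
</stats>
"""


def read_options(stats_elem):
    """Return a list of (value, freq) pairs; freq is None when absent."""
    options = []
    for opt in stats_elem.findall("option"):
        freq = opt.get("freq")
        value = (opt.text or "").strip()
        options.append((value, float(freq) if freq is not None else None))
    return options


options = read_options(ET.fromstring(SAMPLE))
print(options)
```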

I have implemented client code in topcat that consumes this new statistical metadata as provided from a TAP /tables endpoint, and displays it in the Table and Columns tabs of the TAP window. That works OK, except that any option values end up on one long line, so if there are many of them it will be hard for the user to see them all.

The prototype stats-capable topcat is currently available at https://www.star.bristol.ac.uk/mbt/releases/topcat/pre/topcat-full_colstats.jar. The taplint from this version (topcat -stilts taplint) can also be used to validate TAP services providing this metadata.
