-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1.0 as a string should be a valid version for a dataset #609
Comments
Note the long comment about the perils of doing this. Let's hope to get some attention on our appeal to allow 1.0 to be a valid version for a dataset: mlcommons/croissant#609
Over at gdcc/dataverse-exporters@3e95c97 I pushed a commit to simply append ".0" to the version of Dataverse datasets, but I'm pretty grumpy about it because it's really suboptimal for us. Here's the comment I left in the code:
|
Today I played around with where to put the real version of a Dataverse dataset (e.g. "1.0") but I can't find a good place without triggering schema.org violations. Again, users need to real version number if they want to use Dataverse APIs successfully, as described in my comment above. Right now my Croissant implementation appends an extra ".0" but I'd rather just put "1.0" and be able to configure the Croissant validator to ignore the warning, so I just opened this related issue: |
@ccl-core Can you please take a look at this issue? |
Hi @pdurbin , thank you for reporting this issue! Just to clarify, The mlcroissant library takes already care of the version casting, so you should be able to use your croissant file with Unless your issue is about the warning itself: do you think that the warning message is wrong or incomplete? |
@ccl-core thanks for getting back to me. First, in terms of validity, the spec implies that ANY string is valid, right? "foobar" and "supercalifragilisticexpialidocious" are both valid versions. This is because version in schema.org can be "Number" or "Text". The table below summarizes my understanding of the how the validator and the spec treat the following values. (I put strings in quotes and left quotes off numbers.) Current state
This feels off to me. I think a step in the right direction would be to allow the validator to not throw a warning for "1.0" (as a string), just like how it doesn't throw a warning for 1 or 1.0 (as numbers). Further, perhaps the spec should be updated to mention that 1 and 1.0 (as numbers) get special treatment (at least by the validator). And perhaps "1.0" (as a string) would get the same special treatment in the future. Here's an updated table: Option 1: treat "1.0" like 1.0
If you're thinking, "just use 1.0 as a number instead of a string", the reason I'm reluctant to do this is that if Dataverse ever does switch from MAJOR.MINOR to MAJOR.MINOR.PATCH (as Croissant recommends) it would mean switching the type from number to string, which would be a breaking change for our users. I'd rather use strings in our Croissant export from the start so we can more easily switch from "1.0" to "1.0.0" in the future. I think we all know that in the real world, versions are usually strings, not numbers. SemVer is a good example. Option 2: pass a flag to ignore version warningsAnother perfectly valid option could be to allow people like me who don't want to see the warning about version to pass a flag to the validator to suppress the "Version doesn't follow MAJOR.MINOR.PATCH" warning. I'm happy to see you're working on this (thanks!): I imagine it will resolve the issue I opened: With this option, I still think it would be good to update the spec to explain that 1 and 1.0 are given special treatment (no warnings shown):
I hope this helps! Thanks again! |
Drive-by comment: I dislike the fact that versions can be numbers... As numbers, there is no difference between 1 and 1.0, even though we may consider them to be different values in terms of versions. For the next version of Croissant, I wonder if we should constrain version to always be a string and recommend that it follows one of the patterns: "x", "x.y" or "x.y.z", where x y and z are integers, and maybe allow other strings but issue a (suppressible) warning if it doesn't follow one of these patterns. What do you all think? |
@benjelloun yeah, it sounds like we're in agreement that versions should be strings, not numbers. In the schema.org world, I'm not familiar with how best to constrain a field like version that allows both Text and Number. I suppose we could have If we want to stick with SemVer, please keep in mind that it's not just integers and dots (periods). You'll see examples like "1.0.0-alpha" and "1.0.0+21AF26D3" in the spec (under "MAY" for pre-release and build metadata). |
We can constrain sc:version by changing its documentation in our spec, and validate the field for datasets that specify that they comply with Croissant 1.0. Indeed, for SemVer we would need some regexp like https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string |
Remove warning when version doesn't conform to MAJOR.MINOR.PATCH Ref. ##609
Dataverse uses MAJOR.MINOR for dataset versions like 1.0, 1.1, 2.0. I believe this should be valid. I'm aware that the validator wants MAJOR.MINOR.PATCH (output below) and that under the Version section, the Croissant 1.0 spec says the same (Semantic Versioning 2.0.0).
I'm also aware that 1.0 as number instead of a string is considered valid (thanks @goeffthomas for pointing this out), but I don't see this in the spec and I think versions should be strings (1.0.0 isn't a number anyway). I see from #593 and #594 that just 1 as a number is also valid but this isn't a solution for Dataverse.
Below is the warning I see as of 1.0.3 from the validator when testing my work-in-progress Croissant file (also below). The relevant line in the JSON is, of course, the following:
So! Can we consider 1.0 valid?
Thanks for your consideration!
The text was updated successfully, but these errors were encountered: