Fix discrepancies with the specs #742
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Thanks!
json_copy = json_ld.copy()
for _, record_set in enumerate(json_copy.get("recordSet", [])):
    if record_set["@id"] != record_set["name"]:
        record_set["name"] = record_set["@id"]
nit: this is not a requirement of the specs.
You are right, it is not a requirement, but it seems to be the recommended way, given that all the examples in the specs do it :)
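For illustration, the alignment discussed in this thread could be sketched as follows. This is a minimal sketch, not the PR's actual migration script: the function name `align_record_set_names` and the sample dictionary are hypothetical, and it uses `copy.deepcopy` so that the nested record sets of the input stay untouched (a shallow `dict.copy()` would share them with the original).

```python
import copy


def align_record_set_names(json_ld: dict) -> dict:
    """Return a copy of json_ld where each record set's name equals its @id.

    Hypothetical helper for illustration only.
    """
    # Deep copy: the nested record set dicts must not be shared with the input.
    json_copy = copy.deepcopy(json_ld)
    for record_set in json_copy.get("recordSet", []):
        # Only touch record sets that have an @id and whose name differs from it.
        if "@id" in record_set and record_set.get("name") != record_set["@id"]:
            record_set["name"] = record_set["@id"]
    return json_copy


# Made-up sample input where name != @id.
original = {"recordSet": [{"@id": "persons", "name": "People"}]}
migrated = align_record_set_names(original)
print(migrated["recordSet"][0]["name"])
print(original["recordSet"][0]["name"])
```

Since the specs do not require `name == @id`, a migration like this is a convention choice rather than a conformance fix.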
{"name": "person7", "age": 7}
{"name": "person8", "age": 8}
{"name": "person9", "age": 9}
{"persons/name": "person0", "persons/age": 0}
I think it was a feature to not have the RecordSet ID here. Otherwise, it's repeated:
- The user is asking for the `persons` RecordSet
- => All fields will start with `persons`
Maybe it's a consequence of my other comment where you set `name == @id` for all datasets. It's the case for Hugging Face datasets, but it may not be the case per the specs.
So it could be good to keep at least one dataset with `name != @id` for testing purposes. What do you think?
Sure! I updated all datasets because of the migration script, but good point to keep one different.
Updated the comment in the migration script and restored one dataset with names != ids.
- `get_column` method of `Source`: we return the node's `uuid` if no extract method is specified;
- `data`: we look at `field.id` and not `field.name` to get the expected keys;
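To make the second point concrete, here is a minimal sketch of how keying records by field id rather than field name changes the expected keys. The `Field` dataclass and the `expected_keys` helper are hypothetical stand-ins, not the library's API; the ids follow the `persons/name` pattern from the test data in this PR, and the sample assumes a dataset where names differ from ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a field with both an id and a name."""

    id: str
    name: str


def expected_keys(fields, use_id=True):
    """Return the record keys, taken from field.id (default) or field.name."""
    return [f.id if use_id else f.name for f in fields]


# Made-up fields for a "persons" RecordSet where name != id.
fields = [
    Field(id="persons/name", name="name"),
    Field(id="persons/age", name="age"),
]
print(expected_keys(fields))                # ['persons/name', 'persons/age']
print(expected_keys(fields, use_id=False))  # ['name', 'age']
```

When names are set equal to ids, both calls return the same keys, which is why keeping one test dataset with names different from ids is the only way to exercise the distinction.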