
Reduce Schema.org jsonld for datasets with many files#12118

Draft
qqmyers wants to merge 1 commit into IQSS:develop from QualitativeDataRepository:shorter_schema.org_export

@qqmyers commented Jan 27, 2026

What this PR does / why we need it: At QDR, we noticed that a dataset with ~10K files was not getting indexed by Google. Investigating, we saw that Google was failing to read the schema.org information in the page header, apparently due to its length.

After finding some guidance on an alternate way to provide file download URLs (https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#accessing-data-through-a-service-endpoint), I went ahead and implemented that approach. Instead of writing one JSON object per file, the export writes a single object plus a list of fileIds, which is significantly shorter.
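To make the size difference concrete, here is a minimal sketch, not the code in this PR. It assumes the per-file form is a list of `DataDownload` objects and the compact form is a single `potentialAction` with a URL template, loosely following the ESIP guide linked above. The host name, the `fileId` placeholder, and the `fileIds` property name are all hypothetical; the actual export may use different keys.

```python
import json

# Hypothetical repository host, for illustration only.
BASE = "https://repo.example.org"


def per_file_distribution(file_ids):
    """Verbose form: one DataDownload object per file."""
    return [
        {
            "@type": "DataDownload",
            "name": f"file_{fid}",
            "contentUrl": f"{BASE}/api/access/datafile/{fid}",
        }
        for fid in file_ids
    ]


def service_endpoint_action(file_ids):
    """Compact form: one action with a URL template plus a list of IDs."""
    return {
        "@type": "SearchAction",
        "target": {
            "@type": "EntryPoint",
            # {fileId} is a literal placeholder in the emitted template.
            "urlTemplate": f"{BASE}/api/access/datafile/{{fileId}}",
            "httpMethod": "GET",
        },
        "query-input": {
            "@type": "PropertyValueSpecification",
            "valueName": "fileId",
            "valueRequired": True,
        },
        # Hypothetical property name; the PR may store the IDs elsewhere.
        "fileIds": list(file_ids),
    }


file_ids = range(1, 10_001)  # roughly the size of the dataset that failed
verbose = json.dumps({"distribution": per_file_distribution(file_ids)})
compact = json.dumps({"potentialAction": service_endpoint_action(file_ids)})

print("per-file characters:", len(verbose))
print("compact characters: ", len(compact))
```

The compact form grows by only a few characters per file (the ID and a separator), while the verbose form repeats every key and the full URL for each file.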

Subsequent testing at QDR showed that this change allowed Google to index that dataset again.

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer: Posting as a draft in case people want to go forward with it while I'm out. It probably needs a release note about the new JVM option, plus doc updates for it. It should probably also be mentioned in the new https://guides.dataverse.org/en/latest/admin/big-data-administration.html.

Suggestions on how to test this: You should be able to set the flag to a low number to trigger the new format with just a few files, and could then use Google's tools to verify that they can parse the output (I forget which tool and its URL).
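Before reaching for an external validator (Google's Rich Results Test is one candidate), a quick local check can confirm which format a dataset page emits and how large it is. The sketch below uses only the Python standard library to pull `application/ld+json` blocks out of page HTML. It assumes, as an illustration only, that the compact format can be recognized by a top-level `potentialAction` key; the sample page is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def summarize(html):
    """Return the size and apparent format of each JSON-LD block in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    summary = []
    for raw in parser.blocks:
        doc = json.loads(raw)
        summary.append(
            {
                "chars": len(raw),
                # Assumption: the compact format adds a potentialAction key.
                "has_potential_action": "potentialAction" in doc,
                "has_distribution": "distribution" in doc,
            }
        )
    return summary


# Made-up page standing in for a dataset landing page.
sample_page = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset", "name": "Example",
 "potentialAction": {"@type": "SearchAction"}}
</script>
</head><body></body></html>"""

summary = summarize(sample_page)
print(summary)
```

Running `summarize` on the saved HTML of a test dataset before and after lowering the threshold should show the character count drop and the format indicator flip, if the assumption about the key holds.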

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@coveralls

Coverage Status

coverage: 24.333% (-0.001%) from 24.334% when pulling 0adb604 on QualitativeDataRepository:shorter_schema.org_export into 9d613f8 on IQSS:develop.

@pdurbin commented Jan 28, 2026
