Reduce Schema.org jsonld for datasets with many files by qqmyers · Pull Request #12118 · IQSS/dataverse

qqmyers · 2026-01-27T17:39:47Z

What this PR does / why we need it: At QDR, we noticed that a dataset with ~10K files was not getting indexed by Google. Investigating, we saw that Google was failing to read the schema.org information in the page header, apparently due to its length.

After finding some guidance on an alternate way to provide file download URLs (https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#accessing-data-through-a-service-endpoint), I went ahead and implemented that approach. It essentially replaces writing one JSON object per file with a single object plus a list of fileIds, which is significantly shorter.
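To give a rough sense of the size difference, here is a minimal sketch comparing the two shapes. It is not the code in this PR. The host name, the file names, the `fileIds` key, and the exact layout of the compact object are assumptions made for illustration; the `potentialAction`-style object only loosely follows the service-endpoint pattern in the linked guide, and the URL pattern is modeled on the Dataverse Data Access API.

```python
import json

BASE = "https://demo.example.org"  # hypothetical installation URL


def per_file_distribution(file_ids):
    """One DataDownload object per file (the long form)."""
    return [
        {
            "@type": "DataDownload",
            "name": f"file_{fid}.tab",  # made-up file name
            "encodingFormat": "text/tab-separated-values",
            "contentUrl": f"{BASE}/api/access/datafile/{fid}",
        }
        for fid in file_ids
    ]


def compact_access(file_ids):
    """One object with a URL template plus a flat list of file ids.

    The key names here are illustrative assumptions, not the PR's output.
    """
    return {
        "@type": "SearchAction",
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": f"{BASE}/api/access/datafile/{{fileId}}",
            "httpMethod": "GET",
        },
        "query-input": {
            "@type": "PropertyValueSpecification",
            "valueName": "fileId",
            "valueRequired": True,
        },
        "fileIds": list(file_ids),  # placeholder key for the id list
    }


file_ids = range(1, 10001)  # roughly the ~10K files mentioned above
verbose_size = len(json.dumps(per_file_distribution(file_ids)))
compact_size = len(json.dumps(compact_access(file_ids)))
print(f"per-file: {verbose_size} chars, compact: {compact_size} chars")
```

The saving comes from emitting the repeated parts (type, URL prefix, format) once instead of once per file, so the compact form grows by only a few characters per additional file.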

Subsequent testing at QDR shows that this was effective in allowing Google to index that dataset again.

Which issue(s) this PR closes:

Closes #

Special notes for your reviewer: Posting as a draft in case people want to go forward with it while I'm out. It probably needs a release note about the new JVM option, plus doc updates for it. It should probably also be mentioned in the new https://guides.dataverse.org/en/latest/admin/big-data-administration.html guide.

Suggestions on how to test this: Should be able to set the flag to a low number to trigger the new format with just a few files and could then use Google's tools to verify it can parse the output (I forget which tool/the URL for it).
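Before reaching for Google's tools, a quick local check of the page header can help. The sketch below uses only the Python standard library to pull `application/ld+json` blocks out of a dataset page and report their size and shape. The sample page and the summary keys are made up for illustration, and checking for a `potentialAction` key is an assumption about what the new format emits. It does not replace verifying the output with Google's own parser.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def summarize(html):
    """Return size and shape information for each JSON-LD block in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    results = []
    for raw in parser.blocks:
        doc = json.loads(raw)  # raises ValueError if the block is not valid JSON
        dist = doc.get("distribution", [])
        if isinstance(dist, dict):  # a single object instead of a list
            dist = [dist]
        results.append({
            "type": doc.get("@type"),
            "bytes": len(raw.encode("utf-8")),
            "distribution_entries": len(dist),
            "has_potential_action": "potentialAction" in doc,
        })
    return results


# Made-up page with a single per-file entry, for illustration only.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset", "name": "Example",
 "distribution": [{"@type": "DataDownload",
                   "contentUrl": "https://demo.example.org/f/1"}]}
</script>
</head><body></body></html>"""

summary = summarize(SAMPLE_PAGE)
print(summary)
```

Running this against the same dataset page with the threshold set high and then low should show the byte count drop and the per-file entries disappear, if the new format behaves as described.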

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

coveralls · 2026-01-27T17:54:05Z

coverage: 24.333% (-0.001%) from 24.334%
when pulling 0adb604 on QualitativeDataRepository:shorter_schema.org_export
into 9d613f8 on IQSS:develop.

pdurbin · 2026-01-28T19:20:33Z

guidance on large Croissant files, especially in <head> mlcommons/croissant#646

Commit 0adb604: try potential action

qqmyers added this to IQSS Dataverse Project Jan 27, 2026

qqmyers wants to merge 1 commit into IQSS:develop from QualitativeDataRepository:shorter_schema.org_export
