Schema Requirement Updates: 14 → 8 Required Fields & UI Alignment #14

jimnoneill · 2026-02-05T23:53:52Z

PR Review: Schema Requirement Updates & Prisma, UI Alignment

Quick summary of current PR & UI plans

14 down to 5 required fields
Strip UI clutter - no alternativeTitle, no affiliation IDs, no publisher field, no alternateIdentifiers
Naming alignment between schema and Prisma - domain to researchField, posterContent to content, captions structure in Prisma to match schema?
GUI labels don't have to match schema field names - keywords/abstract on the frontend, subjects/descriptions in the API. Document the mapping.
Consideration of creators as a relation table for author-based search
Add more stuff to auto-populate what we can - formats, types, language, publisher, identifiers

1. Required Fields: Reducing from 14 to 5

After reviewing the staging form UX and the schema, I propose reducing the root-level mandatory fields to:

Required	Optional (moved)
`creators`	`identifiers` / DOI (auto-extract if provided)
`titles`	`publisher` (auto-populate as "posters.science")
`publicationYear`	`dates`
`subjects`	`language` (default "en", auto-detect)
`descriptions`	`types` (always "Poster", auto-populate)
	`formats` (auto-detect from upload)
	`rightsList` (default CC-BY-4.0 suggestion)
	`fundingReferences`
	`conference`

Schema PR incoming with updated required array and relaxed minItems constraints on optional arrays.

2. Fields to Remove from Staging UI

alternativeTitle - drop from the form entirely. Schema can retain titleType as optional for API consumers but the UI should only show one title field.
affiliationIdentifier / affiliationIdentifierScheme / schemeURI (within creator affiliations) - remove from UI. Plain-text affiliation strings only. Institutional IDs (ROR, ISNI) can be resolved programmatically downstream.
publisher - remove from UI. Keep publicationYear as required.
alternateIdentifiers - system-generated only, never user-facing.

3. Prisma Model vs Schema Alignment - Discussion Points

3a. `publicationYear Int?` - should this be non-nullable?

If publicationYear stays in our mandatory 5, should this be `Int` rather than `Int?`? Or do we want the DB to allow null for draft/incomplete records and enforce the requirement at the API validation layer instead?
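
To make the second option concrete, here is a minimal Python sketch of a nullable column with the requirement enforced at validation time. `PosterDraft` and `validate_for_publish` are hypothetical names used only for illustration; they are not part of the web app or the schema repo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PosterDraft:
    """Hypothetical stand-in for a DB row where publicationYear is nullable."""
    title: str
    publication_year: Optional[int] = None  # mirrors Prisma `Int?`


def validate_for_publish(poster: PosterDraft) -> list[str]:
    """Return a list of problems; an empty list means the record can be published."""
    errors = []
    if poster.publication_year is None:
        errors.append("publicationYear is required to publish")
    return errors


# A draft can be stored without a year; publishing is where the check applies.
draft = PosterDraft(title="Work in progress")
ready = PosterDraft(title="Final", publication_year=2026)
```

With this split, the database accepts incomplete drafts while the API layer still treats publicationYear as mandatory.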

3b. `domain` naming

The JSON schema uses researchField (we renamed it from domain to avoid collision with biological taxonomy, where "domain" is the highest classification rank above kingdom). The Prisma model still uses domain. We should align to researchField in Prisma to avoid a terminology split. The rename came out of the previous PR, following UC3 and DataCite feedback.

3c. `tableCaptions` / `imageCaptions` structure

Schema (current):

{ "captions": ["Figure 1. Overview of the Task Force", "Shows approximately 12 active members..."] }

Prisma:

{ tableTitle: string, tableDescription: string }
{ imageTitle: string, imageDescription: string }

Prisma should probably align to the schema pattern here rather than the other way around. The captions: string[] array-of-segments seems more forgiving. Thoughts?

3e. `content` vs `posterContent` naming

Schema uses content. This follows feedback from UC3 and DataCite to keep it generic (supports future expansion to presentations, infographics, one-pagers).

3f. `subjects` vs `keywords` / `descriptions` vs `abstract` - GUI aliasing

The schema sticks with DataCite terminology (subjects, descriptions). But the GUI should use the more intuitive terms (keywords, abstract). These are display labels, and the mapping between GUI label and schema field should be handled at the form/API layer:

GUI: "Keywords" maps to schema: subjects[].subject
GUI: "Abstract" maps to schema: descriptions[0] with descriptionType: "Abstract"

No schema change needed for this. Just a clear convention that the frontend uses friendly labels while the API/schema stays DataCite-compliant. We should document this mapping somewhere.
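
As a starting point for documenting the mapping, a minimal Python sketch of the form-to-schema translation might look like this. The function name `form_to_schema` and the shape of the form payload (a list of keyword strings plus an abstract string) are assumptions, not the actual frontend contract.

```python
def form_to_schema(form: dict) -> dict:
    """Map GUI labels (keywords, abstract) to DataCite-style schema fields."""
    return {
        # GUI "Keywords" -> subjects[].subject (blank entries are dropped)
        "subjects": [
            {"subject": kw.strip()} for kw in form.get("keywords", []) if kw.strip()
        ],
        # GUI "Abstract" -> descriptions[0] with descriptionType "Abstract"
        "descriptions": [
            {"description": form["abstract"], "descriptionType": "Abstract"}
        ],
    }


payload = form_to_schema(
    {"keywords": ["FAIR data", " metadata "], "abstract": "A short abstract."}
)
```

Keeping the translation in one function means the friendly labels never leak into stored metadata.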

3g. `creators` as JSON blob vs. relation table

Current Prisma model stores creators as Json @default("[]"). Since creators/authors are one of the 5 required fields and the most likely thing users will search by ("find all posters by author X"), worth discussing whether this should be a proper PosterCreator relation table for queryability.
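
To illustrate what a relation table buys us, here is a rough sketch using Python's built-in sqlite3 module rather than Prisma. The table and column names (`poster`, `poster_creator`, `name`) and the sample rows are hypothetical; the point is only that "find all posters by author X" becomes an indexed join instead of a scan over JSON blobs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poster (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Hypothetical relation table: one row per (poster, creator) pair.
    CREATE TABLE poster_creator (
        poster_id INTEGER NOT NULL REFERENCES poster(id),
        name TEXT NOT NULL
    );
    CREATE INDEX idx_poster_creator_name ON poster_creator(name);
""")
conn.executemany(
    "INSERT INTO poster (id, title) VALUES (?, ?)",
    [(1, "Poster A"), (2, "Poster B")],
)
conn.executemany(
    "INSERT INTO poster_creator (poster_id, name) VALUES (?, ?)",
    [(1, "Doe, Jane"), (1, "Roe, Sam"), (2, "Doe, Jane")],
)


def posters_by_creator(conn: sqlite3.Connection, name: str) -> list[str]:
    """Titles of all posters that list `name` as a creator."""
    rows = conn.execute(
        "SELECT p.title FROM poster p "
        "JOIN poster_creator c ON c.poster_id = p.id "
        "WHERE c.name = ? ORDER BY p.id",
        (name,),
    )
    return [title for (title,) in rows]
```

The full creators JSON could still be stored alongside for schema export; the relation table would exist purely for search.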

4. Auto-population & AI Extraction Candidates

Jamey needs to add some of this to the extraction prompts.

Field	Source	User-facing?
`formats`	Detect from uploaded file	No, auto-fill
`types`	Always `"Poster"`	No, auto-fill
`language`	LLM detection, default `"en"`	Optional override
`identifiers`	Auto-extract if DOI provided	Show DOI input only
`subjects`/keywords	AI suggestion from title + content	User confirms/edits
`descriptions`/abstract	LLM extraction of abstract section	User confirms/edits
`fundingReferences`	LLM extraction from acknowledgments	User confirms/edits
`conference`	LLM extraction from header/footer	User confirms/edits
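
For the two fields that need no LLM at all, a minimal Python sketch of the auto-fill step could look like the following. It uses the standard-library `mimetypes` module to guess the format from the file name; `detect_format` and `autofill` are hypothetical helper names, and treating `language` as a plain string is an assumption about the record shape.

```python
import mimetypes


def detect_format(filename: str) -> list[str]:
    """Guess the `formats` value from an uploaded file name (empty if unknown)."""
    mime, _encoding = mimetypes.guess_type(filename)
    return [mime] if mime else []


def autofill(record: dict, filename: str) -> dict:
    """Fill `formats` and `language` without overwriting user-supplied values."""
    filled = dict(record)
    detected = detect_format(filename)
    if detected and "formats" not in filled:
        filled["formats"] = detected
    filled.setdefault("language", "en")  # default from the table above
    return filled
```

Guessing from the file name is only a first pass; inspecting the file contents would be more robust against mislabeled uploads.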

Summary by Sourcery

Update poster JSON schema to reduce required fields and better align with planned UI and data model changes.

New Features:

Relax schema requirements to only enforce a smaller core set of mandatory poster metadata fields.

Enhancements:

Adjust optional/required status and structure of schema fields to support auto-population and streamlined UI.
Align schema field naming and structures with DataCite conventions and planned Prisma/GUI terminology.

- Required fields now: creators, titles, publicationYear, subjects, descriptions
- Removed from required: identifiers, publisher, dates, language, types, formats, rightsList, fundingReferences, conference
- Removed minItems:1 from optional arrays (identifiers, dates, formats, rightsList, fundingReferences)
- Simplified descriptions: only 'description' required, not 'descriptionType' (defaults to Abstract)
- Updated DOI field comment to clarify auto-extraction behavior
- Version remains at 0.1

fairdataihub-bot · 2026-02-05T23:53:56Z

Thank you for submitting this pull request! We appreciate your contribution to the project. Before we can merge it, we need to review the changes you've made to ensure they align with our code standards and meet the requirements of the project. We'll get back to you as soon as we can with feedback. Thanks again!

sourcery-ai · 2026-02-05T23:53:59Z

Reviewer's Guide

Updates the poster JSON schema to reduce the number of required fields, align naming with UI/Prisma plans, and relax constraints to support more auto-populated and optional metadata.

File-Level Changes

Change	Details	Files
Reduce the number of required root-level fields from 14 to 5 and relax minimum item constraints for optional arrays.	Update the schema `required` array to only include `creators`, `titles`, `publicationYear`, `subjects`, and `descriptions`. Adjust `minItems` or similar constraints on optional array properties so they can be empty or omitted without failing validation. Ensure non-required metadata like identifiers, dates, types, formats, rightsList, fundingReferences, conference, language, and publisher are modeled as optional with sensible defaults handled outside the schema.	`poster_schema.json`
Align schema structure and naming with DataCite and planned Prisma/UI mappings while simplifying the user-facing surface area.	Confirm that schema fields retain DataCite-style naming (`subjects`, `descriptions`, `creators`, etc.) while allowing UI aliases (keywords/abstract) to be documented and handled in application code, not the schema. Keep a single primary title structure and drop alternative UI-only fields like `alternativeTitle` while leaving room in the schema for optional title types, if needed. Ensure schema structure for captions and content (`captions`, `content`) is stable so Prisma and frontend implementation can standardize against it.	`poster_schema.json`

fairdataihub-bot · 2026-02-05T23:54:06Z

Thanks for making updates to your pull request. Our team will take a look and provide feedback as soon as possible. Please wait for any GitHub Actions to complete before editing your pull request. If you have any additional questions or concerns, feel free to let us know. Thank you for your contributions!

megasanjay · 2026-02-06T00:21:03Z

My comments:

remove doi (prefix,suffix) altogether. It can be part of identifiers.
remove alternateIdentifiers altogether
publisher is not us. It should be blank since we are not providing a doi
rightsList shouldn't have a default
imo conference should be mandatory

… to required

Changes based on Sanjay Soundarajan's review:
- Removed: doi, prefix, suffix fields (DOI should be part of identifiers array if needed)
- Removed: alternateIdentifiers field entirely (system-internal only)
- Updated: publisher description to clarify it should be blank unless poster has formal DOI
- Added: conference to required fields (now 6 required: creators, titles, publicationYear, subjects, descriptions, conference)
- Clarified: rightsList has no default (already correct, just confirming)

Version remains at 0.1

Per Sanjay feedback: changed from object arrays with nested 'captions' property to simple string arrays.

Before: [{"captions": ["Table 1...", "Description..."]}]
After: ["Table 1. Summary of Results", "Table 2. Comparison..."]

This simplifies the structure and aligns better with typical extraction output.

jimnoneill · 2026-02-06T00:35:53Z

Schema Changes Implemented (3 commits)

Commit 1: Required fields reduction (14 → 5)

Removed from required: identifiers, publisher, dates, language, types, formats, rightsList, fundingReferences, conference
Removed minItems: 1 from optional arrays
Simplified descriptions to only require description (not descriptionType)

Commit 2: Sanjay feedback

Removed: doi, prefix, suffix fields (use identifiers array instead)
Removed: alternateIdentifiers field entirely
Updated: publisher description - leave blank unless formal DOI exists
Added: conference back to required

Commit 3: Caption simplification

tableCaptions: Changed from [{"captions": [...]}] to ["caption text"]
imageCaptions: Changed from [{"captions": [...]}] to ["caption text"]

Final Required Fields (6 total)

creators
titles
publicationYear
subjects
descriptions
conference

Prisma Model Alignment Needed (separate PR for web app)

Schema Field	Current Prisma	Action Needed
`content`	`posterContent`	Rename to `content`
`researchField`	`domain`	Rename to `researchField`
`tableCaptions`	Object array with title/description	Change to `String[]`
`imageCaptions`	Object array with title/description	Change to `String[]`
`publicationYear`	`Int?`	Consider making non-nullable
`creators`	`Json` blob	Consider relation table for search

Extraction prompts updated in poster2json

poster2json/poster2json/extract.py updated to match new schema
Both EXTRACTION_PROMPT and FALLBACK_PROMPT now request all 6 required fields
posterContent → content in prompts
Caption format simplified to string arrays
Post-processing migrates old field names to new ones
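
For readers who have not looked at extract.py, the post-processing step could be pictured roughly as below. This is an illustrative sketch, not the code in poster2json: the `RENAMES` table and the function name `migrate_field_names` are assumptions based on the renames discussed in this thread.

```python
# Old name -> new name, based on the renames discussed in this PR (assumed mapping).
RENAMES = {"posterContent": "content", "domain": "researchField"}


def migrate_field_names(record: dict) -> dict:
    """Return a copy of `record` with legacy keys renamed.

    If both the legacy and the new key are present, the new key's value wins.
    """
    migrated = {RENAMES[key]: value for key, value in record.items() if key in RENAMES}
    migrated.update({key: value for key, value in record.items() if key not in RENAMES})
    return migrated
```

A step like this lets older extraction output pass validation against the updated schema without re-running the model.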

megasanjay · 2026-02-06T00:36:28Z

@sourcery-ai dismiss

megasanjay · 2026-02-06T00:39:28Z

Could we make rightsList mandatory? I think that will, at a minimum, allow for reuse based on metadata. And I think formats should remain mandatory as well, since it's automatable (and helps with machine reuse).

slugb0t · 2026-02-06T01:04:40Z

I agree with Sanjay's points. rightsList should be mandatory if we want to be FAIR-aligned.

Although flattening imageCaptions and tableCaptions to string arrays could make our lives easier, I have concerns about matching images to their captions. If the model fails to find an image but still finds its caption, or vice versa, the array indexes will fall out of sync. With nested objects, each entry could include a reference to its counterpart.

…id field

Per Sanjay/slugb0t feedback:
- Added rightsList to required (FAIR compliance for reuse)
- Added formats to required (auto-detectable, helps machine reuse)
- Restored imageCaptions/tableCaptions as object arrays with:
  - required 'caption' field (string)
  - optional 'id' field for cross-referencing images with captions
- Added minItems:1 to formats and rightsList
- Updated descriptions for clarity

Final required fields (8): creators, titles, publicationYear, subjects, descriptions, conference, rightsList, formats

jimnoneill · 2026-02-06T01:08:55Z

Additional Schema Changes (Commit 4)

Per @megasanjay and @slugb0t feedback:

Required Fields Added

rightsList: Now mandatory for FAIR compliance and enabling metadata-based reuse
formats: Now mandatory (auto-detectable from upload, helps machine reuse)

Caption Structure Restored

Per slugb0t's concern about image/caption cross-referencing, restored object format:

{
  "imageCaptions": [
    {"id": "fig1", "caption": "Figure 1. Overview of workflow"},
    {"id": "fig2", "caption": "Figure 2. Results distribution"}
  ],
  "tableCaptions": [
    {"id": "table1", "caption": "Table 1. Summary statistics"}
  ]
}

id field: Optional, for cross-referencing when images and captions are extracted separately
caption field: Required, the full caption text
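
To show why the `id` field helps with the out-of-sync problem, here is a small Python sketch that joins separately extracted images to captions by id instead of by array position. The shape of the `images` list and the function name `pair_images_with_captions` are hypothetical; only the caption objects follow the structure above.

```python
from typing import Optional


def pair_images_with_captions(
    images: list[dict], captions: list[dict]
) -> list[tuple[str, Optional[str]]]:
    """Join extracted images to captions by `id`; unmatched images pair with None."""
    by_id = {c["id"]: c["caption"] for c in captions if "id" in c}
    return [(img["id"], by_id.get(img["id"])) for img in images]


# fig3 was extracted as an image but no caption was found for it,
# and the captions arrive in a different order than the images.
images = [{"id": "fig1"}, {"id": "fig2"}, {"id": "fig3"}]
captions = [
    {"id": "fig2", "caption": "Figure 2. Results distribution"},
    {"id": "fig1", "caption": "Figure 1. Overview of workflow"},
]
pairs = pair_images_with_captions(images, captions)
```

With plain string arrays, the missing fig3 caption or the reordering would silently shift every later pairing; with ids, the gap stays visible as `None`.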

Final Required Fields (8 total)

creators
titles
publicationYear
subjects
descriptions
conference
rightsList ← NEW
formats ← NEW

megasanjay · 2026-02-06T01:18:14Z

I like the imageCaptions idea. I guess it will make it flexible enough so that anyone can use their own id on there. plus if we decide to extract images and tables we can label them with that id.

megasanjay · 2026-02-06T01:19:50Z

poster_schema.json

        "required": ["identifier", "identifierType"],
        "additionalProperties": false
      },
-      "minItems": 1,


We should keep minItems and uniqueItems. minItems is there to prevent empty arrays (or we could keep only uniqueItems if we want). Ideally the identifiers key should not be there at all if there are no items.

We do uniqueItems only for dates, so let's follow the same pattern as that.

megasanjay · 2026-02-06T01:21:53Z

poster_schema.json

    "formats": {
      "type": "array",
-      "description": "Technical format of the data files in the dataset. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg.",
+      "description": "Technical format of the poster file. Use file extension or MIME type where possible. This field is auto-detected from the uploaded file.",


I think this should be labelled as: Technical format of the poster file. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg. We don't need to mention the auto-detected part, since that is specific to our platform and not relevant for other people's reuse.

megasanjay · 2026-02-06T01:23:27Z

poster_schema.json

          }
        },
-        "required": ["descriptionType", "description"]
+        "required": ["description"]


Let's keep descriptionType required. I could see users using the other ones as needed (for our platform we can require only the abstract, but keeping the schema more flexible would be nice).

megasanjay · 2026-02-06T01:24:46Z

The only one that we might need guidance on is ethicsApproval. Is this a common field at the poster stage? Might need @bvhpatel's guidance on this one. It's optional, so I'm okay with it.

megasanjay · 2026-02-06T01:26:30Z

Also unrelated, but should we keep schema v0.1 on this repository as well (as an additional folder), or does the Zenodo entry count for that? @bvhpatel

- identifiers: Added minItems:1 back (prevent empty arrays)
- formats: Updated description with examples, removed platform-specific text
- descriptions: Made descriptionType required again (flexibility for other uses)

jimnoneill · 2026-02-06T21:45:53Z

Also unrelated, but should we keep schema v0.1 on this repository as well (as an additional folder), or does the Zenodo entry count for that? @bvhpatel

We currently have a copy in a folder of the repo. I did it for redundancy's sake, but I don't feel strongly either way.

jimnoneill · 2026-02-06T23:45:02Z

Additional Schema Changes (Commit 5)

Per @megasanjay review feedback:

`identifiers` Array Constraints Restored

"identifiers": {
  "type": "array",
  ...
  "minItems": 1,
  "uniqueItems": true
}

minItems: 1 prevents empty arrays; ideally the identifiers key should not be present if there are no items
uniqueItems: true ensures no duplicate identifiers (following same pattern as dates)
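
The intended behavior of these two constraints on an optional key can be approximated in plain Python as follows. This is a sketch of the semantics only, not a replacement for a JSON Schema validator; `check_identifiers` is a hypothetical name and the DOI in the usage example is a made-up placeholder.

```python
import json


def check_identifiers(record: dict) -> list[str]:
    """Approximate minItems: 1 and uniqueItems: true for the optional `identifiers` key."""
    if "identifiers" not in record:
        return []  # optional field: omitting the key entirely is valid
    items = record["identifiers"]
    errors = []
    if len(items) < 1:
        errors.append("identifiers must be omitted rather than empty")
    # Canonical JSON gives deep, key-order-independent comparison of the items.
    canonical = [json.dumps(item, sort_keys=True) for item in items]
    if len(set(canonical)) != len(canonical):
        errors.append("identifiers must be unique")
    return errors


doi = {"identifier": "10.1234/example", "identifierType": "DOI"}  # placeholder value
same_doi_reordered = {"identifierType": "DOI", "identifier": "10.1234/example"}
```

Note that an absent key and an empty array are treated differently: the first is valid, the second is rejected.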

`formats` Description Updated

Removed platform-specific language for better reusability:

"formats": {
  "description": "Technical format of the poster file. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg."
}

(Removed "auto-detected from the uploaded file" since that's platform-specific)

`descriptionType` Remains Required

"descriptions": {
  "items": {
    "required": ["descriptionType", "description"]
  }
}

Keeps schema flexible for users who may want to use other description types beyond "Abstract" (e.g., "Methods", "TechnicalInfo", etc.)

No Changes to Required Fields

Still 8 required root fields:

creators
titles
publicationYear
subjects
descriptions
conference
rightsList
formats
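
As a quick reference for consumers, the root-level rules at this point in the PR can be sketched in Python like this. The `check_root` helper is hypothetical and covers only presence of the 8 required fields plus the minItems: 1 constraint on formats and rightsList; it does not validate nested structure.

```python
REQUIRED = [
    "creators", "titles", "publicationYear", "subjects",
    "descriptions", "conference", "rightsList", "formats",
]
# Required arrays that were also given minItems: 1 in Commit 4.
NON_EMPTY = ["rightsList", "formats"]


def check_root(record: dict) -> list[str]:
    """List root-level problems: missing required fields and empty required arrays."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in record]
    errors += [
        f"{name} must not be empty"
        for name in NON_EMPTY
        if name in record and len(record[name]) < 1
    ]
    return errors
```

In practice a JSON Schema validator run against poster_schema.json would be the source of truth; a check like this is mainly useful for fast feedback in a form or extraction pipeline.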

slugb0t · 2026-02-11T07:05:22Z

Another thing I've been thinking about is if we should consider removing the dates key entirely.

conference already contains conferenceStartDate and conferenceEndDate.
Looking at the options for dates they don't seem to have much relevancy for posters.

"Accepted" / "Submitted" - Not sure where this would be needed
"Created" - Less meaningful than when it was presented
"Issued" - Covered by db timestamps
"Updated" / "Withdrawn" - Edge cases handled by versioning

jimnoneill · 2026-02-11T18:10:17Z

I hear you ... I recall strong feelings on both sides regarding the conference-specific dates being important vs. sticking closely to DataCite ... we'll talk about it later today. One of them needs to be removed for sure.

fairdataihub-bot · 2026-02-11T18:37:59Z

Thanks for making updates to your pull request. Our team will take a look and provide feedback as soon as possible. Please wait for any GitHub Actions to complete before editing your pull request. If you have any additional questions or concerns, feel free to let us know. Thank you for your contributions!

Jamey O'Neill added 2 commits February 5, 2026 16:25

megasanjay reviewed Feb 6, 2026

Address Sanjay review comments

80668b0

jimnoneill changed the title ~~Schema Requirement Updates: 14 → 5 Required Fields & UI Alignment~~ Schema Requirement Updates: 14 → 8 Required Fields & UI Alignment Feb 11, 2026
