@jimnoneill jimnoneill commented Feb 5, 2026

PR Review: Schema Requirement Updates & Prisma, UI Alignment

Quick summary of current PR & UI plans

  1. 14 down to 5 required fields
  2. Strip UI clutter - no alternativeTitle, no affiliation IDs, no publisher field, no alternateIdentifiers
  3. Naming alignment w schema vs Prisma - domain to researchField, posterContent to content, captions structure in Prisma to match schema?
  4. GUI labels don't have to match schema field names - keywords/abstract on the frontend, subjects/descriptions in the API. Document the mapping.
  5. Consideration of creators as a relation table for author-based search
  6. Add more stuff to auto-populate what we can - formats, types, language, publisher, identifiers

1. Required Fields: Reducing from 14 to 5

After reviewing the staging form UX and the schema, proposing we reduce root-level mandatory fields to:

| Required | Optional (moved) |
| --- | --- |
| creators | identifiers / DOI (auto-extract if provided) |
| titles | publisher (auto-populate as "posters.science") |
| publicationYear | dates |
| subjects | language (default "en", auto-detect) |
| descriptions | types (always "Poster", auto-populate) |
| | formats (auto-detect from upload) |
| | rightsList (default CC-BY-4.0 suggestion) |
| | fundingReferences |
| | conference |

Schema PR incoming with updated required array and relaxed minItems constraints on optional arrays.


2. Fields to Remove from Staging UI

  • alternativeTitle - drop from the form entirely. Schema can retain titleType as optional for API consumers but the UI should only show one title field.
  • affiliationIdentifier / affiliationIdentifierScheme / schemeURI (within creator affiliations) - remove from UI. Plain-text affiliation strings only. Institutional IDs (ROR, ISNI) can be resolved programmatically downstream.
  • publisher - remove from UI. Keep publicationYear as required.
  • alternateIdentifiers - system-generated only, never user-facing.

3. Prisma Model vs Schema Alignment - Discussion Points

3a. publicationYear Int? - should this be non-nullable?

If publicationYear stays in our mandatory 5, should it be Int (non-nullable) rather than Int? (nullable)? Or do we want the DB to allow null for draft/incomplete records and enforce the requirement at the API validation layer instead?

3b. domain naming

The JSON schema uses researchField (we renamed it from domain to avoid collision with biological taxonomy, where "domain" is the highest classification rank above kingdom). The Prisma model still uses domain. We should align to researchField in Prisma to avoid a terminology split. This rename came from the previous PR with UC3 & DataCite feedback.

3c. tableCaptions / imageCaptions structure

Schema (current):

{ "captions": ["Figure 1. Overview of the Task Force", "Shows approximately 12 active members..."] }

Prisma:

{ tableTitle: string, tableDescription: string }
{ imageTitle: string, imageDescription: string }

Prisma should probably align to the schema pattern here rather than the other way around. The captions: string[] array-of-segments approach seems more forgiving?

3e. content vs posterContent naming

Schema uses content. This follows feedback from UC3 and DataCite to keep it generic (supports future expansion to presentations, infographics, one-pagers).

3f. subjects vs keywords / descriptions vs abstract - GUI aliasing

The schema sticks with DataCite terminology (subjects, descriptions). But the GUI should use the more intuitive terms (keywords, abstract). These are display labels, and the mapping between GUI label and schema field should be handled at the form/API layer:

  • GUI: "Keywords" maps to schema: subjects[].subject
  • GUI: "Abstract" maps to schema: descriptions[0] with descriptionType: "Abstract"

No schema change needed for this. Just a clear convention that the frontend uses friendly labels while the API/schema stays DataCite-compliant. We should document this mapping somewhere.
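
As a starting point for documenting that mapping, here is a minimal Python sketch of the translation at the form/API layer. The function name `form_to_schema` and the shape of the incoming form payload (`keywords` as a list of strings, `abstract` as a string) are assumptions for illustration; only the target fields (subjects[].subject, descriptions[0] with descriptionType "Abstract") come from the discussion above.

```python
from typing import Any


def form_to_schema(form: dict[str, Any]) -> dict[str, Any]:
    """Translate GUI-labelled form fields into DataCite-style schema fields.

    The form payload shape is hypothetical; adjust to the real form.
    """
    record: dict[str, Any] = {}

    # GUI "Keywords" -> schema subjects[].subject
    keywords = [k.strip() for k in form.get("keywords", []) if k.strip()]
    if keywords:
        record["subjects"] = [{"subject": k} for k in keywords]

    # GUI "Abstract" -> schema descriptions[0] with descriptionType "Abstract"
    abstract = (form.get("abstract") or "").strip()
    if abstract:
        record["descriptions"] = [
            {"description": abstract, "descriptionType": "Abstract"}
        ]
    return record


example = form_to_schema(
    {"keywords": ["FAIR data", " posters "], "abstract": "We describe ..."}
)
```

Keeping the mapping in one function means the frontend labels can change without touching the stored metadata.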

3g. creators as JSON blob vs. relation table

Current Prisma model stores creators as Json @default("[]"). Since creators/authors are one of the 5 required fields and the most likely thing users will search by ("find all posters by author X"), worth discussing whether this should be a proper PosterCreator relation table for queryability.
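
To make the queryability argument concrete, here is a rough sketch using Python's built-in sqlite3 module rather than Prisma itself. The table and column names (`poster`, `poster_creator`, `position`) are hypothetical and not taken from the web app; the point is only that a relation table turns "find all posters by author X" into an indexed join instead of a scan over JSON blobs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE poster (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Hypothetical relation table: one row per creator per poster.
    CREATE TABLE poster_creator (
        id INTEGER PRIMARY KEY,
        poster_id INTEGER NOT NULL REFERENCES poster(id),
        name TEXT NOT NULL,
        position INTEGER NOT NULL  -- preserves author order
    );
    CREATE INDEX idx_creator_name ON poster_creator(name);
    """
)
conn.execute("INSERT INTO poster (id, title) VALUES (1, 'Poster A'), (2, 'Poster B')")
conn.executemany(
    "INSERT INTO poster_creator (poster_id, name, position) VALUES (?, ?, ?)",
    [(1, "Doe, Jane", 0), (1, "Roe, Richard", 1), (2, "Doe, Jane", 0)],
)


def posters_by_author(name: str) -> list[str]:
    """Return titles of all posters that list `name` as a creator."""
    rows = conn.execute(
        "SELECT p.title FROM poster p "
        "JOIN poster_creator c ON c.poster_id = p.id "
        "WHERE c.name = ? ORDER BY p.id",
        (name,),
    )
    return [title for (title,) in rows]
```

The trade-off is an extra write path: the API would need to keep these rows in sync with the creators array it returns to schema consumers.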


4. Auto-population & AI Extraction Candidates

Some of this Jamey needs to add to the extraction prompts.

| Field | Source | User-facing? |
| --- | --- | --- |
| formats | Detect from uploaded file | No, auto-fill |
| types | Always "Poster" | No, auto-fill |
| language | LLM detection, default "en" | Optional override |
| identifiers | Auto-extract if DOI provided | Show DOI input only |
| subjects/keywords | AI suggestion from title + content | User confirms/edits |
| descriptions/abstract | LLM extraction of abstract section | User confirms/edits |
| fundingReferences | LLM extraction from acknowledgments | User confirms/edits |
| conference | LLM extraction from header/footer | User confirms/edits |
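
The non-LLM rows of this table could be filled in with something like the sketch below. It assumes a hypothetical helper `auto_populate` that receives the partially filled record and the uploaded file name; it uses the standard-library mimetypes module to guess the format and falls back to "en" for language. The exact shape of the types field is not shown in this thread, so it is left out here.

```python
import mimetypes
from typing import Any


def auto_populate(record: dict[str, Any], filename: str) -> dict[str, Any]:
    """Fill fields that can be derived without user input (illustrative only)."""
    filled = dict(record)  # do not mutate the caller's record

    # formats: detect from the uploaded file name, unless already provided
    mime, _encoding = mimetypes.guess_type(filename)
    if mime and not filled.get("formats"):
        filled["formats"] = [mime]

    # language: default to "en"; a user override or LLM detection wins
    filled.setdefault("language", "en")
    return filled


draft = auto_populate({"titles": [{"title": "My poster"}]}, "poster.pdf")
```

Guessing from the file extension is the cheapest option; inspecting the file contents would be more robust but needs more than the standard library.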

Summary by Sourcery

Update poster JSON schema to reduce required fields and better align with planned UI and data model changes.

New Features:

  • Relax schema requirements to only enforce a smaller core set of mandatory poster metadata fields.

Enhancements:

  • Adjust optional/required status and structure of schema fields to support auto-population and streamlined UI.
  • Align schema field naming and structures with DataCite conventions and planned Prisma/GUI terminology.

- Required fields now: creators, titles, publicationYear, subjects, descriptions
- Removed from required: identifiers, publisher, dates, language, types, formats, rightsList, fundingReferences, conference
- Removed minItems:1 from optional arrays (identifiers, dates, formats, rightsList, fundingReferences)
- Simplified descriptions: only 'description' required, not 'descriptionType' (defaults to Abstract)
- Updated DOI field comment to clarify auto-extraction behavior
- Version remains at 0.1

sourcery-ai bot commented Feb 5, 2026

Reviewer's Guide

Updates the poster JSON schema to reduce the number of required fields, align naming with UI/Prisma plans, and relax constraints to support more auto-populated and optional metadata.

File-Level Changes

Change: Reduce the number of required root-level fields from 14 to 5 and relax minimum item constraints for optional arrays.
Details:
  • Update the schema required array to only include creators, titles, publicationYear, subjects, and descriptions.
  • Adjust minItems or similar constraints on optional array properties so they can be empty or omitted without failing validation.
  • Ensure non-required metadata like identifiers, dates, types, formats, rightsList, fundingReferences, conference, language, and publisher are modeled as optional with sensible defaults handled outside the schema.
Files: poster_schema.json

Change: Align schema structure and naming with DataCite and planned Prisma/UI mappings while simplifying the user-facing surface area.
Details:
  • Confirm that schema fields retain DataCite-style naming (subjects, descriptions, creators, etc.) while allowing UI aliases (keywords/abstract) to be documented and handled in application code, not the schema.
  • Keep a single primary title structure and drop alternative UI-only fields like alternativeTitle while leaving room in the schema for optional title types, if needed.
  • Ensure schema structure for captions and content (captions, content) is stable so Prisma and frontend implementation can standardize against it.
Files: poster_schema.json

@megasanjay
Member

My comments:

  1. remove doi (prefix,suffix) altogether. It can be part of identifiers.
  2. remove alternateIdentifiers altogether
  3. publisher is not us. It should be blank since we are not providing a doi
  4. rightsList shouldn't have a default
  5. imo conference should be mandatory

Jamey O'Neill added 2 commits February 5, 2026 16:25
… to required

Changes based on Sanjay Soundarajan's review:
- Removed: doi, prefix, suffix fields (DOI should be part of identifiers array if needed)
- Removed: alternateIdentifiers field entirely (system-internal only)
- Updated: publisher description to clarify it should be blank unless poster has formal DOI
- Added: conference to required fields (now 6 required: creators, titles, publicationYear, subjects, descriptions, conference)
- Clarified: rightsList has no default (already correct, just confirming)

Version remains at 0.1
Per Sanjay feedback: changed from object arrays with nested 'captions' property to simple string arrays.

Before: [{"captions": ["Table 1...", "Description..."]}]
After:  ["Table 1. Summary of Results", "Table 2. Comparison..."]

This simplifies the structure and aligns better with typical extraction output.
@jimnoneill
Collaborator Author

Schema Changes Implemented (3 commits)

Commit 1: Required fields reduction (14 → 5)

  • Removed from required: identifiers, publisher, dates, language, types, formats, rightsList, fundingReferences, conference
  • Removed minItems: 1 from optional arrays
  • Simplified descriptions to only require description (not descriptionType)

Commit 2: Sanjay feedback

  • Removed: doi, prefix, suffix fields (use identifiers array instead)
  • Removed: alternateIdentifiers field entirely
  • Updated: publisher description - leave blank unless formal DOI exists
  • Added: conference back to required

Commit 3: Caption simplification

  • tableCaptions: Changed from [{"captions": [...]}] to ["caption text"]
  • imageCaptions: Changed from [{"captions": [...]}] to ["caption text"]

Final Required Fields (6 total)

  1. creators
  2. titles
  3. publicationYear
  4. subjects
  5. descriptions
  6. conference

Prisma Model Alignment Needed (separate PR for web app)

| Schema Field | Current Prisma | Action Needed |
| --- | --- | --- |
| content | posterContent | Rename to content |
| researchField | domain | Rename to researchField |
| tableCaptions | Object array with title/description | Change to String[] |
| imageCaptions | Object array with title/description | Change to String[] |
| publicationYear | Int? | Consider making non-nullable |
| creators | Json blob | Consider relation table for search |

Extraction Prompts Updated in poster2json

  • poster2json/poster2json/extract.py updated to match new schema
  • Both EXTRACTION_PROMPT and FALLBACK_PROMPT now request all 6 required fields
  • posterContent → content in prompts
  • Caption format simplified to string arrays
  • Post-processing migrates old field names to new ones
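
For readers who have not looked at extract.py, the post-processing step could look roughly like the sketch below. This is not the actual poster2json code: the function name `migrate_field_names` and the rename table are assumptions based on the two renames discussed in this thread (posterContent to content, domain to researchField).

```python
from typing import Any

# Old name -> new name. Assumed from the renames discussed in this PR.
RENAMED_FIELDS = {"posterContent": "content", "domain": "researchField"}


def migrate_field_names(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` with legacy field names replaced."""
    migrated = dict(record)
    for old, new in RENAMED_FIELDS.items():
        if old in migrated:
            value = migrated.pop(old)
            # If the model already emitted the new name, keep that value.
            migrated.setdefault(new, value)
    return migrated
```

Running this on every extraction result keeps older prompt outputs valid against the new schema without re-running the model.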

@megasanjay
Member

@sourcery-ai dismiss

@megasanjay
Member

Could we make rightsList mandatory? I think that will, at a minimum, allow for reuse based on metadata. And I think formats should remain mandatory as well, since it's automatable (and helps with machine reuse).

@slugb0t
Member

slugb0t commented Feb 6, 2026

I agree with Sanjay's points. rightsList should be mandatory if we want to be FAIR-aligned.

Although flattening imageCaptions and tableCaptions to string arrays could make our lives easier, I have concerns about referencing associated images and captions. If the model fails to find an image but still finds a caption, or vice versa, the array indexes will fall out of sync. With nested objects, each entry could include a reference to its counterpart.

…id field

Per Sanjay/slugb0t feedback:
- Added rightsList to required (FAIR compliance for reuse)
- Added formats to required (auto-detectable, helps machine reuse)
- Restored imageCaptions/tableCaptions as object arrays with:
  - required 'caption' field (string)
  - optional 'id' field for cross-referencing images with captions
- Added minItems:1 to formats and rightsList
- Updated descriptions for clarity

Final required fields (8): creators, titles, publicationYear, subjects, descriptions, conference, rightsList, formats
@jimnoneill
Collaborator Author

Additional Schema Changes (Commit 4)

Per @megasanjay and @slugb0t feedback:

Required Fields Added

  • rightsList: Now mandatory for FAIR compliance and enabling metadata-based reuse
  • formats: Now mandatory (auto-detectable from upload, helps machine reuse)

Caption Structure Restored

Per slugb0t's concern about image/caption cross-referencing, restored object format:

{
  "imageCaptions": [
    {"id": "fig1", "caption": "Figure 1. Overview of workflow"},
    {"id": "fig2", "caption": "Figure 2. Results distribution"}
  ],
  "tableCaptions": [
    {"id": "table1", "caption": "Table 1. Summary statistics"}
  ]
}
  • id field: Optional, for cross-referencing when images and captions are extracted separately
  • caption field: Required, the full caption text
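
A short sketch of how the id field avoids the index-sync problem: captions are matched to extracted images by id rather than by array position. The image record shape (`{"id": ..., "path": ...}`) and the function name `link_captions` are hypothetical; only the caption objects follow the structure shown above.

```python
from typing import Any


def link_captions(
    images: list[dict[str, Any]], captions: list[dict[str, Any]]
) -> tuple[list[dict[str, Any]], list[str]]:
    """Attach captions to images by id; report captions with no image."""
    by_id = {c["id"]: c["caption"] for c in captions if "id" in c}

    # Images without a matching caption get caption=None instead of
    # silently picking up a neighbour's caption.
    linked = [{**img, "caption": by_id.get(img["id"])} for img in images]

    image_ids = {img["id"] for img in images}
    orphans = [c["caption"] for c in captions if c.get("id") not in image_ids]
    return linked, orphans


images = [{"id": "fig1", "path": "fig1.png"}, {"id": "fig3", "path": "fig3.png"}]
captions = [
    {"id": "fig1", "caption": "Figure 1. Overview of workflow"},
    {"id": "fig2", "caption": "Figure 2. Results distribution"},
]
linked, orphans = link_captions(images, captions)
```

Here fig3 ends up with no caption and the fig2 caption is reported as an orphan, which is the case that would have shifted indexes in the flat string-array format.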

Final Required Fields (8 total)

  1. creators
  2. titles
  3. publicationYear
  4. subjects
  5. descriptions
  6. conference
  7. rightsList ← NEW
  8. formats ← NEW

@megasanjay
Member

I like the imageCaptions idea. I guess it will make it flexible enough so that anyone can use their own id on there. plus if we decide to extract images and tables we can label them with that id.

"required": ["identifier", "identifierType"],
"additionalProperties": false
},
"minItems": 1,
Member

We should keep minItems and uniqueItems (it's there to prevent empty arrays, or to allow unique items only if we want). Ideally the identifiers key should not be there if there aren't any items.

Member

We do uniqueItems only for dates, so let's follow the same pattern as that.

"formats": {
"type": "array",
- "description": "Technical format of the data files in the dataset. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg.",
+ "description": "Technical format of the poster file. Use file extension or MIME type where possible. This field is auto-detected from the uploaded file.",
Member

I think this should be labelled as: "Technical format of the poster file. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg." We don't need to mention the auto-detected part, since that is more for our platform and not for other people's reuse.

}
},
- "required": ["descriptionType", "description"]
+ "required": ["description"]
Member

Let's keep descriptionType required. I could see users using the other ones as needed (for our platform we can require only Abstract, but keeping the schema more flexible would be nice).

@megasanjay
Member

The only one that we might need guidance on is ethicsApproval. Is this a common field at the poster stage? Might need @bvhpatel guidance on this one. It's optional, so I'm okay with it.

@megasanjay
Member

megasanjay commented Feb 6, 2026

Also unrelated, but should we keep schema v0.1 on this repository as well (as an additional folder), or does the Zenodo entry count for that? @bvhpatel

- identifiers: Added minItems:1 back (prevent empty arrays)
- formats: Updated description with examples, removed platform-specific text
- descriptions: Made descriptionType required again (flexibility for other uses)
@jimnoneill
Collaborator Author

> Also unrelated, but should we keep schema v0.1 on this repository as well (as an additional folder), or does the Zenodo entry count for that? @bvhpatel

We have a copy in a folder of the repo currently. I did it for redundancy's sake, but I don't feel strongly either way.

@jimnoneill
Collaborator Author

Additional Schema Changes (Commit 5)

Per @megasanjay review feedback:

identifiers Array Constraints Restored

"identifiers": {
  "type": "array",
  ...
  "minItems": 1,
  "uniqueItems": true
}
  • minItems: 1 prevents empty arrays—ideally the identifiers key should not be present if there are no items
  • uniqueItems: true ensures no duplicate identifiers (following same pattern as dates)
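
On the producer side, these two constraints imply a small clean-up step before validation: drop duplicates, and omit the identifiers key entirely when nothing is left. A minimal sketch, with the hypothetical helper name `normalize_identifiers`; the item shape (identifier, identifierType) follows the required list in the schema snippet reviewed above.

```python
from typing import Any


def normalize_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Deduplicate identifiers and drop the key if the array would be empty."""
    out = dict(record)
    unique: list[dict[str, Any]] = []
    for item in out.pop("identifiers", None) or []:
        if item not in unique:  # satisfies uniqueItems: true
            unique.append(item)
    if unique:  # satisfies minItems: 1 by omitting the key otherwise
        out["identifiers"] = unique
    return out


doi = {"identifier": "10.1234/example", "identifierType": "DOI"}
cleaned = normalize_identifiers({"identifiers": [doi, dict(doi)]})
emptied = normalize_identifiers({"identifiers": []})
```

The DOI value here is a placeholder, not a real identifier.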

formats Description Updated

Removed platform-specific language for better reusability:

"formats": {
  "description": "Technical format of the poster file. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg."
}

(Removed "auto-detected from the uploaded file" since that's platform-specific)

descriptionType Remains Required

"descriptions": {
  "items": {
    "required": ["descriptionType", "description"]
  }
}

Keeps schema flexible for users who may want to use other description types beyond "Abstract" (e.g., "Methods", "TechnicalInfo", etc.)

No Changes to Required Fields

Still 8 required root fields:

  1. creators
  2. titles
  3. publicationYear
  4. subjects
  5. descriptions
  6. conference
  7. rightsList
  8. formats
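
For a quick pre-flight check outside a full JSON Schema validator, the eight required root fields can be tested with plain Python. This is only a sketch: the function name `check_required` and the message format are made up, and it covers just root-level presence plus the minItems: 1 rule on formats and rightsList, not the nested structure of each field.

```python
from typing import Any

REQUIRED_ROOT_FIELDS = (
    "creators",
    "titles",
    "publicationYear",
    "subjects",
    "descriptions",
    "conference",
    "rightsList",
    "formats",
)


def check_required(record: dict[str, Any]) -> list[str]:
    """List root-level problems; an empty list means the check passed."""
    problems = [
        f"missing: {name}" for name in REQUIRED_ROOT_FIELDS if name not in record
    ]
    # formats and rightsList also carry minItems: 1 in the schema.
    for name in ("formats", "rightsList"):
        if name in record and len(record[name]) < 1:
            problems.append(f"empty: {name}")
    return problems


report = check_required({"creators": [], "titles": [], "formats": []})
```

A real deployment would still validate against poster_schema.json itself; this only gives the UI a cheap way to list what the user has left to fill in.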

@slugb0t
Member

slugb0t commented Feb 11, 2026

Another thing I've been thinking about is if we should consider removing the dates key entirely.

conference already contains conferenceStartDate and conferenceEndDate.
Looking at the options for dates they don't seem to have much relevancy for posters.

  • "Accepted" / "Submitted" - Not sure where this would be needed
  • "Created" - Less meaningful than when it was presented
  • "Issued" - Covered by db timestamps
  • "Updated" / "Withdrawn" - Edge cases handled by versioning

@jimnoneill
Collaborator Author

I hear you ... I recall strong feelings on both sides regarding the conference-specific dates being important vs. sticking closely to DataCite ... we'll talk about it later today. One of them needs to be removed for sure.

@jimnoneill jimnoneill changed the title from "Schema Requirement Updates: 14 → 5 Required Fields & UI Alignment" to "Schema Requirement Updates: 14 → 8 Required Fields & UI Alignment" on Feb 11, 2026