Schema Requirement Updates: 14 → 8 Required Fields & UI Alignment #14
- Required fields now: creators, titles, publicationYear, subjects, descriptions
- Removed from required: identifiers, publisher, dates, language, types, formats, rightsList, fundingReferences, conference
- Removed minItems:1 from optional arrays (identifiers, dates, formats, rightsList, fundingReferences)
- Simplified descriptions: only 'description' required, not 'descriptionType' (defaults to Abstract)
- Updated DOI field comment to clarify auto-extraction behavior
- Version remains at 0.1
Thank you for submitting this pull request! We appreciate your contribution to the project. Before we can merge it, we need to review the changes you've made to ensure they align with our code standards and meet the requirements of the project. We'll get back to you as soon as we can with feedback. Thanks again! |
Reviewer's Guide
Updates the poster JSON schema to reduce the number of required fields, align naming with UI/Prisma plans, and relax constraints to support more auto-populated and optional metadata.
File-Level Changes
Thanks for making updates to your pull request. Our team will take a look and provide feedback as soon as possible. Please wait for any GitHub Actions to complete before editing your pull request. If you have any additional questions or concerns, feel free to let us know. Thank you for your contributions! |
For my comments;
… to required
Changes based on Sanjay Soundarajan's review:
- Removed: doi, prefix, suffix fields (DOI should be part of identifiers array if needed)
- Removed: alternateIdentifiers field entirely (system-internal only)
- Updated: publisher description to clarify it should be blank unless poster has formal DOI
- Added: conference to required fields (now 6 required: creators, titles, publicationYear, subjects, descriptions, conference)
- Clarified: rightsList has no default (already correct, just confirming)
Version remains at 0.1
Per Sanjay feedback: changed from object arrays with nested 'captions' property to simple string arrays.
Before: [{"captions": ["Table 1...", "Description..."]}]
After: ["Table 1. Summary of Results", "Table 2. Comparison..."]
This simplifies the structure and aligns better with typical extraction output.
Schema Changes Implemented (3 commits)
Commit 1: Required fields reduction (14 → 5)
Commit 2: Sanjay feedback
Commit 3: Caption simplification
Final Required Fields (6 total)
Prisma Model Alignment Needed (separate PR for web app)
extraction prompts updated in poster2json
@sourcery-ai dismiss |
Could we make rightsList mandatory? I think that will, at a minimum, allow for reuse based on metadata. I also think formats should remain mandatory, since it's automatable (and helps machine reuse).
I agree with Sanjay's points. rightsList should be mandatory if we want to be FAIR-aligned. Although flattening imageCaptions and tableCaptions to string arrays could make our lives easier, I have concerns about referencing associated images and captions. If the model fails to find an image but still finds its caption, or vice versa, the array indexes will fall out of sync. In a nested object we could include a reference from each to the other.
…id field
Per Sanjay/slugb0t feedback:
- Added rightsList to required (FAIR compliance for reuse)
- Added formats to required (auto-detectable, helps machine reuse)
- Restored imageCaptions/tableCaptions as object arrays with:
  - required 'caption' field (string)
  - optional 'id' field for cross-referencing images with captions
- Added minItems:1 to formats and rightsList
- Updated descriptions for clarity
Final required fields (8): creators, titles, publicationYear, subjects, descriptions, conference, rightsList, formats
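To make the eight-field requirement concrete, here is a minimal Python sketch of the check a consumer might run on a record. It is illustrative only: the helper name `check_required` and the sample records are hypothetical, and a real deployment would presumably validate against poster_schema.json with a JSON Schema validator instead.

```python
from typing import List

# The eight required root-level fields listed in the commit message above.
REQUIRED_FIELDS = [
    "creators", "titles", "publicationYear", "subjects",
    "descriptions", "conference", "rightsList", "formats",
]
# Arrays the commit gives minItems:1 (must not be empty when present).
NON_EMPTY_ARRAYS = ["formats", "rightsList"]


def check_required(record: dict) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in record
    ]
    for name in NON_EMPTY_ARRAYS:
        value = record.get(name)
        if isinstance(value, list) and len(value) == 0:
            problems.append(f"{name} must contain at least one item")
    return problems
```

This covers only the presence of required keys and the two non-empty arrays; nested constraints (for example the shape of each creator) are out of scope for the sketch.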
Additional Schema Changes (Commit 4)
Per @megasanjay and @slugb0t feedback:
Required Fields Added
Caption Structure Restored
Per slugb0t's concern about image/caption cross-referencing, restored object format:
{
"imageCaptions": [
{"id": "fig1", "caption": "Figure 1. Overview of workflow"},
{"id": "fig2", "caption": "Figure 2. Results distribution"}
],
"tableCaptions": [
{"id": "table1", "caption": "Table 1. Summary statistics"}
]
}
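A rough sketch of why the optional id helps: captions can be matched to extracted images by id rather than by array position, so a missing image or caption does not shift the rest. The function name `pair_captions` and the image record shape (`id`, `path`) are assumptions for illustration; the schema only defines the caption side.

```python
from typing import Dict, List, Tuple


def pair_captions(
    images: List[dict], captions: List[dict]
) -> Tuple[Dict[str, Tuple[str, str]], List[str], List[str]]:
    """Match captions to images by id instead of by array index."""
    image_paths = {image["id"]: image["path"] for image in images}
    paired: Dict[str, Tuple[str, str]] = {}
    unmatched_captions: List[str] = []
    for entry in captions:
        caption_id = entry.get("id")  # 'id' is optional in the schema
        if caption_id in image_paths:
            paired[caption_id] = (image_paths[caption_id], entry["caption"])
        else:
            unmatched_captions.append(entry["caption"])
    unmatched_images = [i for i in image_paths if i not in paired]
    return paired, unmatched_images, unmatched_captions
```

With index-based pairing, one missed image would mislabel every later figure; here the mismatch is reported and the remaining pairs stay intact.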
Final Required Fields (8 total)
I like the imageCaptions idea. I think it makes the structure flexible enough that anyone can use their own id there. Plus, if we decide to extract images and tables, we can label them with that id.
    "required": ["identifier", "identifierType"],
    "additionalProperties": false
  },
  "minItems": 1,
We should keep minItems and uniqueItems (they are there to prevent empty arrays, or to enforce unique items if we want that). Ideally the identifiers key should not be present at all if it has no items.
We use uniqueItems only for dates, so let's follow the same pattern here.
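For readers less familiar with the two keywords under discussion, a small Python sketch of what they enforce. This is an approximation, not a JSON Schema implementation: the helper `array_problems` is hypothetical, and uniqueness is judged by comparing key-sorted JSON text, which is close to but not identical to the specification's notion of equality.

```python
import json
from typing import List


def array_problems(
    items: list, min_items: int = 0, unique_items: bool = False
) -> List[str]:
    """Approximate the minItems and uniqueItems checks for one array."""
    problems: List[str] = []
    if len(items) < min_items:
        problems.append(
            f"expected at least {min_items} item(s), got {len(items)}"
        )
    if unique_items:
        seen = set()
        for item in items:
            # Key-sorted JSON text makes objects comparable regardless of key order.
            key = json.dumps(item, sort_keys=True)
            if key in seen:
                problems.append(f"duplicate item: {key}")
            seen.add(key)
    return problems
```

Under this reading, `minItems: 1` rejects an empty identifiers array but says nothing about the key being absent, which is why omitting the key entirely remains the way to express "no identifiers".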
poster_schema.json
  "formats": {
    "type": "array",
-   "description": "Technical format of the data files in the dataset. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg.",
+   "description": "Technical format of the poster file. Use file extension or MIME type where possible. This field is auto-detected from the uploaded file.",
I think this should be labelled as: "Technical format of the poster file. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg." We don't need to mention the auto-detection, since that is specific to our platform and not relevant to other people's reuse.
poster_schema.json
    }
  },
- "required": ["descriptionType", "description"]
+ "required": ["description"]
Let's keep descriptionType required. I could see users using the other types as needed (for our platform we can require only Abstract, but keeping the schema more flexible would be nice).
The only one that we might need guidance on is |
Also, unrelated: should we keep schema v0.1 in this repository as well (as an additional folder), or does the Zenodo entry count for that? @bvhpatel
- identifiers: Added minItems:1 back (prevent empty arrays)
- formats: Updated description with examples, removed platform-specific text
- descriptions: Made descriptionType required again (flexibility for other uses)
We currently have a copy in a folder of the repo. I did it for redundancy's sake, but I don't feel strongly either way.
Additional Schema Changes (Commit 5)
Per @megasanjay review feedback:
Another thing I've been thinking about is if we should consider removing the
I hear you ... I recall strong feelings on both sides regarding the conference-specific dates being important vs. sticking closely to DataCite ... we'll talk about it later today. One of them needs to be removed for sure.
PR Review: Schema Requirement Updates & Prisma, UI Alignment
Quick summary of current PR & UI plans
- `domain` to `researchField`, `posterContent` to `content`, captions structure in Prisma to match schema?
- `creators` as a relation table for author-based search

1. Required Fields: Reducing from 14 to 5
After reviewing the staging form UX and the schema, proposing we reduce root-level mandatory fields to 5 (creators, titles, publicationYear, subjects, descriptions). Notes on the 14 fields currently required:
- `creators`
- `identifiers` / DOI (auto-extract if provided)
- `titles`
- `publisher` (auto-populate as "posters.science")
- `publicationYear`
- `dates`
- `subjects`
- `language` (default "en", auto-detect)
- `descriptions`
- `types` (always "Poster", auto-populate)
- `formats` (auto-detect from upload)
- `rightsList` (default CC-BY-4.0 suggestion)
- `fundingReferences`
- `conference`
Schema PR incoming with updated `required` array and relaxed `minItems` constraints on optional arrays.

2. Fields to Remove from Staging UI
- `alternativeTitle` - drop from the form entirely. Schema can retain `titleType` as optional for API consumers, but the UI should only show one title field.
- `affiliationIdentifier` / `affiliationIdentifierScheme` / `schemeURI` (within creator affiliations) - remove from UI. Plain-text affiliation strings only. Institutional IDs (ROR, ISNI) can be resolved programmatically downstream.
- `publisher` - remove from UI. Keep `publicationYear` as required.
- `alternateIdentifiers` - system-generated only, never user-facing.

3. Prisma Model vs Schema Alignment - Discussion Points
3a. `publicationYear Int?` - should this be non-nullable?
If `publicationYear` stays in our mandatory 5, should this be `Int` rather than `Int?`? Or do we want the DB to allow null for draft/incomplete records and enforce required at the API validation layer instead?
3b. `domain` naming
The JSON schema uses `researchField` (we renamed it from `domain` to avoid collision with biological taxonomy, where "domain" is the highest classification rank above kingdom). The Prisma model still uses `domain`. We should align to `researchField` in Prisma to avoid a terminology split. This came from the previous PR with UC3 & DataCite feedback.
3c. `tableCaptions` / `imageCaptions` structure
Schema (current):
{ "captions": ["Figure 1. Overview of the Task Force", "Shows approximately 12 active members..."] }
Prisma:
Prisma should probably align to the schema pattern here rather than the other way around. The `captions: string[]` array-of-segments pattern is more forgiving?
3e. `content` vs `posterContent` naming
Schema uses `content`, following feedback from UC3 and DataCite to keep it generic (supports future expansion to presentations, infographics, one-pagers).
3f. `subjects` vs `keywords` / `descriptions` vs `abstract` - GUI aliasing
The schema sticks with DataCite terminology (`subjects`, `descriptions`), but the GUI should use the more intuitive terms (`keywords`, `abstract`). These are display labels, and the mapping between GUI label and schema field should be handled at the form/API layer:
- keywords: `subjects[].subject`
- abstract: `descriptions[0]` with `descriptionType: "Abstract"`
No schema change needed for this. Just a clear convention that the frontend uses friendly labels while the API/schema stays DataCite-compliant. We should document this mapping somewhere.
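One possible shape for that form/API-layer mapping, sketched in Python. The function name `form_to_schema` and the form payload keys (`keywords` as a list of strings, `abstract` as a string) are assumptions for illustration; only the target field names come from the schema discussion above.

```python
from typing import Dict, List


def form_to_schema(form: dict) -> Dict[str, List[dict]]:
    """Translate GUI-friendly labels into DataCite-style schema fields."""
    keywords = list(form.get("keywords", []))
    abstract = str(form.get("abstract", ""))
    return {
        # GUI "keywords" become subjects[].subject
        "subjects": [{"subject": keyword} for keyword in keywords],
        # GUI "abstract" becomes descriptions[0] with descriptionType "Abstract"
        "descriptions": [
            {"description": abstract, "descriptionType": "Abstract"}
        ],
    }
```

Keeping the translation in a single function like this would also give the mapping one obvious place to be documented.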
3g. `creators` as JSON blob vs. relation table
The current Prisma model stores `creators` as `Json @default("[]")`. Since creators/authors are one of the 5 required fields and the most likely thing users will search by ("find all posters by author X"), it is worth discussing whether this should be a proper `PosterCreator` relation table for queryability.

4. Auto-population & AI Extraction Candidates
Jamey needs to add some of this to the extraction prompts.
- `formats`
- `types`: "Poster"
- `language`: "en"
- `identifiers`
- `subjects` / keywords
- `descriptions` / abstract
- `fundingReferences`
- `conference`

Summary by Sourcery
Update poster JSON schema to reduce required fields and better align with planned UI and data model changes.
New Features:
Enhancements: