Skip to content

Conversation

@gouttegd
Copy link
Contributor

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • [ ] tests have been added/updated (if applicable)
  • [ ] CHANGELOG.md has been updated. Already covered by previous changes

The added_in annotation is used to indicate that a slot has been introduced in a specific version of the specification. For now, that annotation is only used in the model-wide definition of slots, which means it can only indicate when a brand new slot has been introduced – it can not be used to indicate that a pre-existing slot has been added to a class in which it was not used before.

For example, the similarity_measure slot has existed since before 1.0, but has been only been added to the MappingSet class for version 1.1. Currently that information is not contained in the model, implementers need to check the Model changes across versions section of the spec to be aware of that.

Likewise for curation_rule and curation_rule_text, which have just been added to the MappingSet class.

In this PR, we make use of LinkML's slot_usage mechanism to add the added_in annotation not only to the general (model-wide) definition of a slot, but also to the specific instance of a slot within a class.

This allows to encode, directly into the model, that the use of similarity_measure, curation_rule, and curation_rule_text at the level of the MappingSet class is only possible starting from version 1.1.

(This really should have been done directly in the PRs that added those slots to the MappingSet class – #412 and #472 –, but at the time I had not realized that it was possible to use slot_usage to add annotations to a slot only within the context of a particular class.)

The `added_in` annotation is used to indicate that a slot has been
introduced in a specific version of the specification. For now, that
annotation is only used in the model-wide definition of slots, which
means it can only indicate when a brand new slot has been introduced --
it can not be used to indicate that a pre-existing slot has been added
to a class in which it was not used before.

For example, the `similarity_measure` slot has existed since before 1.0,
but has been only been added to the MappingSet class for version 1.1.

In this commit, we make use of LinkML's `slot_usage` mechanism to add
the `added_in` annotation not only to the general (model-wide)
definition of a slot, but also to the specific instance of a slot within
a class.
@gouttegd gouttegd requested a review from matentzn August 13, 2025 17:51
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@matentzn matentzn requested a review from ehartley August 13, 2025 18:10
@gouttegd gouttegd self-assigned this Aug 13, 2025
@cthoyt
Copy link
Member

cthoyt commented Aug 13, 2025

can we add a unit test that checks that added in is annotated on all fields? This would also require back-annotating all old fields to be added in 1.0

@gouttegd
Copy link
Contributor Author

This would also require back-annotating all old fields to be added in 1.0

(1) This would mean having to add to every single slot the following stanza

instantiates: sssom:Versionable
annotations:
  added_in: "1.0"

Rather than cluttering the schema, it is much easier and much more readable in my opinion to keep the current and reasonable assumption that, if a slot does not have an explicit added_in annotation, it is assumed to have been there (at least) since version 1.0.

I wouldn’t be against a test that checks the diff on the src/sssom_schema/schema/sssom_schema.yml file to detect if the PR is adding a new slot, and triggers a failure if that new slot does not have a added_in annotation – though I do feel such a check would be overkill since we are not adding a new slot every other day.

(2) This would not have helped in the slightest with the issue that the present PR is addressing, which is the fact that slots that were already existing in 1.0 but were only used in the Mapping class, and have now been added to the MappingSet class, were not marked as such.

@matentzn
Copy link
Collaborator

matentzn commented Aug 14, 2025

Would it be possible to make the implicit assumption explicit by setting added_in: 1.0 as a default value? This way we could use SchemaView to write a simple test that no new slot should have a added_in annotation < current release

@gouttegd
Copy link
Contributor Author

How would that “simple” test work? How would it know which slots are new (and therefore should not have a added_in value less than the current release) and which slots are old (and therefore should have the default value of 1.0)?

Let‘s say we add a new foo slot and somehow forget to tag it with the proper added_in annotation (despite the fact that the pull request template explicitly has a check box to remind contributors of that, and also despite the fact that I’ll be very attentive to that thing, since SSSOM-Java needs the new slots to be properly annotated to be able to handle several versions of the spec at once).

The “simple” test will find that the foo slot has a added_in value of 1.0 (default value). Why would it flag that as an error? Or rather, why would it not flag as an error the fact that all the other old slots have a value of 1.0?

How does the test distinguish between

  • foo has a added_in annotation of 1.0, but that is a new slot, so that’s an error;
  • mapping_cardinality has a added_in annotation of 1.0, but that is an old slot, so it’s fine.

@gouttegd
Copy link
Contributor Author

gouttegd commented Aug 14, 2025

If you want to test that new slots have an added_in annotation, I can think of two ways to do it that require neither explicitly setting a added_in annotation of 1.0 to old slots, or explicitly setting the default value of added_in to 1.0 (assuming that is possible in LinkML).

(a) On every PR, compare the old schema with the new schema, get the list of slots in both to figure out which slots are added in the new schema (if any), then check whether all those added slots have a added_in annotation.

(b) Compile once and for all the list of 1.0 slots (it just needs to be done once since that list will by definition never change), and on every PR check that all slots that are not on that list (which are all the post-1.0 slots) have a added_in annotation.

And this will still not catch the issue that the present PR is addressing.

@gouttegd gouttegd mentioned this pull request Aug 14, 2025
2 tasks
Copy link
Contributor

@ehartley ehartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@gouttegd gouttegd merged commit 05b2fd5 into master Aug 17, 2025
4 checks passed
@gouttegd gouttegd deleted the add-added_in-annotations-to-slots-added-to-mappingset branch August 17, 2025 09:23
matentzn pushed a commit that referenced this pull request Aug 20, 2025
This PR adds a test that any new slot must

(1) carry a `added_in` annotation that
(2) is set to the current version of the spec.

The test works by preparing a fixed compiled list of all the “non-new”
slots (all slots present in the 1.0 release) for both the `Mapping` and
`MappingSet` classes. It then loads the current schema and iterates over
all slots for those two classes, checking that every slot that is not a
pre-existing slot is annotated as expected.

Of note, the test currently fails (as it should), because the
`similarity_measure`, `curation_rule`, and `curation_rule_text` slots
are _new_ on the `MappingSet` class but are currently not annotated as
such (this is what #482 is fixing).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants