@gouttegd
Contributor

@gouttegd gouttegd commented Jun 1, 2025

Resolves [#448]

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • [ ] tests have been added/updated (if applicable)
  • CHANGELOG.md has been updated.
  • run SSSOM-Py test suite against the updated model

URI-typed slots are expected to contain (non-relative) URIs only, but the LinkML uri type accepts both URIs and relative URI references (something that is not very clear in the LinkML metamodel, but the implementation of the uri type leaves no doubt that relative URI references are considered valid values).

So we add a new NonRelativeURI type, based on the LinkML uri type but with the added restriction that the URI must not be a relative URI reference. Of note, that restriction is only documented, not enforced: it is up to implementations to actually check that the value of a NonRelativeURI-typed slot is indeed a non-relative URI, as the underlying LinkML implementation will accept both non-relative and relative URI references.

All slots that were previously defined with a uri range are redefined to use the new NonRelativeURI type.

Since it has never been clear, in SSSOM 1.0, that URI-typed slots were supposed to contain only non-relative URIs, for backwards compatibility it is RECOMMENDED that implementations still accept relative URI references when processing sets compliant with version 1.0 of the spec.
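
As an illustration only, here is a minimal sketch of how an implementation might perform the check that the model itself does not enforce, while staying lenient with 1.0 sets. The names `is_non_relative_uri` and `check_uri_slot` and the `sssom_version` parameter are hypothetical, not part of SSSOM-Py or LinkML. The check is purely syntactic: it only looks for an RFC 3986 scheme component at the start of the value.

```python
import re

# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def is_non_relative_uri(value):
    """Return True if the value starts with a scheme component."""
    return _SCHEME_RE.match(value) is not None


def check_uri_slot(value, sssom_version="1.1"):
    """Hypothetical check for a NonRelativeURI-typed slot.

    Relative URI references are rejected, except when the set being
    processed declares version 1.0, where they are still accepted.
    """
    if is_non_relative_uri(value):
        return True
    return sssom_version == "1.0"
```

With this sketch, `check_uri_slot("relative/path", "1.0")` is accepted, while the same value in a 1.1 set is rejected.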

Add a new type `AbsoluteURI`, based on the LinkML type `uri` but with
the added restriction that the URI must be absolute. Update `uri`-typed
slots to make them use the `AbsoluteURI` type instead. Add a
recommendation that implementations SHOULD still accept relative URIs in
all `AbsoluteURI`-typed slots, when they are processing a SSSOM 1.0 set.

closes #448
@gouttegd gouttegd self-assigned this Jun 1, 2025
@gouttegd gouttegd requested a review from matentzn June 1, 2025 19:39
…RI).

We rename the `sssom:AbsoluteURI` to `sssom:NonRelativeURI`. "Absolute
URI" is a misnomer and a possible source of confusion, because the RFC
3986 uses that name to refer to URIs that both (1) have a scheme
component and (2) do not have a fragment component. In the context of
SSSOM, we only want (1): we want to forbid relative URI references that
do not have a scheme, but we do _not_ want to forbid non-relative URIs
that have a fragment.
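
To make the difference concrete, here is a small sketch contrasting the two notions. The function names are illustrative only, and the checks are simplified syntactic tests (presence of a scheme, presence of a `#`), not a full RFC 3986 parser.

```python
import re

# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def has_scheme(value):
    return _SCHEME_RE.match(value) is not None


def has_fragment(value):
    # In RFC 3986, the fragment is whatever follows the first "#".
    return "#" in value


def is_rfc3986_absolute_uri(value):
    """(1) has a scheme AND (2) has no fragment."""
    return has_scheme(value) and not has_fragment(value)


def is_non_relative_uri(value):
    """Only (1): has a scheme; a fragment is allowed."""
    return has_scheme(value)
```

A value such as `http://example.org/onto#Term` is a non-relative URI but not an "absolute URI" in the RFC sense, which is why the latter name would be misleading here.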

(The name `NonRelativeURI` is somewhat ridiculous, as per the RFC 3986
all URIs are necessarily non-relative, and only "URI references" can be
relative; but we cannot name our type "URI" because of the risk of
confusion with the LinkML type, which despite its name refers in fact to
URI references... ><)

We also explicitly clarify that the type is specifically intended to
exclude relative URI references, and provide some examples of both valid
URIs and invalid relative URI references.
@gouttegd gouttegd changed the title Clarify that URI-typed slots expect absolute URI values. Clarify that URI-typed slots expect non-relative URI values. Jun 2, 2025
Collaborator

@matentzn matentzn left a comment

This is across the board better than what we currently have - I wonder if we should, technically speaking, separate NonRelativeURI from URL, as the latter excludes URNs? Some of the slots (about a third, I think) clearly want URLs, not arbitrary non-relative URIs. Happy to wave this one through though, just suggesting!

@gouttegd
Contributor Author

I wonder if we should technically speaking separate nonRelativeURI from URL, as the latter excludes URNs

I’d say no.

The distinction between a URI and a relative URI reference is a clear and non-ambiguous one: if there is no scheme component (i.e. if the string does not start with xxx:), then it’s a relative URI reference.

The distinction between a URI and a URL is much less clear (as the RFC 3986 itself says). In fact, it is clear only for URNs (URIs with a urn: scheme), which by definition are not URLs. For many other schemes, whether URIs can be considered as “locators” or not depends solely on whether your implementation is aware of the scheme and knows how to locate the indicated resources.

Examples of URIs that, strictly speaking, are also locators, but that may not be considered as such depending on the capabilities of the program manipulating them:

  • gopher://flybase.org/ – you would probably be hard-pressed to resolve such a URI with a modern browser, so in effect it is not a URL;
  • doi:10.1093/database/baac087 (yes, it is a URI: doi is a valid URI scheme, so this is a full URI, not a CURIE) – in most browsers this will be a mere identifier, because most browsers don’t know about the doi: scheme; but Protégé for example does know about that scheme, so in Protégé this is in fact a locator;
  • nntp://example.org/comp.os.linux – again, whether this is a locator or not will depend on whether the implementation knows how to fetch Newsgroups resources.

So, I think SSSOM should stay clear of this and only ever mention URI, without trying to draw a line between URLs and non-URLs.

@matentzn matentzn requested a review from ehartley July 8, 2025 14:57
Damien Goutte-Gattat and others added 2 commits July 8, 2025 16:09
When describing the typing change of the `see_also` slot, the "changes in SSSOM 1.1" section should correctly refer to the new type as `sssom:NonRelativeURI` instead of `sssom:AbsoluteURI`.

Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>
Contributor

@ehartley ehartley left a comment

Should the range of the prefix_url slot also be changed to NonRelativeURI?

Co-authored-by: Emily Hartley <ehartley@c-path.org>
@gouttegd
Contributor Author

Should the range of the prefix_url slot also be changed to NonRelativeURI?

I sure hope no one would ever even think of using relative URI references in a curie map! What would that even mean? :D

But if we don’t explicitly forbid it, then it is almost guaranteed that someone, somewhere, will do exactly that just because they (think they) can, so I agree. I will change the range of prefix_url now.

(As an aside, it is unfortunate that this slot has been named prefix_urL, as prefix_urI would have been better. Well, too late for that now.)

@gouttegd
Contributor Author

I sure hope no one would ever even think of using relative URI references in a curie map! What would that even mean? :D

… But if we don’t explicitly forbid it, then it is almost guaranteed that someone, somewhere, will do exactly that just because they (think they) can, so I agree. I will change the range of prefix_url now.

Hum, on second thought, not so sure about that.

The prefix_url values in the curie map are used to construct the IRIs in all slots typed as EntityReference. The way the linkml:Uriorcurie type (which is the underlying type of EntityReference) is defined, it has always been possible to have “IRIs” that are in fact relative URI references and not (absolute) URIs.

So, something like this:

#curie_map:
#   PFX1: relative/uri/prefix/
#   PFX2: another/relative/uri/prefix/
#subject_id   predicate_id      object_id   mapping_justification
PFX1:1234     skos:exactMatch   PFX2:5678   semapv:manualMappingCuration

where the final identifiers are of the form relative/uri/prefix/1234 or another/relative/uri/prefix/5678 is in fact (and has always been) completely valid and accepted.
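
For illustration, a minimal sketch of that expansion, assuming CURIEs are expanded by plain concatenation of the prefix and the local identifier, and that the prefixes end with a slash so that the result matches the identifiers quoted above. The `expand_curie` helper is hypothetical and is not the SSSOM-Py API.

```python
def expand_curie(curie, curie_map):
    """Expand PREFIX:LOCAL by concatenating the mapped prefix and LOCAL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return curie_map[prefix] + local_id


# Assumed curie map, with trailing slashes on the (relative) prefixes.
curie_map = {
    "PFX1": "relative/uri/prefix/",
    "PFX2": "another/relative/uri/prefix/",
}

subject = expand_curie("PFX1:1234", curie_map)
obj = expand_curie("PFX2:5678", curie_map)
```

Nothing in this expansion requires the prefix to carry a scheme, so the resulting identifiers are relative URI references whenever the prefix is one.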

So I don’t think we should change that now. We may think that semantic identifiers should always be fully resolvable IRIs, and that relative identifiers like in the example above are not great, but if people do have such identifiers they should be allowed to use them in SSSOM.

Collaborator

@matentzn matentzn left a comment

Looks great, thank you!

@matentzn matentzn requested a review from ehartley July 10, 2025 13:52
Contributor

@ehartley ehartley left a comment

Looks good!

@matentzn matentzn merged commit 27ca352 into master Jul 11, 2025
4 checks passed
@matentzn matentzn deleted the absolute-uri-slots branch July 11, 2025 16:33