Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new "literal profile" solution based on "subject_label" and "object_label" #384

Merged
merged 16 commits into from
Aug 9, 2024

Conversation

matentzn
Copy link
Collaborator

@matentzn matentzn commented Aug 4, 2024

Resolves #234

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • tests have been added/updated (if applicable)
  • CHANGELOG.md has been updated.
  • Change base branch to main

If you are proposing a change to the SSSOM metadata model, you must

  • provide a full, working and valid example in examples/
  • provide a link to the related GitHub issue in the see_also field of the linkml model
  • provide a link to a valid example in the see_also field of the linkml model
  • run SSSOM-Py test suite against the updated model

This PR introduces an alternative model to the "literal mapping" proposal we have previously added. It is built on the following principles:

  • Introducing a profile causes a needless overhead which requires specialised tooling
  • We can achieve the same functionality with a very minimal intervention in the main metadatamodel

I do understand that there are various opposing views on the need of a "literal" profile, but I think this super minimal intervention will satisfy both sides.

Huge thanks to @gouttegd who managed to steer this massive carrier ship after it has left the harbour. This is rarely successful and I am very happy we managed to make it!

@matentzn matentzn mentioned this pull request Aug 4, 2024
8 tasks
@gouttegd
Copy link
Contributor

gouttegd commented Aug 4, 2024

Two more questions:

  1. Do we want to represent unmappable literal mappings? If yes, how?

This is basically a question of how the “literal profile” and sssom:NoTermFound should play together.

There are two distinct cases.

1.a) One side is a literal, the other side is sssom:NoTermFound.

Unless I am missing something, this case should be straightforward and should not require any special treatment. That’s a good thing, because it’s the case that is most likely to be useful: if you have to map a list of literals (say, a list of scRNAseq cluster names) to a target vocabulary (say, the CL ontology), you want to be able to say that a given literal could not be mapped to any term in the target vocabulary.

1.b) The literal and the “term not found” are on the same side.

Say you want to map a vocabulary (again, let’s say CL) to a list of literals. You want to say that a given term in CL is not represented in the list of literals. We have a problem here, because the way to represent a “term not found” is to use sssom:NoTermFound as the object_id (assuming the literals are on the object side – it would not change anything if they were on the subject side), but object_id is not used for a literal mapping (literals have no IDs, almost by definition).

So what to do here? I can imagine 3 possibilities:

1.b.1) We don’t support that case. Seems quite reasonable to me a priori, as I don’t think it is a very relevant use case (contrary to the 1.a case). That would be the current situation (with the profile as defined in this PR).

1.b.2) Use sssom:NoTermFound as the object_id, as we would do for non-literal mappings. This would mean that, even for literal mappings (where we are normally concerned only with object_label), we would need to lookup object_id just to check whether it contains the special sssom:NoTermFound value. It leaves open the question of what to put in the object_label slot in that case (we have to put something in that slot: it cannot be empty).

1.b.3) Put in the object_label slot a special value similar to sssom:NoTermFound, intended to indicate that there is no mapped literal. It could even be the value sssom:NoTermFound itself (probably the easiest way).

  1. Computing mapping_cardinality for literal mappings

Not really a question, but when computing the mapping_cardinality, if we are dealing with literal mappings, we must deal with the _label slots rather than the _id slots.

That is, for example: if we are dealing with a set of mappings with literals on the subject side and entities on the object side, then a cardinality of 1:n means that one subject_label (NOT one subject_id) maps to several object_id.

That is, I think, quite obvious, but it should still be mentioned explicitly somewhere.

@matentzn
Copy link
Collaborator Author

matentzn commented Aug 4, 2024

Ok, I will fold that in (think about it first). As an aside, I like it if actionable comments to be added as comments on the diffs so that they show up as "resolveable", so that they also block PRs being merged unless they are explicitly resolved. If there is no good place for such a comment you can just put in on the Changelog diff or on any other file that makes a little bit of sense.

@matentzn matentzn self-assigned this Aug 4, 2024
@matentzn matentzn added the schema label Aug 4, 2024
@gouttegd
Copy link
Contributor

gouttegd commented Aug 4, 2024

As an aside, I like it if actionable comments to be added as comments on the diffs so that they show up as "resolveable"

Duly noted. :)

@matentzn matentzn marked this pull request as ready for review August 6, 2024 13:51
@matentzn matentzn marked this pull request as draft August 6, 2024 13:52
matentzn added a commit that referenced this pull request Aug 6, 2024
Intermediate PR to make it easier to review the new solution for
representing the literal profile.

Note to reviewers: Whoever approves this, intrinsically approves also
#384 (modulo details). The point to merging this first is to make
reviewing #384 much easier.
Base automatically changed from remove-literal-profile to master August 6, 2024 17:48
Instead of creating a new entity reference type, we re-use the existing "rdfs literal". It is widely understood.
This commit amends the #384 PR to:

* describe what literal mappings are and how they should be used;
* remove the old page about “SSSOM profiles”, which is no longer
relevant (literal mappings are not a “profile”);
* attempt to better describe `rdfs literal` in the built-in schema
documentation.
@gouttegd
Copy link
Contributor

gouttegd commented Aug 7, 2024

The SSSOM-Py test suite fails on three tests:

  • test_utils.py::TestUtils::test_get_dict_from_mapping fails because it expects a semantic_similarity_score slot. This has nothing to do with this PR and is instead obviously the result of Replace semantic_similarity_* slots by similarity_* slots #386, which was not tested against SSSOM-Py before being merged.

  • test_parsers.py::TestParse::test_parse_alignment_xml fails when parsing this:

<Cell>"
  <entity1 rdf:resource="http://randomurlwithnochancetobeinprefixmap.org/ID_123"/>
  <entity2 rdf:resource="http://purl.obolibrary.org/obo/WBbt_0005815"/>
  <measure rdf:datatype="xsd:float">0.5</measure>
  <relation>=</relation>
</Cell>

My understanding is that, after some checking against a prefix map, the entity1 ends up being being empty/None/null, and prior to that PR SSSOM-Py could count on the Mapping constructor to automatically reject the mapping as invalid (since the Python generated code included a non-null check because of the required=true attribute on subject_id). With this PR, subject_id is no longer required, so the mapping is “successfully” constructed – which is wrong, and the test catches that.

It looks like SSSOM-Py cannot simply rely on the LinkML runtime implementing the rules set forth in this PR, so it will have to implement its own validation of subject_id / object_id.

  • test_parsers.py::TestParse::test_parse_obographs: I have not looked in details, but it’s most likely a similar problem as above.

@gouttegd
Copy link
Contributor

gouttegd commented Aug 7, 2024

The issues above are putatively fixed by mapping-commons/sssom-py#552.

Copy link
Collaborator Author

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @gouttegd , I was out these past days, now I am getting back :P Some more comments, looking at sssom py now!

CHANGELOG.md Show resolved Hide resolved
src/docs/spec-model.md Outdated Show resolved Hide resolved
src/sssom_schema/schema/sssom_schema.yaml Show resolved Hide resolved
Change the phrasing of the description of "literal mappings" to make it
clear that it is perfectly feasible and allowed to have literal on
_both_ sides of a mapping.
src/docs/spec-model.md Outdated Show resolved Hide resolved
When describing how to represent literal mappings, use BCP 14 keywords
to clarify what exactly is required.
@matentzn matentzn marked this pull request as ready for review August 9, 2024 10:45
Copy link
Collaborator Author

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED (I cant formally approve it, so @gouttegd you will have to).

@jamesamcl jamesamcl self-requested a review August 9, 2024 13:09
Copy link
Contributor

@cmungall cmungall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting to see changes to the derived python too - or will this be done as a separate PR?

@matentzn matentzn merged commit 73a7f66 into master Aug 9, 2024
3 checks passed
@matentzn matentzn deleted the new-literal-profile branch August 9, 2024 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SSSOM: Separate profile for Literal Mappings
5 participants