Reorganise docs #368

Merged
merged 20 commits into from
Jul 11, 2024

gouttegd
Contributor

Resolves [#330]

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • tests have been added/updated (not applicable)
  • CHANGELOG.md has been updated (not applicable)

This PR reorganises the documentation, especially the specification part, as suggested in #330.

More precisely:

  • Several pieces of information that were scattered throughout the website are now grouped on the index page (e.g. “contacts” and “credits”).
  • The home page gets a “SSSOM at a glance” section to give an immediate glimpse of what the standard is about.
  • The specification is broken down as follows:
    • general introduction
    • specification of the data model
      • introduction and notes complementary to the LinkML-generated documentation
      • LinkML-generated documentation
    • specification of the serialisation formats

The “resources for users” section is left untouched for now. The urgent part was reorganising the specification, so that we can start enriching it to make it ready for 1.0.

gouttegd and others added 13 commits April 3, 2024 22:11
Use the 'home.md' file, renamed to 'index.md', as the "index page" (that
is, the page shown when the visitor does not request an explicit page),
instead of the LinkML-generated index page (which is not suitable to be
the first thing the visitor sees upon arriving on the website).

The LinkML-generated index page is renamed to 'linkml-index.md'. The
name does not really matter, as long as it is *not* 'index.md'.

This requires updating LinkML to version 1.7.0 at least, because prior
versions hardcoded the name of the generated index page to 'index.md',
thereby forcing that page to be the site index page (unless the web
server is configured differently). This in turn requires slightly
bumping the minimum Python version from 3.8 to 3.8.1.

The contents of the "About" page (about.md) were redundant with the index
page, so we remove it.

We rewrite the index/about page to:

* separate the description of the standard from the description of what
  the Mapping Commons project does (previous description was conflating
  the two things; for example, providing reference tools and software
  libraries is *not* part of the standard; it's part of the efforts to
  promote the use of the standard);
* fix the basic description of what a SSSOM mapping is, and also add a
  mention of what a "mapping set" (the second most important core
  concept) is;
* slightly re-organise the list of "quick links".

Update the index page to:

* add the SSSOM logo on top (it makes more sense to put it there than at
  the top of the "overview" page; visitors will see it first);
* rename the top section "SSSOM at a glance", and add to it an example
  of a file in the SSSOM/TSV format; the idea of that section is to give
  a quick overview of what we are talking about (so that the readers can
  decide immediately whether SSSOM is what they were looking for);
* add information about the team (contact and list of editors/contributors);
* add acknowledgements section for listing funding sources and
  significant contributions.

The last two points are moved from other pages of the doc (notably
contact.md, credits.md). Better to have them on the first page so that
they are out of the way.

Split the existing "spec.md" file into two components:

- a general introduction on mappings;
- the actual specification, which is itself split in several parts:
    - the specification of the data model;
    - the specification of the serialisation formats.

This commit creates placeholder files that will hold those different
sections. The "general introduction" file is pre-filled with the
contents of the "Introduction" section of the original "spec.md" file.

(I believe that introduction should be entirely re-written from scratch,
as it sometimes reads like a patchwork of unrelated pieces pasted
together. But that will be for later work. The most urgent for now is to
have a place where we can write the actual *specification* with all its
details.)

We update the link to the logo used at the top of the index page to
point to a local copy of the logo, rather than to its original online
location. This insulates the documentation from any unexpected change in
said original remote location.

Replace the placeholder links on the index page with actual links to the
appropriate sections of the documentation.

Add an introductory paragraph at the beginning of the "specification"
section, along with a paragraph (copied from BCP 14) explaining the
meaning of the MUST/SHOULD/etc. keywords.

Add an overall overview of the data model and a subsection explaining
what the "propagatable slots" are.

At the beginning of the specification, we add a table with the list of
all the prefix names that are used throughout the specification.

This will also act as the list of "built-in" prefix names, which will be
referred to from the spec of the SSSOM/TSV format.

In the section about the data model, we add a list of the mapping
predicates that are considered common and that are recommended.

This is mostly taken from the old spec.md document, except that we also
mention the predicates defined in the SEMAPV vocabulary, which the old
spec did not mention at all.

Add a complete and workable specification for the SSSOM/TSV format. This
is an original version that takes very little from the old spec.md
document, except the examples.

The specification for the OWL/RDF format, on the other hand, is directly
taken from the old spec, almost "as is".

The "specification" for the JSON format is currently merely a
placeholder, since that format is NOT specified for now.

All bits of the old spec.md document have now been moved (or rewritten)
elsewhere, so we can remove it.

The `code_of_conduct.md` file is an exact duplicate of
`contributing.md`, so we remove it.
@gouttegd gouttegd self-assigned this Jun 27, 2024
@gouttegd gouttegd requested a review from matentzn June 27, 2024 16:47
@gouttegd gouttegd added the documentation Improvements or additions to documentation label Jun 27, 2024
Fix some typos, missing words, and inconsistent names of placeholder
variables in the spec.
Collaborator

@matentzn matentzn left a comment

Absolutely fantastic work @gouttegd

I think our main disagreement right now is the condensation requirement. I love that the concept of condensation was introduced, I love the name, and I love the specification - I just disagree with mandating it in "strict" mode at the moment, or encouraging it in "non-strict" mode. (Unless, of course, I have already agreed to it elsewhere and just forgot about it)

I hope we did

When showing examples of SSSOM/TSV files (on the index page and in the
spec for the SSSOM/TSV format), use the mapping set from the "basic
tutorial".

Add a section listing YAML features that MUST NOT be used in the
metadata section of the SSSOM/TSV file.

Those features are not uniformly supported even among high-quality YAML
implementations and do not bring much. SSSOM is supposed to be _simple_,
so we forbid them entirely.

Add a requirement that condensation, when supported, MUST be
deactivatable.

Also clarify that propagation and condensation go together, so that an
implementation that supports one MUST support the other.

Amend the canonical rule for serialising floating point values with UP
TO 3 digits after the decimal point AS NEEDED.

That is, if more than 3 digits would be needed to write the value, then
the writer MUST truncate after the third digit, but if the value can be
written (without loss of precision) with less than 3 digits, the writer
MUST NOT right-pad the value with zeroes.

So a value like 0.9 is to be written as "0.9", NOT as "0.900".
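
The rule above can be illustrated with a minimal Python sketch. This is not taken from SSSOM-Py or any other implementation: the function name `format_confidence` is hypothetical, and writing a whole number such as 1.0 as "1" is an assumption, since the commit message does not cover that case.

```python
from decimal import Decimal, ROUND_DOWN


def format_confidence(value: float) -> str:
    """Write a float with UP TO 3 digits after the decimal point.

    Hypothetical helper: truncates (does not round) after the third
    digit and never right-pads with zeroes.
    """
    # Go through repr() so that 0.9 becomes Decimal("0.9") rather than
    # the long binary expansion of the float.
    truncated = Decimal(repr(value)).quantize(
        Decimal("0.001"), rounding=ROUND_DOWN
    )
    text = format(truncated, "f")  # e.g. "0.900"
    if "." in text:
        # Drop padding zeroes; drop the point too if nothing is left
        # after it (assumption, see the note above).
        text = text.rstrip("0").rstrip(".")
    return text
```

With this sketch, `format_confidence(0.9)` gives "0.9" and `format_confidence(0.12345)` gives "0.123".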
@gouttegd gouttegd requested a review from matentzn July 5, 2024 09:50
Collaborator

@matentzn matentzn left a comment

I approve:

  1. The reorganisation overall
  2. The general sentiment expressed by the spec-formats sections that are new

I reserve the right to revisit details on the implementation side if (and only if) putting them into action would result in a violation of current common practice. I don't think there is anything; I just want to be super transparent.

Thank you @gouttegd I am happy with this. When you are happy, what do you suggest:

  1. Merge bypassing the need for a second review on the grounds that little new is added to the spec (only clarifications and interpretation of the spirit)
  2. Me to find a second reviewer to fulfill the 2-reviewer requirement

I will leave the choice to you, I am ok with either.

@gouttegd
Contributor Author

gouttegd commented Jul 8, 2024

Given that the “reorganisation of the docs” part does not actually change any content (it merely puts the docs in a shape that will make them easier to work on), I am fine with that part not being reviewed by a second reviewer.

For the spec-formats-tsv.md part, however, I don’t think it is fitting that the first real formal specification of the SSSOM/TSV format has been written by the developer of only one of the two “major” implementations. I’d like a SSSOM-Py developer to have at least a cursory look at it.

I don’t foresee any problem since the new spec should be fully compatible with existing behaviours in SSSOM-Py. What the new spec does add:

  • Propagation/condensation: As far as I know SSSOM-Py has currently no support for these concepts. The spec proposes them as recommended (SHOULD), so I’d like to know whether SSSOM-Py developers think they can implement them (alternatively, it could be done by an additional layer on top of SSSOM-Py).
  • “Canonical format”: I’ve tried to select canonical rules that would minimize the work needed for SSSOM-Py to generate canonical output. For example, the rule about the formatting of floating point values has been directly inspired by what SSSOM-Py already does, so no change in SSSOM-Py should be needed here.
  • Backwards compatibility with SSSOM < 0.9.1: SSSOM-Py does not support that, but that is very clearly an entirely optional feature (we both agreed on that when we discussed backwards compatibility options), so that shouldn’t be a problem. SSSOM-Py developers can decide that they won’t ever do that.

@matentzn
Collaborator

I feel responsible for the SSSOM-Py implementation, even though the majority of the work has been done by @hrshdhgd.

@hrshdhgd - feel free to review the file called spec-formats-tsv.md in this PR, with a specific emphasis on the points @gouttegd made in his last comment above. We will have to implement condensation and propagation at some point soon after, so it is in any case good if you are familiar with it. Let us know if you have any major qualms!

Thanks!

Contributor

@hrshdhgd hrshdhgd left a comment

This is so thorough, I love it! Thank you so much @gouttegd for putting this together. It certainly is a lot of hard work and we sincerely appreciate it!

I don't think I follow the condensation and propagation concepts though. Could either of you provide examples so I understand what to implement?

@gouttegd
Contributor Author

gouttegd commented Jul 10, 2024

@hrshdhgd

I don't think I follow the condensation and propagation concepts though. Could either of you provide examples so I understand what to implement?

Let’s consider the following set:

#curie_map:
#  COMENT: https://example.com/entities/
#  ORGENT: https://example.org/entities/
#mapping_provider: https://example.org/provider
#mapping_tool: foo mapper
subject_id    subject_label   predicate_id      object_id     object_label   mapping_justification          mapping_tool
ORGENT:0001   alice           skos:closeMatch   COMENT:0011   alpha          semapv:ManualMappingCuration   
ORGENT:0002   bob             skos:closeMatch   COMENT:0012   beta           semapv:ManualMappingCuration   bar mapper
ORGENT:0004   daphne          skos:closeMatch   COMENT:0014   delta          semapv:ManualMappingCuration   
ORGENT:0005   eve             skos:closeMatch   COMENT:0015   epsilon        semapv:ManualMappingCuration   

The set-level metadata contain values for the mapping_provider and mapping_tool slots. These slots are considered “propagatable”, which means that they really apply to individual mappings, and that putting them at the level of the set is just a “shortcut” to avoid repeating the same value for all mappings.

So in this example, all mappings should be considered to have a mapping_provider of https://example.org/provider.

Propagation is the act of taking the values of propagatable slots at the set level, and filling the corresponding slots in each individual mapping.

After propagation, the above set should look like this:

#curie_map:
#  COMENT: https://example.com/entities/
#  ORGENT: https://example.org/entities/
#mapping_tool: foo mapper
subject_id    subject_label   predicate_id      object_id     object_label   mapping_justification          mapping_tool   mapping_provider
ORGENT:0001   alice           skos:closeMatch   COMENT:0011   alpha          semapv:ManualMappingCuration                  https://example.org/provider
ORGENT:0002   bob             skos:closeMatch   COMENT:0012   beta           semapv:ManualMappingCuration   bar mapper     https://example.org/provider
ORGENT:0004   daphne          skos:closeMatch   COMENT:0014   delta          semapv:ManualMappingCuration                  https://example.org/provider
ORGENT:0005   eve             skos:closeMatch   COMENT:0015   epsilon        semapv:ManualMappingCuration                  https://example.org/provider

Notice that the set no longer has a mapping_provider value, and conversely that all mappings have one.

Also note that the value of the mapping_tool at the set level (“foo mapper”) has not been propagated, even though mapping_tool is a propagatable slot. This is because one of the mappings already had a value for that slot (mapping #2, which has the value “bar mapper”), and propagation is only allowed when no mappings at all have a value for the propagatable slot.
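
The propagation rule described in this comment can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SSSOM-Py or spec-mandated implementation: the mapping set is modelled as a plain dict of set-level metadata plus a list of mapping dicts, the function name `propagate` is hypothetical, and `PROPAGATABLE_SLOTS` lists only the two slots used in the example (the real list is defined by the specification).

```python
# Illustrative subset only: the two propagatable slots named in the example.
PROPAGATABLE_SLOTS = ("mapping_provider", "mapping_tool")


def propagate(set_metadata, mappings, slots=PROPAGATABLE_SLOTS):
    """Return (set_metadata, mappings) after propagation (hypothetical helper).

    A set-level value is moved down to every mapping only when no
    mapping at all already has a value for that slot.
    """
    new_metadata = dict(set_metadata)
    new_mappings = [dict(m) for m in mappings]  # leave the inputs untouched
    for slot in slots:
        if slot not in new_metadata:
            continue  # nothing to propagate for this slot
        if any(m.get(slot) for m in new_mappings):
            continue  # some mapping has its own value: do not propagate
        value = new_metadata.pop(slot)  # the set no longer carries the value
        for m in new_mappings:
            m[slot] = value
    return new_metadata, new_mappings


# The example set from the comment above, reduced to the relevant slots.
metadata = {
    "mapping_provider": "https://example.org/provider",
    "mapping_tool": "foo mapper",
}
mappings = [
    {"subject_id": "ORGENT:0001", "object_id": "COMENT:0011"},
    {"subject_id": "ORGENT:0002", "object_id": "COMENT:0012",
     "mapping_tool": "bar mapper"},
    {"subject_id": "ORGENT:0004", "object_id": "COMENT:0014"},
    {"subject_id": "ORGENT:0005", "object_id": "COMENT:0015"},
]
propagated_metadata, propagated_mappings = propagate(metadata, mappings)
```

After the call, `propagated_metadata` only holds `mapping_tool` ("foo mapper"), every mapping carries the `mapping_provider` value, and only the second mapping has a `mapping_tool` value ("bar mapper").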

@gouttegd
Contributor Author

gouttegd commented Jul 10, 2024

Condensation is the exact opposite of propagation. It’s taking the values of “propagatable slots” that are set on the mappings, and moving them (if possible, that is if all mappings have the same value) to the level of the set instead.

For example, to condense the second example from my previous message, you would observe that all mappings have the same value for the mapping_provider slot, so you would set a single mapping_provider slot at the level of the set and remove the entire mapping_provider column. You would also observe that not all mappings have the same value for mapping_tool (one mapping has the value “bar mapper”, whereas other mappings have no value), so you would not do anything special for that slot (it is not condensable).
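
Condensation as described here can be sketched in the same way. Again, this is a minimal illustration rather than an actual implementation: the function name `condense` and the dict-based data layout are hypothetical, `PROPAGATABLE_SLOTS` is an illustrative subset, and skipping a slot when the set already carries a different value is an assumption that the comment does not address.

```python
# Illustrative subset only: the two propagatable slots named in the example.
PROPAGATABLE_SLOTS = ("mapping_provider", "mapping_tool")


def condense(set_metadata, mappings, slots=PROPAGATABLE_SLOTS):
    """Return (set_metadata, mappings) after condensation (hypothetical helper).

    A slot is moved up to the set only when every mapping has the same,
    non-empty value for it.
    """
    new_metadata = dict(set_metadata)
    new_mappings = [dict(m) for m in mappings]  # leave the inputs untouched
    for slot in slots:
        values = [m.get(slot) for m in new_mappings]
        if not values or not values[0]:
            continue  # no mappings, or the first one has no value
        if any(v != values[0] for v in values):
            continue  # values differ or are missing: not condensable
        if slot in new_metadata and new_metadata[slot] != values[0]:
            continue  # assumption: never overwrite a different set-level value
        new_metadata[slot] = values[0]
        for m in new_mappings:
            del m[slot]  # the column disappears from the mappings
    return new_metadata, new_mappings


# The propagated set from the previous comment, reduced to the relevant slots.
provider = "https://example.org/provider"
metadata = {"mapping_tool": "foo mapper"}
mappings = [
    {"subject_id": "ORGENT:0001", "mapping_provider": provider},
    {"subject_id": "ORGENT:0002", "mapping_provider": provider,
     "mapping_tool": "bar mapper"},
    {"subject_id": "ORGENT:0004", "mapping_provider": provider},
    {"subject_id": "ORGENT:0005", "mapping_provider": provider},
]
condensed_metadata, condensed_mappings = condense(metadata, mappings)
```

After the call, `mapping_provider` is back at the set level and absent from every mapping, while `mapping_tool` is left untouched both on the set and on the second mapping.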

@hrshdhgd
Contributor

That makes perfect sense! Thank you for explaining this patiently and perfectly @gouttegd ! I truly appreciate it.

@gouttegd gouttegd merged commit 7c55f8f into master Jul 11, 2024
3 checks passed
@gouttegd gouttegd deleted the reorganise-docs branch July 11, 2024 09:28
gouttegd added a commit that referenced this pull request Jul 19, 2024
Resolves [#305]

- [x] `docs/` have been added/updated if necessary
- [x] `make test` has been run locally
- [ ] tests have been added/updated (not applicable)
- [x]
[CHANGELOG.md](https://github.com/mapping-commons/sssom/blob/master/CHANGELOG.md)
has been updated.

If you are proposing a change to the SSSOM metadata model, you must 

- [ ] provide a full, working and valid example in `examples/` (**not
applicable**: no new example needed as the change only affects how some
slots should be interpreted; it does not add or remove slots, nor does
it change how the propagated slots are used)
- [x] provide a link to the related GitHub issue in the `see_also` field
of the linkml model
- [ ] provide a link to a valid example in the `see_also` field of the
linkml model (**not applicable**, same reason as above)

This PR finalises the fix to #305, by explicitly specifying, directly
within the LinkML model, which slots are considered “propagatable”
(previously this was only informally described in the spec, since #368).
This is done by:

* adding a “metamodel extension class” (`sssom:Propagatable`) with a
single boolean-ranged attribute `propagated`;
* amending the slots that must be considered propagatable by making them
instantiate the `sssom:Propagatable` extension.