Check where data-deposition centres point for format recommendations #14

Open
bansp opened this issue Mar 30, 2021 · 16 comments
Labels
centre data (issues regarding data provided by individual centres; name them in the issue description)
cyclic task (task that needs to be repeated cyclically, e.g. before a release)

Comments

@bansp
Member

bansp commented Mar 30, 2021

Date stamp: 23 October 2024

0. Introduction

0.1. About this very note

This lead comment of ticket #14 in the repository of the Standards Information System (SIS) is part of the release packages of CLARIN recommendations for data-deposition formats, and as such it gets updated at least once per release. The minimal amount of information expected to change cyclically is an update of the date string, after having confirmed that the information is current. Ideally, we are hoping for the centres listed in sections 1-4 below to gravitate towards section 5, which lists centres that maintain information in the SIS.

The information contained herein is meant to assist various bodies in the broadly conceived CLARIN governance (notably the Standards and Interoperability Committee, the BoD and the NCF, the Assessment Committee, and especially the Technical Centres Committee) in their review- and decision-making processes.

This page is located at https://github.com/clarin-eric/standards/issues/14 . Please post remarks, updates and/or corrections in the comments section at the bottom.

0.2. General introduction (a.k.a. "Why bother")

CLARIN centres often offer deposition services.
B-centres that offer such services are obligated (this is a (re-)assessment precondition, formulated as part of the CoreTrustSeal requirements) to publish explicit information about data formats that they recommend for depositions. For non-B-centres, this is not a requirement, but it is not uncommon, depending on the centre's profile and infrastructure. That obligation/practice has been encoded in one of CLARIN's Key Performance Indicators, using the following measurement: "percentage of centres offering repository services that have published an overview of formats that can be processed in their repository". (Thus the KPI measurement encompasses centres with deposition services, whereas the CTS requirement pertains to B-centres with deposition services, i.e., a subset of the KPI target group; for more details, quotes and references, see section 4 of chapter "Standards in CLARIN", by Piotr Bański and Hanna Hedeland (2022), in the CLARIN Book.)

Before the SIS took wing, the requirement / good practice of publishing explicit information on recommended formats had been addressed in the following ways:

  1. publishing the information somewhere at the centre (or consortium) homepage;
  2. not publishing that information and instead directing users to by now obsolete sets of recommendations (called "external guidelines" in what follows) that are far too general to represent the given centre's research profile;
  3. using a mixture of the above approaches.

There was/is also a fourth group, consisting of centres with deposition services that wouldn't publish such information at all, not even as a link. It is hoped that this group is going to dissipate soon, especially thanks to the recent initiative by the Technical Centres Committee, inviting centres to deposit the relevant information in the SIS.

This very ticket is devoted mainly to collecting information on centres with depositing services that point to external sources of information on format recommendations, especially if those sources are not very informative. In other words, we are looking mainly at groups 2 and 3.

The reasons for listing this information are:

  • to have general info on the "trends" in handling the assessment requirement,
  • to produce information potentially useful at least to the Technical Centres Committee and to the Assessment Committee,
  • to identify the specific targets to see if something needs/may be done about them in order to tighten the information system,
  • to assist centres in publishing centre-specific data-deposition format recommendations.

Eventually, the format recommendations are expected to be collected in the Standards Information System. It is possible for centres to store that information in the SIS, and to present it to users with a dedicated link, such as:

https://standards.clarin.eu/sis/views/view-centre.xq?id=IDS

It is also possible to retrieve the information from the SIS already pre-structured, as XML, to be styled according to the given centre's guidelines and published on that centre's pages, thus avoiding the chore of maintaining two separate sets of data (for more on that, see the API section of the SIS).
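As a minimal sketch of how such dedicated links can be assembled, the snippet below builds the centre view URL shown above, plus the centre list URL with an optional status filter. The base path and the query parameters `id`, `status` and `submit` are copied from links quoted in this ticket; the helper names are hypothetical, and nothing here touches the SIS API itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path as it appears in the links quoted in this ticket.
SIS_VIEWS = "https://standards.clarin.eu/sis/views"


def centre_view_url(centre_id: str) -> str:
    """Return the SIS page that lists the recommendations of one centre."""
    return f"{SIS_VIEWS}/view-centre.xq?{urlencode({'id': centre_id})}"


def centre_list_url(status: Optional[str] = None) -> str:
    """Return the SIS centre list, optionally filtered by status."""
    base = f"{SIS_VIEWS}/list-centres.xq"
    if status is None:
        return base
    # The "submit=Filter" pair mirrors the filtered link quoted in this ticket.
    return f"{base}?{urlencode({'status': status, 'submit': 'Filter'})}"


print(centre_view_url("IDS"))
print(centre_list_url("B-centre"))
```

A centre could keep a link of the first kind on its deposition page and let the SIS hold the actual list of formats.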

0.3. Methodology

The primary resource assumed for this task is the CLARIN centre registry at https://centres.clarin.eu/ .

Two secondary resources are:

The secondary resources appear to depend on the CLARIN centre registry, and a degree of hand-crafting (and therefore a potential update lag) may be assumed for the depositing-services page.

A tertiary resource is the list provided by the SIS, at https://standards.clarin.eu/sis/views/list-centres.xq . While one might be tempted to assume that that list should be at least semi-automatically derived from the centre registry, it actually provides a small potential layer of indirection, at least in two aspects: firstly, we allow centres to override the shorthand handles that are listed in the registry (and thus, for example, at the centre's request, "CLARINSI" is listed as "CLARIN.SI") and, secondly, we are prepared for a degree of "ontological" or organisational variability in the case of centres that act as nodes in more than a single research-infrastructure network. In short: centres can influence their listings in the SIS in various ways, independent of the CLARIN registry.

The B-centre status is conditioned upon a successful round of certification, managed internally by the Assessment Committee, and externally by a certification authority, currently the CTS and, in the future, also nestor. (Note, incidentally, that full-fledged methodology would probably ideally start from the CTS database as the primary source, but do forgive us for not trying to shoot gnats with rockets -- the amount of time allocated to this already extensive exercise should be reasonable). The CLARIN registry has various status strings for centres that wish to achieve the B-status, whether for the first time or having lost it and preparing for another certification round -- it is, as of June 2024, "Aiming for B", "Aiming for B.", "aiming for B" (kudos for consistency) but also "none" or "Certification expired, renewal planned". In the present note, all such centres, together with "regular" C-centres, are going to be treated as "Non-B centres". Note that, especially for centres tagged as "none" in the registry, some degree of network-internal knowledge is going to be necessary for stating which of the centres are temporarily not B only because they are getting, or preparing to get, re-certified. There's no guarantee that that knowledge is perfect, so this is a weak point in the methodology.
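The grouping described above can be sketched as a small normalisation step, shown here only as an illustration. The status strings are the ones quoted in the paragraph; the label "B" for certified centres and all function names are assumptions, not values confirmed by the registry.

```python
def normalise_status(raw: str) -> str:
    """Trim whitespace, drop a trailing full stop and ignore letter case."""
    return raw.strip().rstrip(".").strip().casefold()


def centre_group(raw: str, b_label: str = "B") -> str:
    """Return "B" for certified centres and "Non-B" for everything else.

    `b_label` is an assumed registry value for certified B-centres.
    """
    return "B" if normalise_status(raw) == b_label.casefold() else "Non-B"


def needs_network_knowledge(raw: str) -> bool:
    """Flag "none", which does not say whether re-certification is under way."""
    return normalise_status(raw) == "none"


# The three spelling variants quoted above collapse into a single value.
variants = ["Aiming for B", "Aiming for B.", "aiming for B"]
print({normalise_status(v) for v in variants})
print(centre_group("Certification expired, renewal planned"))
```

Centres flagged by `needs_network_knowledge` are the ones for which the weak point mentioned above applies: the registry string alone does not settle their status.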

The tables below are constructed by scrolling along the CLARIN registry and the SIS list in parallel, taking into account (a) B-centres and (b) non-B centres with deposition services that are known as such (note: this is a weak spot, some may escape, and the secondary CLARIN list is not trusted fully). In the process, the SIS list is updated wrt the CLARIN registry, and the result is sorted into the categories provided in the sections that follow. Doubts that arise wrt the nature of individual centres are usually signalled by GitHub issues that use the "centre data" label.
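The side-by-side comparison is done by hand, but its core can be sketched as a set difference that respects the handle overrides mentioned above. This is only an illustration: the "CLARINSI" to "CLARIN.SI" override comes from this note, while the function names and the sample lists are hypothetical and do not reflect the actual state of either list.

```python
# Registry handle -> handle preferred in the SIS (the one override named in this note).
SIS_OVERRIDES = {"CLARINSI": "CLARIN.SI"}


def to_sis_handle(registry_handle: str) -> str:
    """Map a registry handle to the handle used in the SIS listing."""
    return SIS_OVERRIDES.get(registry_handle, registry_handle)


def compare_listings(registry_handles, sis_handles):
    """Report centres present in only one of the two listings."""
    expected = {to_sis_handle(h) for h in registry_handles}
    listed = set(sis_handles)
    return {
        "missing_in_sis": sorted(expected - listed),
        "only_in_sis": sorted(listed - expected),
    }


# Hypothetical sample data, for illustration only.
registry = ["CLARINSI", "IDS", "BBAW"]
sis = ["CLARIN.SI", "IDS", "OTA"]
print(compare_listings(registry, sis))
```

Entries under `only_in_sis` are not necessarily errors, since centres can influence their SIS listings independently of the registry.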

0.4. Terminology

When, in what follows, a centre is said to "point to external guidelines", those guidelines are in too many cases general, top-down, coarse-grained standards recommendations that were formulated well over 10 years ago and were meant for a purpose different than informing users about centre-particular recommendations on what kinds of data the given centre can handle or is interested in handling. While such pointers are surely provided in good faith, they can at best be considered tricks for passing CTS certification. Otherwise, for practical purposes, they don't get the thing done.

Another piece of terminology: "listed in the SIS" vs "curated": some of the content in the SIS comes from rather quick import of information that was structured rather differently back when the Standards Committee worked with spreadsheets. A lot of interpretation happened on the way between spreadsheets and the SIS, justified by the hope that the centres would quickly want to fix that if they were not happy with the outcome. It later turned out that we were a non-tiny bit too hopeful about that. The "legacy" listings, not approved by the particular centres, are accompanied by a warning in red. When, on the other hand, a centre decides to hold an inputhon and submits the result to the SIS, such recommendations are considered curated and the red warning is replaced with the name(s) of the curator(s) (see Section 5 for examples).

* * *

What follows is information on how the particular CLARIN centres publish format recommendations or how they do not publish that info while nevertheless trying to satisfy the CLARIN-internal as well as CTS-imposed requirements. Note the date stamp at the top of this note and please do not hesitate to let us know (ideally: in the comments below) if you see that some info can/should be updated or fixed.

1. Centres that point solely to external guidelines

This section lists centres that do not provide information specific to their research profiles but rather point to general and coarse-grained information provided by CLARIN quite a while ago, in most cases in 2009 (the "LRT Standards" document, which simply doesn't help and mentions obsolete standards).

Note: it is good not to be mentioned in this section.

Methodological note: it is possible that the centres below also provide their own recommendations or even point at the SIS for that -- and that that mention has been overlooked (or it has been added after the date stamp in the table below). Gathering data for this ticket has shown that some such information can be located in non-obvious places (that is very rare, but it has happened). In such cases, a question would arise as to how effective a hidden pointer to the SIS or a hidden table of recommendations is, and how that corresponds to the need of satisfying a KPI or a CTS/assessment requirement.

1.1. B-centres

Recall that B-centres are obligated (by CTS Requirement 8) to provide explicit information on what formats they are willing to process in the deposition process. The centres below instead point to general and at least partly obsolete guidelines. Amending this situation is at this point easy: deposit the relevant information directly in the SIS -- and then point to that description.

| Centre | LastChecked | LinkTarget | SourcePage |
|---|---|---|---|
| CLARIN-LV | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf | https://repository.clarin.lv/repository/xmlui/page/faq |
| CLARIN-PL1 | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf | https://clarin-pl.eu/dspace/page/faq#what-submissions-do-you-accept |
| ILC4CLARIN | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf | https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/page/faq#what-submissions-do-you-accept |
| LINDAT | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf | https://lindat.mff.cuni.cz/repository/xmlui/page/faq?locale-attribute=en#what-submissions-do-you-accept |

1.2. Non-B centres

These centres are not obligated to explicitly publish information about what formats they recommend for deposition. However, that is both useful for the users themselves, and also crucial for satisfying the relevant CLARIN KPI. Also, these centres are listed as aiming for the "B" status (click on "Type status" to sort them at the top), so at some point they will need to undergo CTS assessment -- why not be proactive in this respect.

| Centre | LastChecked | LinkTarget | SourcePage |
|---|---|---|---|
| CLARIN-LT | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf | https://clarin.vdu.lt/xmlui/page/faq#what-submissions-do-you-accept |
| ERCC | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf | https://clarin.eurac.edu/repository/xmlui/page/faq#what-submissions-do-you-accept |
| SADiLaR | 05-10-2024 | http://www.clarin.eu/recommendations<br>https://archive.mpi.nl/accepted-file-formats | https://sadilar.org/en/submit-a-resource/ |

  • SADiLaR points to mpi.nl recommendations, which is definitely more helpful than just pointing to the "LRT standards" document, although a question arises as to whether the recommendations pointed to are actually the centre's own recommendations, given that the centre doesn't have any control over the referenced list. But that is for the CTS to assess. Pointing to the "recommendations" at clarin.eu (after that page has been changed) is a big plus.

2. Centres that point to external guidelines in addition to publishing own information locally

Just like centres listed in section 4 below, those listed in this section fulfil the CTS requirements by publishing explicit requirements concerning formats in which data can be deposited with them. They have done a splendid job. The places where their own recommendations are published are listed in the last column of the table below.

The focus of this note is to see where centres point for external information, and, in particular, to catalogue the (let's call them) suboptimal places where users are directed, so that something can be done about that. There is nothing wrong in pointing to an external source in addition to the centre's own recommendations published on the centre's own pages, especially if the external resource brings in some extra value (see ACDH-ARCHE for an example, pointing to Archeology Data Service recommendations). On the other hand, it is not so good to point to obsolete, unhelpful or misleading documents, such as the "LRT standards" PDF.

Thus, the role of this section is basically informative, though with a request directed at the centres enumerated below, to consider sharing at least the positive recommendations in the SIS, in order to enable aggregation of this information and publishing it for the benefit of the community.

(There seems to be no need to split the centres listed here into B- and non-B-. Centres that point to external info AND at the same time maintain their recommendations in the SIS, such as CLARIN-CH, are only listed in Section 5 below, and potential "suboptimal" external links are enumerated under that last table.)

| Centre | LastChecked | LinkTarget | SourcePage | Own info |
|---|---|---|---|---|
| ACDH-ARCHE | 05-10-2024 | a.o. https://www.clarin.eu/content/standard-recommendations | https://arche.acdh.oeaw.ac.at/browser/formats-filenames-and-metadata | same |
| BBAW | 05-10-2024 | https://www.clarin-d.net/en/language-resources-and-services/user-guide | https://clarin.bbaw.de/en/repo/ | same |
| CLARIN.SI | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf<br>https://www.clarin.eu/content/standards-and-formats | https://www.clarin.si/repository/xmlui/page/data | same |
| DH-REP | 05-10-2024 | https://files.dnb.de/nestor/materialien/nestor_mat_08_eng.pdf | https://repository.de.dariah.eu/doc/services/data-policies.html#recommendations-and-list-of-preferred-formats | same |
| ORTOLANG | 05-10-2024 | https://www.clarin.eu/content/standard-recommendations | https://www.ortolang.fr/en/help/data-formats/ | https://facile.cines.fr/ via https://www.ortolang.fr/en/home/about/ |
| TGrep | 05-10-2024 | https://files.dnb.de/nestor/materialien/nestor_mat_08_eng.pdf | https://textgridlab.org/doc/services/data-policies.html#preferredformats | same |
| UdS | 05-10-2024 | http://www.clarin.eu/recommendations | https://fedora.clarin-d.uni-saarland.de/ressources/AcceptedFormats.en.pdf | same, via https://fedora.clarin-d.uni-saarland.de/depositors.en.html |

  • CLARIN.SI provides its own info in an exemplary way; it has a "see also" section with links but that section is only for the explorers, because all the relevant data are served on a silver plate. Note: the links to the old CSC spreadsheets are probably an overkill (the info is extremely obsolete) -- a link to the SIS (at least to the general recommendations, if not to the listing of the SI consortium) would be appreciated.
  • TGrep recommendations are not easy to locate. That can be considered a usability or user-friendliness issue.

3. Centres that neither point anywhere nor publish their own explicit information

This set of centres should be empty, unless the centre does not offer deposition services (in which case, it shouldn't be listed here, so... this set should be empty). Please note that rather than amending the existing lack of recommendations on their own home pages, the best course of action for these centres may be to deposit the information directly in the SIS, and then point to that listing. You do one inputhon and Bob's your... list.

3.1. B-centres

"Absence of evidence is not evidence of absence", and it might be that the centres here do publish their own recommendations, in a non-obvious corner of their homepages. Please feel very welcome to post a comment below if you are able to share info on that. Note also that if the info is hidden then it's not really easily available to the depositing users, and ensuring availability of the information is part of the reason for this entire exercise.

| Centre | LastChecked | Comment | DepositionPage |
|---|---|---|---|
| CLARIN-IS | 05-10-2024 | (no info) | https://clarin.is/en/services/ |

  • Note that IS is a relatively fresh B-centre and its position on this page will hopefully change soon (this comment was posted in mid-June 2024)

3.2. Non-B centres

Some of these centres are listed as "aiming for B", some used to be B. All of them indicate that they provide deposition services.

| Centre | LastChecked | Comment | DepositionPage |
|---|---|---|---|
| CELR-EKK | 05-10-2024 | "all data is accepted", via Entu | https://www.keeleressursid.ee/en/services |
| IMS | 05-10-2024 | no real recommendations | https://wiki.ims.uni-stuttgart.de/extern/CLARIN-D |
| IMS (cont.) | 15-06-2024 | "please contact us" | http://clarin04.ims.uni-stuttgart.de/repo/ |
| MI | 05-10-2024 | "please contact [us]" | https://meertens.knaw.nl/meertens-collectie/research-data-management/<br>https://meertens.knaw.nl/en/archive/depositing-data_eng_/ |

  • "no real recommendations" for the IMS means the line saying "Is the data in one of the acceptable formats (non-proprietary, text-based) or can it be converted?" -- how is the user to know that? :-(
  • The IMS "repo" page was not accessible on 05-10-2024
  • MI: the page in Dutch provides more overall information, but neither language version provides a list of formats recommended by the centre

3.3. Special mention

The C-centre CEDIFOR used to "point elsewhere" for data deposition recommendations, but at present it mentions neither CLARIN nor anything concerning data deposition. The maintainer was contacted around June 2024. It is not at all clear that the centre should be mentioned in this ticket (maybe it's not CLARIN, despite the centre registry, or maybe it no longer offers deposition services), so please withhold your judgement.

| Centre | LastChecked | Page |
|---|---|---|
| CEDIFOR | 05-10-2024 | https://www.cedifor.de/?s=clarin |

4. Centres that only publish their own, local recommendations

Note that this satisfies both the CTS requirement and the KPI calculation (except the KPI calculation performed dynamically by the SIS). Unfortunately, it also ensures a gap in the SIS-derived statistics that might otherwise benefit the entire network. It would be greatly appreciated if at least the data formats recommended by the centres could make it into the SIS. The table below includes both B- and C-centres.

Note: Formally, all these centres have done a splendid job. Adding their recommendations to the SIS would be a nice bonus to ensure more accurate statistics.

| Centre | LastChecked | Linked from / Comment | InfoPage |
|---|---|---|---|
| BAS | 05-10-2024 | https://clarin.phonetik.uni-muenchen.de/BASRepository/index.php (choose 'FAQ') | https://www.phonetik.uni-muenchen.de/Bas/BasPolicyExternalResources_eng.pdf |
| BAS (cont.) | 05-10-2024 | https://www.bas.uni-muenchen.de/forschung/Bas/BasInfoStandardsTemplateseng.html | https://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html |
| CLARIN-DK | 05-10-2024 | https://repository.clarin.dk/repository/xmlui/page/faq#what-data-formats-are-accepted | https://repository.clarin.dk/repository/xmlui/page/formats |
| CLARIN:EL | 05-10-2024 | https://www.clarin.gr/en/services/share | https://www.clarin.gr/sites/default/files/CLARINELRecommendedFormats.pdf |
| CMU | 05-10-2024 | https://talkbank.org/ | https://talkbank.org/share/contrib.html |
| COCOON | 05-10-2024 | https://cocoon.huma-num.fr/exist/crdo/faq.htm?lang=en | https://cocoon.huma-num.fr/exist/crdo/formats.htm?lang=en |
| DANS | 05-10-2024 | https://dans.knaw.nl/en/depositing-data-manual/before-depositing_ds/ | https://dans.knaw.nl/en/file-formats/ |
| EKUT | 05-10-2024 | menu on the main page | https://talar.sfb833.uni-tuebingen.de/datamanagement/ |
| IVDNT | 23-10-2024 | https://portal.clarin.ivdnt.org/ | https://portal.clarin.ivdnt.org/information-about-deposition.html |
| IVDNT (cont.) | 05-10-2024 | access to the recommendations is not obvious | https://portal.clarin.inl.nl/doc/information_about_deposition_INT.pdf |
| LAC | 05-10-2024 | https://dch.phil-fak.uni-koeln.de/bestaende/language-archive-cologne/user-guides | https://dch.phil-fak.uni-koeln.de/bestaende/language-archive-cologne/user-guides/format-whitelist |
| MPI-PL | 05-10-2024 | https://archive.mpi.nl/tla/ + "Help" | https://archive.mpi.nl/tla/accepted-file-formats |
| TROLLing | 05-10-2024 | https://site.uit.no/dataverseno/deposit/ | https://site.uit.no/dataverseno/deposit/prepare/ |
| ZIM | 05-10-2024 | https://informationsmodellierung.uni-graz.at/en/about-the-department/research-data-repository-gams/ | https://gams.uni-graz.at/context:gams?mode=about&locale=en |

  • ZIM uses GAMS as the repository; it is a repository for digital data from the Humanities / SocSci. The move from ZIM to GAMS required a search for "CLARIN" on the homepage -- there was no obvious path that I could find. It is probably fair to say that the list of the recommended formats is rather coarse-grained. See also issue #10 ("ZIM: not sure where to find the format list").
  • EKUT = TALAR, MPI-PL = TLA
  • CLARIN:EL has warned that the paths above may change
  • IVDNT provides its recommendations in two variants (no time stamp in the PDF document -- somewhere, a historian is crying), in a format that should be easy to feed into the SIS. The status of the PDF document is not fully obvious because it does not seem easy to reach the recommendations (or the repository itself) from the homepage of IVDNT. It has crossed my mind that maybe the link to the PDF above is a deep link to an orphaned document (it comes from my old notes).

5. Centres that point at their curated recommendations in the SIS

This is where all (or most of) the centres listed above should ideally end up -- what is needed for them is to maintain the information served by the SIS and explicitly link to it. ("Ideally" from the point of view of contributing to the aggregated information; note that for centres in sections 2 and 4, this is a matter of willingness and sparing the time; they are otherwise fine from the point of view of certification and KPI calculation done by hand, rather than in the SIS).

| Centre | LastChecked | SourcePage |
|---|---|---|
| CLARIN-CH | 04-10-2024 | https://clarin-ch.ch/documentation-platform/standard-data-formats |
| CLARINO_Bergen | 05-10-2024 | https://repo.clarino.uib.no/xmlui/page/faq#what-submissions-do-you-accept |
| FIN-CLARIN | 05-10-2024 | https://www.kielipankki.fi/tuki/tekninen-muoto/ |
| IDS | 05-10-2024 | https://repos.ids-mannheim.de/reposdescription.html |
| OTA | 05-10-2024 | http://www.clarin.eu/sites/default/files/Standards%20for%20LRT-v6.pdf |
| PORTULAN | 05-10-2024 | https://portulanclarin.net/usage/#how |
| SAW | 05-10-2024 | https://repo.data.saw-leipzig.de/depositing/en |
| Språkbanken | 05-10-2024 | https://repo.spraakbanken.gu.se/xmlui/page/faq#what-submissions-do-you-accept |

6. Conclusions

6.1. One conclusion that should be drawn from the picture above is that the FAQ contained in the LINDAT customisation of DSpace (the deposition system that currently unifies many repositories) should no longer point users at the LRT PDF but

As of October 2024, this is being handled in issue #22 .

6.2. Following up on the above, the default landing page (content/standard-recommendations) should, at the top, point at the combined recommendations in the SIS. (that got handled on 17-06-2024)

6.3. Centres which provide their own extensive recommendations will hopefully be willing to share at least their recommended (as opposed to accepted and discouraged) formats, so that (a) the KPI can be properly calculated in the SIS, and (b) so that the statistics of popular formats are not skewed due to the lack of data coming from those centres. This can only be a matter of argumentation and appeal to the "common good" vs. the various restrictions on those centres (time being the most commonly cited one).

bansp added the "help wanted", "task" and "centre data" labels Mar 30, 2021
bansp pinned this issue Mar 30, 2021
@bansp
Member Author

bansp commented Mar 30, 2021

Let us edit the leading note, to extend (or shrink!) the table.
The table might be a good addendum to the April release, so let me set a milestone here.

@TomazErjavec
Collaborator

I agree, interesting table, thanks!

@bansp
Member Author

bansp commented Mar 31, 2021

I seem to recall Leif-Jöran pointing me to stock data deposition guidelines for the Sprakbanken (pointing at the "Standards for LRT" PDF, I think), but I am unable to locate that page now at https://spraakbanken.gu.se/en

@bansp
Member Author

bansp commented Mar 31, 2021

I seem to recall Leif-Jöran pointing me to stock data deposition guidelines for the Sprakbanken (pointing at the "Standards for LRT" PDF, I think), but I am unable to locate that page now at https://spraakbanken.gu.se/en

Thanks to Hanna for digging up https://repo.spraakbanken.gu.se/xmlui/page/faq#what-submissions-do-you-accept for me. I spent quite a while at the Sprakbanken site yesterday night, trying to find my way to the deposition guidelines as a "naive user coming from outside", which makes me wonder how (un)easy it is to find that. I'd be grateful for an independent check.
And in the meantime, I'll update the table.

@bansp
Member Author

bansp commented Mar 31, 2021

OK, the way to the deposition info at Gothenburg is through the "Tools" in the menu, then one has to navigate to the item mentioning CLARIN, and that takes them to the repository page. So I simply overlooked this route yesterday.

@bansp
Member Author

bansp commented Jul 19, 2021

I've just gone through all the links and verified that the info is current. Getting the ticket exported as PDF requires a lot of tinkering in the "Inspect" box and then using custom zoom (55%) in the print window.

@bansp
Member Author

bansp commented Oct 15, 2021

Update needed to mention Iceland:
https://repository.clarin.is/repository/xmlui/page/faq#what-submissions-do-you-accept

Apart from that, divide the list into three rather than two, with a separate part for centres that do mention their own recommendations while also referencing the "LRT standards" document (mostly that one, because of the stock FAQ).

@bansp
Member Author

bansp commented Nov 3, 2021

Costanza Navarretta has just kindly pointed me to the recommendations for CLARIN-DK:
https://info.clarin.dk/en/the-clarin-dk-infrastructure/recommended-standards-and-formats/

I'll redo the table in 3 parts within daaays, I hope.

@bansp
Member Author

bansp commented Nov 4, 2021

(Ah, the reason CLARIN-DK isn't mentioned above is that it doesn't point externally, and the topic of this ticket is external pointers)

@bansp
Member Author

bansp commented Nov 13, 2021

While updating the info, I'm unable to access https://repo.spraakbanken.gu.se/xmlui -- making a note of that here, to check again on Monday.

@bansp
Member Author

bansp commented Nov 14, 2021

I will take a snapshot of the ticket on Monday afternoon, publish the snapshot and reset the milestone to 1.1.

@bansp
Member Author

bansp commented Nov 21, 2021

Posted a snapshot, moving the ticket to milestone 1.1.

@bansp
Member Author

bansp commented Oct 13, 2022

@bansp
Member Author

bansp commented Oct 5, 2023

I have gone through the tables above and posted short updates. In a few cases, I crossed a centre out (that's actually good!), in one case, I moved a centre from "aiming at B" to "B" (congrats!). This was a quick check, so in case you see an error or omission, please post a note here.

Overall impression after 2 years: revolution still needs to happen.

@bansp
Member Author

bansp commented Oct 24, 2023

Maria Gavriilidou has sent me the following info on CLARIN:EL:

@bansp
Member Author

bansp commented Sep 28, 2024

For the record, this is the content of my comments in the CLARIN progress report document, concerning the KPI calculation. The initial motivation of my comments was that, apparently, only B-centres were counted for the relevant KPI, whereas it seems that that is not consistent with what the description of the KPI says.

"Centres offering depositing services" are not only B-centres. I count currently 42:

Currently, I count (but this calculation may be imprecise wrt CLARIN registry, we sync only occasionally because we don't know when changes in the registry occur) 22 B-centres: https://standards.clarin.eu/sis/views/list-centres.xq?status=B-centre&submit=Filter
(but: see above, these are just B-centres, and the KPI does not say "B-centres"). Note that only 4 out of these 22 have submitted their preferences to the SIS. Some of them, however, publish their preferences on their own.

A different set of calculations that I last updated on 23 July 2024 comes from #14 (i.e., this ticket).
Note that this ticket does not fully distinguish between B-centres and others, because it focuses on centres with depositing services, mentioning B-centres only in cases where they either seem not to fulfil certification criteria or when they skirt them (mostly in good faith, probably) by referencing a by now obsolete document in the clarin.eu domain that actually says something different than those centres are expected to say.
So, this ticket has 42 as its basis. Out of the 42, I count:

  • 8 centres that publish their own information and point somewhere else (sometimes sensibly, sometimes not) -- see section 2
  • 12 centres that publish this information on their pages only (section 4)
  • 6 centres that publish their preferences via the SIS (section 5)

Apart from that:

  • 7 centres that link somewhere instead of actually providing the relevant information (section 1)
  • 1 centre (Språkbanken) with an unclear status in this respect: recommendations in the SIS but not pointing to them from their own page (instead pointing elsewhere)

Based on the above, the most restrictive count (and including Språkbanken, sigh) is: 8+12+6+1 = 27 against 42 (~ 64%). Including centres that point at the sky saying that they fulfil the KPI this way (7), we get 34 against 42 (~ 81%).
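The arithmetic above can be re-run with a few lines of Python whenever the tables change. The counts are the ones given in this comment; the variable names are only illustrative.

```python
# Counts as stated in this comment (basis: 42 centres with depositing services).
total = 42
publishing = {
    "own info plus external pointer (section 2)": 8,
    "own pages only (section 4)": 12,
    "via the SIS (section 5)": 6,
    "unclear status (Språkbanken)": 1,
}
external_only = 7  # centres that only link elsewhere (section 1)

strict = sum(publishing.values())   # most restrictive count
lenient = strict + external_only    # also counting section 1

print(f"strict:  {strict}/{total} = {strict / total:.0%}")
print(f"lenient: {lenient}/{total} = {lenient / total:.0%}")
```

This prints 27/42 = 64% for the restrictive count and 34/42 = 81% for the lenient one, matching the figures above.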
