This page defines the maturity classification system used on this site. The end of the page describes how to add and change maturity tags.
In the GiellaLT infrastructure we use a five-step classification to broadly describe the quality and development level of various linguistic resources. These categories are used as labels in README files, on the documentation front page for each resource, and in the overview pages for language models, dictionaries, keyboards and spell checkers (the maturity levels of grammar checkers, machine translation applications and speech technology are still undefined). The labels look like the following:
No. | Label | Colour |
---|---|---|
1. | Production | green |
2. | Beta | yellow |
3. | Alpha | red |
4. | Experiment / student exercise | black |
5. | Undefined | grey |
Some of the criteria for the various levels are common to all resource types and are listed under General criteria. Other criteria are application specific:
The Undefined level is used when the maturity cannot be determined, or has not yet been defined or tagged.
The Experiment category also covers student exercises (published with permission). The point of such exercises is not to make a working system, but to explore the possibilities for language technology. Such work can of course be extended and may eventually result in a fully working production tool.
- license not required, but is nice
- may not build at all
- Divvun Manager:
- might not be available
- if available: only available in the nightly channel
- rule of thumb: may not work at all
- fragmentary grammar
- less than 1k lexical entries
- less than 1k lexical entries
- all letters may not be included
- layout experimental, will change
- see language model above
- no adaptation of error model
- no weighting corpus
- license highly recommended
- Divvun Manager:
- is available
- only available in the nightly channel
- rule of thumb: it can be built locally and used for something
- grammar model mostly complete
- lexicon between 1k and 10k entries
- entries from different parts of speech
- lexicon between 1k and 10k entries
- layout mostly done, may still change
- all letters in alphabet included
- Program works, corrects text, and is of some use
- there should be a proper license
- CI/CD working for the tools being provided
- Divvun Manager:
- is available
- is available in the stable channel
- NOT visible on the front page, only via the All languages view
- rule of thumb: it can easily be installed via Divvun Manager - it must be testable by the user community
- grammar model complete
- lexicon has more than 10k entries
- running text coverage above 80 %
- different parts of speech treated differently
- lexicon has more than 10k entries
- layout complete for all levels and input methods
- The number of false positives is below 20 %
- Correction mechanism gives a relevant correction in top-5 in most cases
- there must be a proper license
- at least one contact person in the language community who is willing, or is paid, to act as first-line support person and language resource maintainer, with a public contact email or other contact info
- CI/CD working for the tools being provided
- Divvun Manager:
- is available
- is available in the stable channel
- IS visible on the front page
- Release 1.0.0 or higher of either speller or analyser/giella-XXX package
- rule of thumb: it is easily installable via the One-click installer or Divvun Manager front page
- grammar/model/layout complete
- lexicon has more than 30k entries (subject to real-world realities and limits)
- running text coverage above 90 %
- lexicon has more than 20k entries
- lemma articles are structured according to lemma type
- layout complete and evaluated for all levels and input methods
- The number of false positives is below 5 %
- Correction mechanism gives a relevant correction in top-5 in almost all cases, in top position in most cases
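The size and coverage thresholds in the lists above can be sketched as a small lookup. This is only an illustration: it assumes that the lexicon and running-text-coverage bullets refer to the language model, it ignores every other criterion (license, CI/CD, Divvun Manager availability, grammar completeness), and the names `LanguageModelStats` and `lexical_maturity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LanguageModelStats:
    lexical_entries: int          # number of entries in the lexicon
    running_text_coverage: float  # fraction of running text covered, 0.0 to 1.0


def lexical_maturity(stats: LanguageModelStats) -> str:
    """Return the highest level whose size and coverage thresholds are met.

    Only the numeric thresholds are checked; a real classification
    also has to satisfy the non-numeric criteria for the level.
    """
    if stats.lexical_entries > 30_000 and stats.running_text_coverage > 0.90:
        return "Production"
    if stats.lexical_entries > 10_000 and stats.running_text_coverage > 0.80:
        return "Beta"
    # A large lexicon with low coverage stays at Alpha in this sketch.
    if stats.lexical_entries >= 1_000:
        return "Alpha"
    return "Experiment"
```

A result from such a check is at best an upper bound on the level, since the remaining criteria still have to be evaluated by hand.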
The maturity classification is done using GitHub topics.
Maturity badges in READMEs, documentation and elsewhere are generated automatically from these topics, and they are also used in the keyboard and language resource lists to group the repos automatically.
Adding maturity tags is done via GitHub topics, and can only be done by repo or organisation owners or admins. It is also possible to use gut to set the topics from the command line if they do not exist, but presently it is not possible to remove or change GitHub topics.
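Independently of gut, GitHub also exposes repository topics through its REST API (`PUT /repos/{owner}/{repo}/topics`). The sketch below only builds such a request and does not send it. Because that endpoint replaces the full topic list, the existing topics are merged in first. The function name, the organisation and repository names, and the topic string in the usage line are all hypothetical placeholders, and this is not how gut is known to work.

```python
import json
import urllib.request


def build_topics_request(owner, repo, existing_topics, maturity_topic, token):
    """Build (but do not send) a request that sets the repository topics.

    The endpoint replaces ALL topics, so the existing ones are kept
    by merging them with the new maturity topic.
    """
    names = sorted(set(existing_topics) | {maturity_topic})
    url = f"https://api.github.com/repos/{owner}/{repo}/topics"
    body = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; pass the result to urllib.request.urlopen() to send it.
request = build_topics_request(
    "example-org", "lang-xxx", ["finite-state"], "maturity-example", "TOKEN"
)
```

The token needs admin rights on the repository, in line with the restriction described above.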
The topic tags corresponding to the labels above are as follows:
The Undefined category does of course not have a topic: the absence of a maturity topic is what defines it. In the lists and tables linked to above it should ideally be empty, but it is listed in any case to make it easy to spot repositories that do not yet have a defined maturity class.
The maturity tags are turned into JSON endpoints for shields.io, and stored in the gh-pages branch of each repository. This is done automatically by the CI on each push to GitHub, but requires that GitHub Pages has been configured for the repo.
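As an illustration of what such an endpoint file might contain, the sketch below builds a JSON document following the shields.io endpoint badge schema (`schemaVersion`, `label`, `message`, `color`). The colours come from the table at the top of this page; the badge label text `maturity` and the function name are assumptions, and the actual CI script may produce something different.

```python
import json

# Colours as listed in the classification table on this page.
LEVEL_COLOURS = {
    "Production": "green",
    "Beta": "yellow",
    "Alpha": "red",
    "Experiment": "black",
    "Undefined": "grey",
}


def endpoint_json(level: str) -> str:
    """Return a shields.io endpoint document for the given maturity level."""
    payload = {
        "schemaVersion": 1,      # required by the endpoint badge schema
        "label": "maturity",     # assumed left-hand badge text
        "message": level,        # right-hand badge text
        "color": LEVEL_COLOURS[level],
    }
    return json.dumps(payload, indent=2)


print(endpoint_json("Beta"))
```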
There should be only ONE maturity tag per repo. It is technically possible to add more maturity tags to a single repo, but that does not make much sense and will probably cause the JSON file creation to fail.
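A check for the one-tag rule could look roughly like the following. It assumes, purely for illustration, that maturity topics share a common prefix; the default prefix `maturity-` and the function name are hypothetical and should be replaced by whatever the real topic names are.

```python
def check_single_maturity(topics, prefix="maturity-"):
    """Return the single maturity topic, or None if the repo has none.

    Raises ValueError when more than one maturity topic is present.
    The prefix is an assumption made for this sketch.
    """
    found = [topic for topic in topics if topic.startswith(prefix)]
    if len(found) > 1:
        raise ValueError(f"expected one maturity topic, found {len(found)}: {found}")
    return found[0] if found else None
```

Returning `None` for a repo without any maturity topic corresponds to the Undefined level.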