Scripts #320

Closed
@michaelnmmeyer

Description

We have a hierarchical classification of scripts on opentheso, as follows:

Kharoṣṭhī
Brāhmī
...Southern Brāhmī
......Vaṭṭeḻuttu
......Telugu
......Tamil
...Southeast Asian Brāhmī
......Sundanese
......Pyu
......Old West Javanese
...Northern Brāhmī
......Siddhamātr̥kā
......Nāgarī
......Mon-Burmese
......Khmer
......Kawi
......Kannada
......Grantha
......Gauḍī
......Chinese
......Cam
......Bhaikṣukī
......Batak
......Balinese
Arabic
...Jawi

The schema allows @rendition to hold scripts that have subcategories (Brāhmī, Southern Brāhmī, Southeast Asian Brāhmī, Northern Brāhmī and Arabic), and some of these are in use. The current frequency distribution is:

    864 Tamil
    767 Khmer
    635 Grantha
    544 Southern Brāhmī
     99 Cam
     58 Vaṭṭeḻuttu
     52 Kawi
     13 Kannada
      5 Brāhmī
      3 Undetermined
      1 Telugu
      1 Southeast Asian Brāhmī

This is inconvenient for machine processing, and for faceted search in particular. If, for instance, we want to figure out the number of inscriptions in Southern Brāhmī, we have to count recursively: the answer is count(Southern Brāhmī) + count(Vaṭṭeḻuttu) + count(Telugu) + count(Tamil). The hierarchy is also not encoded in the schema.
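To make the cost of the hierarchical reading concrete, here is a minimal sketch of the recursive count. The subtree and the frequencies are copied from this issue; the names `CHILDREN`, `FREQ` and `subtree_count` are made up for illustration and are not part of the schema or of any existing tooling.

```python
# Excerpt of the hierarchy as listed above (Southern Brāhmī subtree only).
CHILDREN = {
    "Southern Brāhmī": ["Vaṭṭeḻuttu", "Telugu", "Tamil"],
}

# Frequencies taken from the distribution above.
FREQ = {
    "Tamil": 864,
    "Southern Brāhmī": 544,
    "Vaṭṭeḻuttu": 58,
    "Telugu": 1,
}


def subtree_count(script):
    """Count uses of `script` itself plus uses of all its descendants."""
    own = FREQ.get(script, 0)
    return own + sum(subtree_count(child) for child in CHILDREN.get(script, []))


print(subtree_count("Southern Brāhmī"))  # 544 + 58 + 1 + 864 = 1467
```

Any consumer that wants this number has to carry its own copy of the hierarchy, since the schema does not provide it.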

I would be much happier if we used a flat list of scripts (as for languages). For instance, the following:

Southern Brāhmī
...Vaṭṭeḻuttu
...Telugu
...Tamil

... could be transformed to:

Vaṭṭeḻuttu
Telugu
Tamil
Southern Brāhmī

... where "Southern Brāhmī" means "any Southern Brāhmī script that is not Vaṭṭeḻuttu, Telugu or Tamil". Can we agree on this interpretation?
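Under that interpretation, a facet count would be a plain tally with no tree walk. A small sketch, assuming each inscription carries exactly one script value; the `inscriptions` list is invented sample data, not taken from the corpus.

```python
from collections import Counter

# Hypothetical script values, one per inscription (illustrative only).
inscriptions = ["Tamil", "Tamil", "Southern Brāhmī", "Vaṭṭeḻuttu", "Tamil"]

# With a flat list, each value is its own facet and is counted once.
facets = Counter(inscriptions)

# "Southern Brāhmī" here means "Southern Brāhmī, but not Vaṭṭeḻuttu,
# Telugu or Tamil", so it does not overlap with the other facets.
for script, n in facets.most_common():
    print(n, script)
```

Because the facets are disjoint, the counts sum to the number of inscriptions.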
