Description
We have a hierarchical classification of scripts on opentheso, as follows:
Kharoṣṭhī
Brāhmī
...Southern Brāhmī
......Vaṭṭeḻuttu
......Telugu
......Tamil
...Southeast Asian Brāhmī
......Sundanese
......Pyu
......Old West Javanese
...Northern Brāhmī
......Siddhamātr̥kā
......Nāgarī
......Mon-Burmese
......Khmer
......Kawi
......Kannada
......Grantha
......Gauḍī
......Chinese
......Cam
......Bhaikṣukī
......Batak
......Balinese
Arabic
...Jawi
The schema allows @rendition to take scripts that have subcategories (Brāhmī, Southern Brāhmī, Southeast Asian Brāhmī, Northern Brāhmī and Arabic), and some of these are actually used. We have the following frequency distribution:
864 Tamil
767 Khmer
635 Grantha
544 Southern Brāhmī
99 Cam
58 Vaṭṭeḻuttu
52 Kawi
13 Kannada
5 Brāhmī
3 Undetermined
1 Telugu
1 Southeast Asian Brāhmī
This is inconvenient for machine processing. I am thinking about faceted search, in particular. If, for instance, we want to figure out the number of inscriptions in Southern Brāhmī, it is necessary to count recursively: the answer is count(Southern Brāhmī) + count(Vaṭṭeḻuttu) + count(Telugu) + count(Tamil). In addition, the hierarchy is not encoded in the schema, so a processing tool has to obtain it from elsewhere.
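To make the cost concrete, here is a minimal sketch of the recursive count for the Southern Brāhmī branch, using the figures from the frequency distribution above. The names `CHILDREN`, `DIRECT` and `recursive_count` are made up for illustration, and the hierarchy has to be hard-coded because it is not available from the schema.

```python
# Hypothetical hard-coded copy of the relevant branch (parent -> children).
CHILDREN = {
    "Southern Brāhmī": ["Vaṭṭeḻuttu", "Telugu", "Tamil"],
}

# Direct counts, copied from the frequency distribution above.
DIRECT = {
    "Tamil": 864, "Khmer": 767, "Grantha": 635, "Southern Brāhmī": 544,
    "Cam": 99, "Vaṭṭeḻuttu": 58, "Kawi": 52, "Kannada": 13, "Brāhmī": 5,
    "Undetermined": 3, "Telugu": 1, "Southeast Asian Brāhmī": 1,
}

def recursive_count(script):
    """Count inscriptions tagged with `script` or any of its descendants."""
    own = DIRECT.get(script, 0)
    return own + sum(recursive_count(child) for child in CHILDREN.get(script, []))

print(recursive_count("Southern Brāhmī"))  # 544 + 58 + 1 + 864 = 1467
```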
I would be much happier if we used a flat list of scripts (as for languages). For instance, the following:
Southern Brāhmī
...Vaṭṭeḻuttu
...Telugu
...Tamil
... could be transformed to:
Vaṭṭeḻuttu
Telugu
Tamil
Southern Brāhmī
... where "Southern Brāhmī" means "any Southern Brāhmī script that is not Vaṭṭeḻuttu, Telugu or Tamil". Can we agree on this interpretation?
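Under that interpretation, a facet count becomes a plain tally of the values found in @rendition, with no hierarchy lookup. A rough sketch, assuming the script labels have already been extracted into a list (the `facet_counts` helper and the sample labels are hypothetical):

```python
from collections import Counter

def facet_counts(labels):
    """Tally script labels as-is; each label is its own facet."""
    return Counter(labels)

# Made-up sample of extracted labels, one per inscription.
sample = ["Tamil", "Tamil", "Southern Brāhmī", "Vaṭṭeḻuttu"]
counts = facet_counts(sample)

# "Southern Brāhmī" here counts only inscriptions in a Southern Brāhmī
# script that is not Vaṭṭeḻuttu, Telugu or Tamil.
print(counts["Southern Brāhmī"])
```

A total for the whole Southern Brāhmī family would still need a grouping table maintained outside the schema, but the per-facet counts no longer depend on it.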