Scripts #320
In my opinion, the list should absolutely not be flattened. You say that with the present list, if "we want to figure out the number of inscriptions in Southern Brāhmī, it is necessary to count recursively: the answer is count(Southern Brāhmī) + count(Vaṭṭeḻuttu) + count(Telugu) + count(Tamil)". In other words, counting the inscriptions in Southern Brāhmī is complicated but possible. With a flat list, counting the inscriptions in Southern Brāhmī would be simply impossible, since there would be no indication that Vaṭṭeḻuttu etc. are also kinds of Southern Brāhmī. Am I missing something here?

I do not understand what you mean by "The hierarchy is also not encoded in the schema." Why should it be, and how could it be? It's an OpenTheso vocabulary, and the hierarchy is encoded there. If there is a problem in getting faceted search to "talk to" OpenTheso, then this should be worked out between you and Adeline. One way I can think of to have our cake and eat it would be to permit more than one "class" in … |
What I am suggesting is a slight change in semantics. Currently, you can, for … Instead of this, I propose to remove Southern Brāhmī from the list of categories … Take the following hierarchy, for instance:
This would be replaced with:
And the schema would only allow you to choose between these categories:
|
I understand perfectly what you are proposing and I repeat: I do not consider this acceptable, for the reasons I explained above. Tamil script is a kind (and not a descendant) of Southern Brāhmī, and the same applies (mutatis mutandis) to all of the lower-level categories. A search to retrieve inscriptions in any kind of Southern Brāhmī (including Tamil and all other subclasses) is a meaningful search that users might want to do, but it will become impossible in the scheme you propose, unless the search employs tick boxes for the script classes and the user ticks each kind of Southern Brāhmī to accomplish that search.

There is another reason, namely that script classes are pretty fuzzy. Tamil may be a special case, always recognisable to Tamilists as clearly and unequivocally different from non-Tamil (though I doubt that), but most of the other subclasses don't have a clear boundary where they begin. For instance, the script of some of my Eastern Cālukya inscriptions could arguably be labelled as Telugu. I don't think I've ever used that script class, but if I ever do (for one of my late inscriptions), that will not mean that there is an essential difference between the script of that inscription and the script of another one, say 50 years earlier, which has been labelled with the generic label.

The lowest-level script classes exist because some texts use specifically nameable scripts. This is not so with the majority of inscriptions. A Tamilist working on an inscription would not classify its script as "Southern Brāhmī" if they identified the script as Tamil. The higher-level labels are to be used when a lower-level label does not apply unequivocally (see the Memo on Controlled Vocabularies), and not optionally or randomly.
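To illustrate how a search for "any kind of Southern Brāhmī" can be served without asking the user to tick every subclass, here is a minimal Python sketch of one common faceted-search technique: each inscription is indexed under its own script label plus every ancestor of that label, so a query for a higher class also matches all of its subclasses. The `PARENT` map is a hypothetical fragment assembled only from the relations mentioned in this thread (the real OpenTheso vocabulary is larger and may be structured differently), the record IDs are invented placeholders rather than project data, and `with_ancestors`, `facet_counts` and `search` are illustrative names, not part of any existing tool.

```python
from collections import Counter

# Hypothetical fragment of the script hierarchy, child -> parent.
# Only relations mentioned in this thread are included; the real
# OpenTheso vocabulary is larger and may differ.
PARENT = {
    "Northern Brāhmī": "Brāhmī",
    "Southern Brāhmī": "Brāhmī",
    "Southeast Asian Brāhmī": "Brāhmī",
    "Vaṭṭeḻuttu": "Southern Brāhmī",
    "Telugu": "Southern Brāhmī",
    "Tamil": "Southern Brāhmī",
}


def with_ancestors(label):
    """Return the label followed by all its ancestors, nearest first.

    Assumes the hierarchy is acyclic; a label with no parent entry is a root.
    """
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def facet_counts(labels):
    """Count each record under its own label and under every ancestor."""
    counts = Counter()
    for label in labels:
        counts.update(with_ancestors(label))
    return counts


def search(records, category):
    """Return IDs of records whose script is `category` or any subclass of it."""
    return [rid for rid, label in records.items() if category in with_ancestors(label)]


# Invented placeholder records (ID -> script label), not real project data.
records = {
    "ins1": "Tamil",
    "ins2": "Southern Brāhmī",
    "ins3": "Telugu",
    "ins4": "Northern Brāhmī",
    "ins5": "Arabic",
}

print(search(records, "Southern Brāhmī"))  # ['ins1', 'ins2', 'ins3']
print(search(records, "Brāhmī"))           # ['ins1', 'ins2', 'ins3', 'ins4']
print(facet_counts(records.values())["Brāhmī"])  # 4
```

The same expansion covers both levels of the hierarchy: a Brāhmī query picks up every kind of Brāhmī and leaves Arabic out. The trade-off of this design is that the parent map has to be kept in sync with the vocabulary.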
The situation where "you cannot really know how many inscriptions are in Tamil" does not arise, because an inscription written in (unequivocal) Tamil will be labelled as such by its encoder, and because when you zoom in on the fuzzy boundary between Tamil and non-Tamil, the question ceases to be meaningful. Finally, I should note that the hierarchy is important not only between the middle and lowest levels, but also between the highest and middle levels. Restricting a search to Brāhmī inscriptions (i.e. including all kinds of Brāhmī while excluding, e.g., Arabic, Chinese or, in the future, if our system remains in use, Kharoṣṭhī) is a meaningful operation and a valuable research tool, which would likewise be impossible (or very tedious) with a flat list. |
So in fact we are agreeing. I can work with that. |
For the record, I see no agreement here as regards the topic of this thread, only in the functional detail that the token for a higher hierarchical class will only be used in encoding practice when none of the lower classes is applicable. If that satisfies you, then well and good. But this absolutely does not mean that the conceptual hierarchy is or can be flat. The higher categories incorporate their descendants. |
We have a hierarchical classification of scripts in OpenTheso, as follows:
In @rendition, the schema allows scripts that have subcategories (Brāhmī, Southern Brāhmī, Southeast Asian Brāhmī, Northern Brāhmī and Arabic). Some of these are actually used. We have the following frequency distribution:

This is too inconvenient for machine processing. I am thinking about faceted search, in particular. If, for instance, we want to figure out the number of inscriptions in Southern Brāhmī, it is necessary to count recursively: the answer is count(Southern Brāhmī) + count(Vaṭṭeḻuttu) + count(Telugu) + count(Tamil). The hierarchy is also not encoded in the schema.
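The recursive count can be written as a short Python function, sketched here under stated assumptions: `CHILDREN` is a hypothetical parent -> children map built only from the categories named in this issue, `direct` holds the number of inscriptions tagged with exactly each label, and the sample labels are invented for illustration. The inclusive count of a class is its direct count plus the inclusive counts of its children; for Southern Brāhmī, whose listed subclasses have no further subdivisions, this is count(Southern Brāhmī) + count(Vaṭṭeḻuttu) + count(Telugu) + count(Tamil).

```python
from collections import Counter

# Hypothetical parent -> children map, restricted to categories named in this
# issue. Subcategories of Northern Brāhmī, Southeast Asian Brāhmī and Arabic
# are omitted because none are named here.
CHILDREN = {
    "Brāhmī": ["Northern Brāhmī", "Southern Brāhmī", "Southeast Asian Brāhmī"],
    "Southern Brāhmī": ["Vaṭṭeḻuttu", "Telugu", "Tamil"],
}


def inclusive_count(category, direct):
    """Inscriptions tagged with `category` itself plus, recursively, its subclasses.

    `direct` maps each label to the number of inscriptions tagged with exactly
    that label; a Counter returns 0 for labels that never occur.
    """
    return direct[category] + sum(
        inclusive_count(child, direct) for child in CHILDREN.get(category, [])
    )


# Invented sample of script labels, one per inscription (not real data).
direct = Counter(["Tamil", "Tamil", "Southern Brāhmī", "Telugu", "Northern Brāhmī", "Arabic"])

print(inclusive_count("Southern Brāhmī", direct))  # 1 + 0 + 1 + 2 = 4
print(inclusive_count("Brāhmī", direct))           # 0 + 1 + 4 + 0 = 5
```

Whatever the encoding policy, the function needs the parent-child relations as input; whether they come from the schema or from an OpenTheso export is a separate choice.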
I would be much happier if we used a flat list of scripts (as for languages). For instance, the following:
... could be transformed to:
... where "Southern Brāhmī" means "any Southern Brāhmī script that is not Vaṭṭeḻuttu, Telugu or Tamil". Can we agree on this interpretation?