compliance with TG #251
Comments
@michaelnmmeyer |
@manufrancis I can take care of it if you haven't done so already; it won't take long. @arlogriffiths Quite a lot of people use ṃ ṛ ṝ ḷ ḹ (probably because their input method uses these characters). I believe it is safe to globally substitute ṃ ṛ ṝ with ṁ r̥ r̥̄, and maybe ḹ with l̥̄, but substituting ḷ with l̥ should only be done for Sanskrit. Is this correct? |
|
Do wait for Arlo's opinion. As far as I can see, all of those substitutions except ḷ are safe. The consonant ḷ also occurs in Sanskrit other than Grantha, for example in many of my Eastern Cālukya inscriptions (in names [including the name Cāḷukya itself] as well as in non-standard spellings of Sanskrit words). So that character should be left alone, but since the actual vocalic l̥ occurs less than once in a blue moon, I don't think that should be a problem. |
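The substitutions discussed so far can be sketched as a small lookup table. This is a minimal illustration, not a tool used in the project: the names `SAFE_SUBSTITUTIONS` and `normalize_transliteration` are hypothetical, only lowercase characters are handled, and ḷ is deliberately left out because it may be a genuine retroflex consonant. Whether to run such a replacement on a given file remains a judgment call for its editor.

```python
import unicodedata

# Non-compliant character -> replacement discussed in this thread.
# U+1E37 (l with dot below) is deliberately absent: it may be a real consonant.
SAFE_SUBSTITUTIONS = {
    "\u1e43": "\u1e41",         # m with dot below -> m with dot above
    "\u1e5b": "r\u0325",        # r with dot below -> r + combining ring below
    "\u1e5d": "r\u0325\u0304",  # r with dot below and macron -> r + ring below + macron
    "\u1e39": "l\u0325\u0304",  # l with dot below and macron -> l + ring below + macron
}

_TABLE = str.maketrans(SAFE_SUBSTITUTIONS)


def normalize_transliteration(text: str) -> str:
    """Apply the substitutions above; every other character is left untouched."""
    # Compose first, so that decomposed input (e.g. "m" followed by U+0323)
    # is matched as well.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TABLE)
```

Normalizing to NFC before matching is a design choice of this sketch: it lets a single table cover both precomposed and decomposed input.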
@manufrancis @danbalogh Oops, OK. |
The vocalic l̥ is actually common in Old Javanese inscriptions, as shorthand for the syllable lə. Conversely, the retroflex consonant ḷ does not occur there at all. So I'd expand the scope of Manu's rule "substituting ḷ with l̥ should only be done for Sanskrit" to "substituting ḷ with l̥ should only be done for Sanskrit and Old Javanese". I agree with all Manu and Dan have said. |
I think I have to correct my answer of last month. Since, as Dan previously pointed out, ḷ is used in Sanskrit in some cases, my reformulation of the rule as "substituting ḷ with l̥ should only be done for Sanskrit and Old Javanese" was wrong. Cases of ṃ ṛ ṝ may occur in quotations and should not necessarily always be replaced with ṁ r̥ r̥̄. Any such replacements that are made should be limited to text and apparatus nodes of our XML files. In brief, I think that in some repos known to contain cases of non-compliance with TG, case-by-case replacements can be made, duly limited to specific nodes. But I don't think it's a good idea to implement any global replacement rules applying to all parts of all files. Basically, everybody should (be trained to) comply with TG, and cases of non-compliance should gradually be polished away. What is the status of your work on this issue, @michaelnmmeyer? Can we close it? |
Fixing transliteration issues is too complicated to be implemented reliably, thus I will leave it to authors. |
Perhaps you could nevertheless generate per repo a list of occurrences of ṃ ṛ ṝ ḷ ḹ? That will help encoders and PIs to follow up and weed out any cases of non-compliance with TG. |
About 1,000 texts (1/3 of our collection) use these characters, so generating a basic list would not be useful. I will try to find something for prioritizing repos and texts to check. |
Thanks. For me (and I guess for most team members) it is easy to do a multi-file search at repo level, but not higher. So if I know that a given repo for which I am responsible contains instances of the offending characters, I can search for them and weed them out. |
Here is a list. Numbers within brackets represent respectively: A. Number of occurrences of ṃ ṛ ṝ ḷ ḹ We expect to find ṃ ṛ ṝ ḷ ḹ much more rarely than ṁ r̥ r̥̄ l̥ l̥̄, so repositories where this is not the case are more likely to present encoding issues. However, languages are not taken into account, so this can be very wrong (as for tfa-pallava-epigraphy).
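A rough sketch of the kind of count behind such a list, under stated assumptions: the first number follows the definition given above, while the second (occurrences of the compliant characters) and the flagging rule are my own guesses at how the comparison might be made. The function names are hypothetical, and language is ignored here too, so the same caveat applies.

```python
import unicodedata

# The five non-compliant characters, written as escapes to avoid ambiguity.
SUSPECT = ("\u1e43", "\u1e5b", "\u1e5d", "\u1e37", "\u1e39")
# Compliant counterparts. "r" + ring below also matches the long vowel
# (ring below + macron), and likewise for "l".
EXPECTED = ("\u1e41", "r\u0325", "l\u0325")


def count_characters(text: str) -> tuple[int, int]:
    """Return (suspect occurrences, expected occurrences) for one text."""
    composed = unicodedata.normalize("NFC", text)
    suspect = sum(composed.count(ch) for ch in SUSPECT)
    expected = sum(composed.count(ch) for ch in EXPECTED)
    return suspect, expected


def looks_suspicious(text: str) -> bool:
    """Assumed heuristic: flag texts where suspect characters are at least
    as frequent as the compliant ones."""
    suspect, expected = count_characters(text)
    return suspect > 0 and suspect >= expected
```

Summing these counts over all files of a repository would give one pair of numbers per repository, which could then be sorted to decide where to look first.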
|
I just want to add that silently normalising quoted transliteration to our transliteration scheme, even in block quotes, is acceptable to me. I'm not saying that we should do it, but if we did, then batch-replacing ṃ, ṝ and ḹ to ṁ, r̥̄ and l̥̄ would become an option. Alternatively, instances of ṃ, ṝ and ḹ that are not children of a |
I have checked my own subcorpora (Vengi, Badami and Siddham). I've found:
|
Thanks. You can presumably also help by doing the same check and clean-up for maitraka, daksinakosala, telugu and bhaumakara. Ryosuke does not seem to be receiving GitHub notifications, so perhaps you can step in for bengalcharters too. Please ask Samana if she can take care of everything that is Kannada-related. @amandinebricout: can you do the same kind of check and clean-up as Dan has described above for your somavamsin files? I'll take care of everything that's tfc plus tfb-eiad. |
A slight snag in this is that back in the early days we told people in the EGD that if they had difficulty producing r̥ on their keyboards, they could use ṛ instead, and this would be converted automatically. It seems, after looking at the repositories, that a lot of people have availed themselves of this option. I think we can be pretty sure that none of these people have also used ṛ for the NIA retroflex flap, but I think they are in a better position to decide this themselves. So I've done the checking and replacement for bhaumakara (which seems to be just a single file) and I'll start on the telugu, which I guess we can't expect Jens to solve, so I'll take a look and try to sort it out myself. For the others, I'll post some instructions here for how I would do this and ask the relevant people to do it themselves, with my help if they need it. |
One way to check for the suspect characters in your texts is as follows. @michaelnmmeyer may be able to suggest a better one, but this seems to work.
This will give you a result list with all occurrences of any of these characters in any of your files, except those within a |
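For those who prefer a script, the same kind of check can be sketched in Python. This is not the procedure described above, only a rough equivalent under assumptions: the elements to be skipped are passed in by the caller (the element name is not reproduced here), and `find_suspect` and `local_name` are hypothetical names.

```python
import unicodedata
import xml.etree.ElementTree as ET

SUSPECT = set("\u1e43\u1e5b\u1e5d\u1e37\u1e39")  # the five characters to look for


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.tei-c.org/ns/1.0}p' to 'p'."""
    return tag.rsplit("}", 1)[-1]


def _hits(text, where):
    composed = unicodedata.normalize("NFC", text or "")
    return [(where, ch) for ch in composed if ch in SUSPECT]


def find_suspect(element, skip_tags, path=""):
    """Return (element path, character) pairs, ignoring skipped elements."""
    name = local_name(element.tag)
    here = f"{path}/{name}"
    if name in skip_tags:
        return []  # nothing inside a skipped element is reported
    found = _hits(element.text, here)
    for child in element:
        found.extend(find_suspect(child, skip_tags, here))
        # Text after a child's closing tag belongs to the current element,
        # so it is checked even when the child itself was skipped.
        found.extend(_hits(child.tail, here))
    return found
```

A typical call would be `find_suspect(ET.parse(path).getroot(), skip_tags={...})`, with the set filled in by whoever is responsible for the files.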
The Telugu is done, except for one case where ṛ may be a typo for ṟ or an exact reproduction of a previous editor's reading, but certainly not a substitute for r̥ (I've left an XML comment there), and a couple of instances where Jens reproduces a published translation and does so exactly, with the same transliteration system as used in the published translation. Our guidelines say that transliteration should be silently normalised when reproducing complete translations, but I don't have the capacity to check through all the translations in Jens's files and do this, and changing just ṛ to r̥ would only make them inconsistent, so I did nothing to those. |
@manufrancis has not so far changed representation of anusvāra with ṃ to ṁ in his editions of Pallava inscriptions and hence these editions are not yet compliant with the DHARMA TG. There may be other points on which he has not brought his transliteration into compliance either. @michaelnmmeyer — Can you make the necessary replacements for him?