Umlaut vowels in the Zotero Short Title #318

Open
danbalogh opened this issue Jun 7, 2024 · 14 comments

@danbalogh
Collaborator

The Zotero Guide tells us that "spaces, hyphens, diacritics and any other non-letter signs should be removed" from author names when creating a ZST. But it turns out that German names with an umlaut vowel (ö, ü, ä) have been treated inconsistently and, mostly, wrongly if we are to heed the letter of the guide. Should an exception be made to the rule, so that German names including one of these vowels must be represented with oe, ue and ae respectively? I worry that it may not always be straightforward to tell which names count as German, and that non-European colleagues may have difficulty with this in general. It also bothers me a little that we could do the same with e.g. Hungarian names, where the practice of using vowel combinations was well established in the age of telegrams, although, unlike in German, it went out of currency with personal computers, and diacritical marks are now generally just dropped when they cannot be written. And what about Scandinavian names?

Anyway, what we now have is quite inconsistent, e.g.

  • digraph: Buehler, Boehtlingk, Fluegel, Buehnemann, Hinueber, Huesken, KiefferPuelz, Lueders, Mueller (these are the most numerous)
  • dropped accent: Bohtlingk, Kolver, Luders (these are rare)
  • actual diaeresis in the ZST: Hinüber, Müller, Gräfe, Härtel, HauserSchäublin (these are definitely not correct by the ZG)

Any change to existing short titles of course requires checking the XML files for references.
Any ideas what to do? Just live with the inconsistency?

@manufrancis
Collaborator

If Michaël confirms that having umlauts or other diacritical marks in ZSTs is not a problem, I guess we could adapt the ZG on this and just use names with umlauts or other diacritical marks in ZSTs.

@michaelnmmeyer
Member

michaelnmmeyer commented Jun 13, 2024 via email

@danbalogh
Collaborator Author

Hang on a moment. Are you suggesting that we change the Zotero guide so that names in the ZST must include diacritical marks (and then, I assume, other special characters such as apostrophes) and be identical to the name field? Or that names in the ZST can optionally include diacritical marks?
The former would require that we revise hundreds of existing Zotero entries and any already existing references to them throughout the corpus. The latter would make our Zotero short titles inconsistent, making it more difficult to keep track of number suffixes and generally increasing the chance of human error. Neither would bring us any noticeable gain, and importantly, neither would solve the problem that at the moment, existing ZSTs such as Buehler contradict the ZG.

@manufrancis
Collaborator

@danbalogh
You are right. This was an intrepid suggestion, without noticeable gain.
I can just live with the inconsistency.

@arlogriffiths
Collaborator

I am, like Dan, quite averse to the idea of allowing freedom on this point and am inclined to suggest we must go through all offending ZSTs and make them compliant with the ZG, rather than modify the ZG.

Looking at the names represented with digraph, I don't think many are directly relevant to our database. Some of the ZSTs represented with diaeresis may have been used in our XML files but I don't expect they'll be very numerous.

If @michaelnmmeyer can list all offending ZST, and perhaps even automate their correction, I'll be happy to work on updating any XML files where wrong ZSTs have been used.

@arlogriffiths
Collaborator

I don't think living with inconsistency is a good idea on this point but I am willing to do a lot of the work, and can find other people willing to help, so Manu doesn't need to lose any time with contributing to the cleaning process.

@danbalogh
Collaborator Author

The problem with automation, or even just listing, is that there are different kinds of "offenders" which have to be dealt with in different ways. The digraphs ae, oe, ue (also with uppercase initial) can occur legitimately or illegitimately, and I don't think it's feasible to write an algorithm that would compare ZSTs containing one of these with the author (etc.) name fields to see if they are also present there. So human attention is needed to check all of these and keep only the illegitimate ones on the list. On the other hand, ä, ö, ü (and any other characters other than [a-zA-Z0-9_\+]) are illegitimate. The first three could be blindly corrected to a, o and u respectively, but the number block must then also be checked, since a ZST might potentially already exist with the umlaut-less vowel and the same number.
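
The triage described in this paragraph could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names are made up, the sample short titles (including their year and number blocks) are invented, and the allowed character set is simply the class quoted above, `[a-zA-Z0-9_\+]`. Digraph cases are only listed for human review, not corrected.

```python
import re
import unicodedata

# Allowed characters, taken from the character class quoted in the comment above.
ALLOWED = re.compile(r"[a-zA-Z0-9_+]*")
# Digraphs that may or may not stand for an umlaut; a human has to decide.
DIGRAPH = re.compile(r"ae|oe|ue", re.IGNORECASE)


def strip_marks(zst: str) -> str:
    """Drop combining marks, so that ä, ö, ü become a, o, u."""
    decomposed = unicodedata.normalize("NFD", zst)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def triage(short_titles):
    """Sort short titles into blind fixes, collisions and review candidates."""
    existing = set(short_titles)
    fixes, collisions, review = {}, {}, []
    for zst in short_titles:
        if ALLOWED.fullmatch(zst):
            # Legitimate characters only, but a digraph needs a human look.
            if DIGRAPH.search(zst):
                review.append(zst)
            continue
        candidate = strip_marks(zst)
        if not ALLOWED.fullmatch(candidate):
            # Something other than a plain diacritic is wrong; do not guess.
            review.append(zst)
        elif candidate in existing or candidate in fixes.values():
            # The corrected form with the same number block is already taken.
            collisions[zst] = candidate
        else:
            fixes[zst] = candidate
    return fixes, collisions, review


# Invented sample data, for illustration only.
sample = ["Hinüber2001_01", "Hinuber2001_01", "Müller1900_01", "Buehler1886_01"]
fixes, collisions, review = triage(sample)
```

With this sample, the Müller entry ends up in `fixes`, the Hinüber entry in `collisions` (because the umlaut-less form already exists), and the Buehler entry in `review`.
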
Then again, the actual replacement probably could and should be automated, and done at a single point in time in the entire corpus (as well as in the Zotero database), not piecemeal as each of us finds time to correct our own files. What we need for that is a list with two columns, the first with existing incorrect ZSTs, and the second with the correct replacement. I guess it should be feasible to write an algorithm that would then go over that list row by row and do the replacements corpus-wide; I don't know if it is also possible to do this on our Zotero database, or if it would have to be done manually there.
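
As a rough sketch of the two-column approach, assuming the list is kept as tab-separated text (old ZST, then new ZST), the replacement step might look like this in Python. The function names, the sample short titles and the `bib:` reference syntax in the sample XML are assumptions made for the example, not a description of the actual corpus; walking over the files and updating the Zotero database are left out.

```python
import csv
import io
import re


def load_mapping(tsv_text: str) -> dict:
    """Read 'old<TAB>new' rows into a dictionary."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {old: new for old, new in rows}


def apply_mapping(text: str, mapping: dict) -> str:
    """Replace every whole-token occurrence of an old short title."""
    if not mapping:
        return text
    # Try longer keys first, and refuse to match inside a longer run of
    # ZST characters (e.g. the start of a different number block).
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![A-Za-z0-9_+])(?:"
        + "|".join(re.escape(key) for key in keys)
        + r")(?![A-Za-z0-9_+])"
    )
    return pattern.sub(lambda match: mapping[match.group(0)], text)


# Invented two-column list and invented XML snippet, for illustration only.
mapping = load_mapping(
    "Hinueber2001_01\tHinuber2001_01\n"
    "Müller1900_01\tMuller1900_01\n"
)
xml = '<bibl><ptr target="bib:Hinueber2001_01"/></bibl>'
updated = apply_mapping(xml, mapping)
```

Doing all replacements in a single pass over each file avoids chained substitutions, where the output of one row would accidentally match the first column of another.
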

@michaelnmmeyer
Member

I confirm that using diacritical marks in short titles would be OK (the Zotero app performs the necessary transformations).

Before editing existing entries: are people using short titles in other contexts than within DHARMA editions? I can guarantee referential integrity in this context (though this is not done for now), but if people are using short titles as references somewhere else, modifying them might break their work.

@danbalogh
Collaborator Author

I repeat: we do not want diacritical marks in ZSTs.
We are talking about changing the existing ZSTs involving diacritics or digraphs to plain vowels, to conform to the ZG.
I am not aware of ZSTs being used anywhere other than in the XML editions.

@danbalogh
Collaborator Author

We need a decision (and, I think, explicit confirmation of it from the PIs) here.

A. Have we agreed to stick to what the ZG says and record Short Titles with the diacritics simply removed, as in Coedes for “Cœdès” (thus, also Buhler for "Bühler")? If yes, we'll need to start listing and correcting the existing entries that don't conform to this, as I outlined above.

If we don't want to do that, then we need a different decision - what shall that be?

B. Require oe, ue and ae for German Umlaut vowels and stick to just removing the diacritic for everything else? (Then we still need to list all existing ZSTs containing a character other than [a-zA-Z0-9_\+] and check and correct them manually.)
C. Keep the inconsistency?

@arlogriffiths
Collaborator

For me it's definitely A. So indeed Cœdès becomes Coedes and Bühler becomes Buhler.
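
For illustration, option A could be expressed along these lines in Python. This is a sketch, not an agreed implementation: the function name is made up, and the only ligature mapping included is the one attested in this thread (œ to oe). Note that Unicode normalization alone does not turn œ into oe, so it needs an explicit table entry; any other letter without a decomposition is silently dropped here, and a real tool should flag such cases instead.

```python
import unicodedata

# Letters that Unicode does not decompose into base letter plus mark.
# Only the case discussed in this thread is listed; the table is incomplete.
LIGATURES = {"œ": "oe", "Œ": "Oe"}


def zst_name(name: str) -> str:
    """Reduce an author name to plain ASCII letters, as option A requires."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in name)
    decomposed = unicodedata.normalize("NFD", expanded)
    # Keep ASCII letters only: this drops combining marks, spaces,
    # hyphens, apostrophes and any other non-letter sign.
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalpha())
```

Under these assumptions, `zst_name("Cœdès")` gives `Coedes`, `zst_name("Bühler")` gives `Buhler`, and a hyphenated name such as `Kieffer-Pülz` comes out as `KiefferPulz`.
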

I am regularly merging items with two different ZSTs for the same bibliographic entry, one of the ZSTs not compliant, or correcting single entries with wrong ZSTs.

Please implement the necessary clarification and add any useful examples in ZG.
Can the listing be automated?

@danbalogh
Collaborator Author

Annette has confirmed that she agrees with A.

@danbalogh
Collaborator Author

I've added the clarification to the ZG as a suggestion.

On automation, see my suggestions above. We'll need @michaelnmmeyer to tell us if this is feasible.

@arlogriffiths
Collaborator

I have just (manually) weeded out all cases of Hinueber and Hinüber in ZST fields. It would be nice if @michaelnmmeyer could confirm that no such culprits remain and that there are no cases of <bibl> in our XML files where ZSTs now need to be adjusted.

It is a somewhat tedious process to carry out manually if there are lots of items concerned, because every time an offending tag has been removed, the search window automatically empties itself, so that I had to type Hinueber again. I had to repeat this about 50 times to catch all offending ZSTs and derivative tags.

5 participants