Skip to content

Commit

Permalink
name-parser: Allow dashes between modifier and weight
Browse files Browse the repository at this point in the history
[why]
Some fonts might have a non-standard (i.e. broken) weight naming scheme:
They put a blank or a dash between the modifier and the weight, for
example "Extra Bold" or "Demi-Condensed", when they mean "ExtraBold"
resp "DemiCondensed".

The former happens with CartographCF, the later with IBM3270.

[how]
Automatically allow a dash between modifier and weight, which comes up
as CamelCase boundary. Insert an optional dash (r'-?') into such
boundaries.
For the further lookup we need to remove the dash in the found keyword,
if there is any, to get back to standard naming.

This might break if the font name ends in a modifier. So we can not
really distinguish

       Font Name Extra Bold Italic
    => Font Name - ExtraBold Italic
    => Font Name Extra - Bold Italic

The known modifiers are 'Demi', 'Ultra', 'Semi', 'Extra'.

It is possible but unlikely that a font name ends in one of these.
For example "Modern Ultra - Bold".

[note]
The question arises if we should not parse the PSname instead of the
Fullname; and stick to the dash there as boundary.
The problem might be prepatched fonts with broken naming, that would be
parsed completely wrong then. So maybe the current approach is still the
best, with the caveat given above (fontnames ending in a modifier).

[note 2]
Funny enough the variable allow_regex_token was not used at all :->
Some leftover? Anyhow we use it now.

[note 3]
We can still not remove the special handling for IBM3270, because the
font initially looks like a PSname and this is parsed as such, which
breaks the name in the incorrect place:

        PSname template  = "Name-StylesWeights"
        Fullname of 3270 = "IBM 3270 Semi-Condensed"

Signed-off-by: Fini Jastrow <ulf.fini.jastrow@desy.de>
  • Loading branch information
Finii committed May 26, 2023
1 parent 30d7317 commit bb21d28
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion bin/scripts/name_parser/FontnameTools.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,12 @@ def get_name_token(name, tokens, allow_regex_token = False):
not_matched = ""
all_tokens = []
j = 1
regex = re.compile('(.*?)(' + '|'.join(tokens) + ')(.*)', re.IGNORECASE)
token_regex = '|'.join(tokens)
if not allow_regex_token:
# Allow a dash between CamelCase token word parts, i.e. Camel-Case
# This allows for styles like Extra-Bold
token_regex = re.sub(r'(?<=[a-z])(?=[A-Z])', '-?', token_regex)
regex = re.compile('(.*?)(' + token_regex + ')(.*)', re.IGNORECASE)
while j:
j = regex.match(name)
if not j:
Expand All @@ -159,6 +164,9 @@ def get_name_token(name, tokens, allow_regex_token = False):
sys.exit('Malformed regex in FontnameTools.get_name_token()')
not_matched += ' ' + j.groups()[0] # Blanc prevents unwanted concatenation of unmatched substrings
tok = j.groups()[1].lower()
if not allow_regex_token:
# Remove dashes between CamelCase token words
tok = tok.replace('-', '')
if tok in lower_tokens:
tok = tokens[lower_tokens.index(tok)]
tok = FontnameTools.unify_style_names(tok)
Expand Down

0 comments on commit bb21d28

Please sign in to comment.