wrong parsing of DatePeriod #2

@geostag

In some cases the DatePeriod parser unintentionally parses a DatePeriod keyword as a month.

import gedcom7.types as T
T.DatePeriod('FROM 21 JAN 1990 TO 22 FEB 2000').parse()
T.DatePeriod('FROM  1990 TO  2000').parse()

The first DatePeriod is recognized correctly as a period from 1990-01-21 to 2000-02-22. The second one, however, which is a valid DatePeriod according to the GEDCOM 7 specification, is parsed as

{
   "from":{
      "calendar":"None",
      "day":1990,
      "month":"TO",
      "year":2000,
      "epoch":"None"
   }
}

The root cause is that the original GEDCOM 7 specification does not exclude the keywords from the month pattern. This allows the ambiguous interpretation above, which is formally valid but not intended.

With the following change in grammar.py, you get the intended interpretation:

- month = f'({stdtag}|{exttag})'
+ no_months = f'FROM|TO|BET|AND|AFT|BEF'
+ month = f'(?!{no_months})({stdtag}|{exttag})'
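The effect of the lookahead can be reproduced outside the library with a small standalone sketch. The patterns below are simplified stand-ins for the GEDCOM 7 date grammar, not the actual contents of grammar.py, and `date_pattern` and `parse_from` are hypothetical helper names. The trailing `\b` in the lookahead is an addition that is not part of the change proposed above; it is there so that a tag which merely starts with a keyword is not rejected.

```python
import re

# Simplified stand-ins for the grammar rules (assumed, not copied from grammar.py).
STDTAG = r"[A-Z][A-Z0-9_]*"
EXTTAG = r"_[A-Z0-9_]+"
KEYWORDS = "FROM|TO|BET|AND|AFT|BEF"

# Month as the specification defines it: any standard or extension tag.
LOOSE_MONTH = rf"(?:{STDTAG}|{EXTTAG})"
# Month with the keywords excluded by a negative lookahead.
STRICT_MONTH = rf"(?!(?:{KEYWORDS})\b)(?:{STDTAG}|{EXTTAG})"


def date_pattern(month: str) -> str:
    """Build a pattern for '[[day ]month ]year' using the given month rule."""
    return rf"(?:(?:(?P<day>\d+) )?(?P<month>{month}) )?(?P<year>\d+)"


def parse_from(text: str, month: str):
    """Parse only the date that follows FROM; return its parts or None."""
    match = re.match(rf"FROM {date_pattern(month)}", text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    period = "FROM 1990 TO 2000"
    # Loose rule: 'TO' is accepted as a month, so 1990 becomes the day.
    print(parse_from(period, LOOSE_MONTH))
    # Strict rule: 'TO' is rejected as a month, so 1990 is read as the year.
    print(parse_from(period, STRICT_MONTH))
```

With the loose rule, the regex engine first tries the optional day and month, and nothing stops it from reading `1990 TO 2000` as day, month and year. With the strict rule that attempt fails at `TO`, the engine backtracks, and `1990` is matched as a bare year.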

Perhaps this should be fixed in GEDCOM 7 as well.
