Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grammar.gedstruct has infinite loops #424

Open
augean opened this issue Jan 19, 2024 · 8 comments
Open

grammar.gedstruct has infinite loops #424

augean opened this issue Jan 19, 2024 · 8 comments

Comments

@augean
Copy link

augean commented Jan 19, 2024

in https://github.com/FamilySearch/GEDCOM/blob/main/extracted-files/grammar.gedstruct
we have types, that reference types going round in a loop
, for example, look at the pattern we see a loop starting at line 569

hence, the schema has become an infinite loop, and we can't build from it
GEDCOM 5.5.1 carefully avoided this type of error.

`
line=35 " +3 <<ADDRESS_STRUCTURE>> {0:1}" in HEADER
line=52 " +1 <<NOTE_STRUCTURE>> {0:1}" in HEADER
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
line=614 " +2 <<DATE_VALUE>> {0:1}" in SOURCE_CITATION
line=623 " +1 <<MULTIMEDIA_LINK>> {0:M}" in SOURCE_CITATION
line=624 " +1 <<NOTE_STRUCTURE>> {0:M}" in SOURCE_CITATION
line=569 " +1 <<SOURCE_CITATION>> {0:M}" in NOTE_STRUCTURE
java.lang.RuntimeException: Recursive

`

@Norwegian-Sardines
Copy link

GEDCOM 5.5.1 carefully avoided this type of error.

I disagree, v5.5.1 has this issue as well.

@augean
Copy link
Author

augean commented Jan 20, 2024

Actually, now you have jogged my memory,
When I created the machine readable ged.5.1.1.txt,
I think I had to remove something to prevent infinite loops
At least we agree, that this is still an issue., even if not a regression.

@Norwegian-Sardines
Copy link

The problem is that the term “source” is used incorrectly! The design and people say, the provider of a fact, object or note/statement is the “source” of the information, we should be using a citation referencing the provider. Therefore, an object, note, fact are cited, but a citation would never have an object, note or fact associated with it (no provider), so no circular reference! GEDCOM needs a citation record and deprecate the “Source_Citation” and the Source_Record!

@tychonievich
Copy link
Collaborator

The spec already identifies this case explicitly

A NOTE_STRUCTURE can contain a SOURCE_CITATION, which in turn can contain a NOTE_STRUCTURE, allowing potentially unbounded nesting of structures. Because each dataset is finite, this nesting is also guaranteed to be finite.

I do not see an issue here. This is as designed. Can you clarify what is wrong with it as it stands?

@augean
Copy link
Author

augean commented Jan 26, 2024

To fix this you would need a new "NOTE_STRUCTURE_FOR_SOURCES"

"A NOTE_STRUCTURE can contain a SOURCE_CITATION, which in turn can contain a NOTE_STRUCTURE_FOR_SOURCES"

NOTE_STRUCTURE_FOR_SOURCES doesn't contain any SOURCE_CITATION's (preventing looping)

It would be very helpful if this was part of the spec.

@tychonievich
Copy link
Collaborator

The option of deep nesting is a feature that has been supported since NOTE.SOUR was added in version 5.5 (1996) and has valid semantic meaning. To remove it would be a breaking change, and thus could not happen before version 8.0 at the earliest.

That said, while I've never gotten to arbitrary depth I have used the SOUR.NOTE.SOUR.NOTE several times to express things that really benefit from this structure, and I'd be sad to see this removed from the spec. I'm also not seeing why we need to remove it, unless it is to support the specific operation of enumerating all possible paths.

A cleaned up example of how it can be used to convey valid semantic information is

1 BIRT
2 SOUR @S1@
3 NOTE The applicability of that source to this event has been challenged
4 SOUR @S2@
5 NOTE Several blogs have weighted in on the validity of this source's arguments in this case
6 SOUR @S3@
6 SOUR @S4@

...where S1 is the source record, S2 is an article questioning it, S3 and S4 are blog posts about if S2 applies to S1 or not. And there's no need for this to stop at any given depth: I might have notes about if S3 actually applies to my note on S2, and sources for that note, and so on.

@Norwegian-Sardines
Copy link

First let me say that I'm not advocating the removal of these relationship per se!

I have actually seen this in a GEDCOM I received!

An image O1 was added and some one wanted to give credit to the provider of the image, so they created a "Source of the Image" record S2. (An aside: This is why I indicated that images, notes, and facts should not have "sources" but should have a citation record that has no lower links).

S2 receives a NOTE N3 describing some additional information about the source.

At a later date a person looking at N3 decides that the NOTE needs a link to the image O1 because the NOTE refers to something in the image, and when displaying the NOTE N3 they want to see the image O1 along side the note.

And thus we have a loop.

Again I'm not advocating changing the current relationships, an object should have notes and notes should have objects, but something in the program implementing these relationship needs to stop O1 from having a link to N3 when N3 already has a link to O1! This is a simple check, but not so simple in the first example!

@augean
Copy link
Author

augean commented Jan 31, 2024

thanks for the analysis, of this issue
it seems like GEDCOM can have an infinite number of paths,
I will think it over, and how this could be implemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants