Rectified file with new data #4

Open
jittymolmathew92 wants to merge 3 commits into D-Jeffrey:main from jittymolmathew92:main

@jittymolmathew92 jittymolmathew92 commented May 1, 2026

@D-Jeffrey

On behalf of GrampsWeb, I researched the existing Queens.ged file, identified several issues, and updated the file based on that analysis. The cleaned file removes all unformatted date data, persons without father and mother data, special characters, empty notes, etc. The detailed report is attached for your reference, and the updated, cleaned file is included in this PR.

Queen_clean_import_report.md

@jittymolmathew92
Author

@D-Jeffrey Could you please verify the newly cleaned Queen.ged file? It was a different and exciting experience for me, and I am waiting for your verification and comments on it.

@D-Jeffrey
Owner

Reviewing. The analysis is interesting and valid.

Any project that imports GED files should be able to handle hand-crafted records—data exported from other systems or maintained by a family member using their own tracking method. A key part of testing is whether the importer fails gracefully and how it reports or handles those failures. I know that GrampsWeb created Notes as part of the import. I’m also aware of the parent issue in this file, and I’ve seen other files with special-character problems as well. My main question is: are you aiming to support real-world test data, or only “clean”/perfect data, in GrampsWeb? The Queen dataset isn’t a great example for demonstrating lineage or family relationships because it includes many unique, disconnected individuals.

For my purposes, a less-than-perfect GED file is preferred. When such records are loaded, how does GrampsWeb respond: does the user need to correct everything and produce a perfect file before the import will work? For example, I’ve had exports from MyHeritage that incorrectly included HTML tags in the NOTE/CONC fields. My goal was to stress-test my project with complex, problematic data. As I work through these issues, I see several ways we could help users successfully bring their data into the application.
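As an illustration of the HTML-in-notes case, here is a minimal sketch of one way an importer or helper tool might strip tags from note text while leaving every other line alone. The function name and regexes are hypothetical and not taken from any project in this thread; the sketch assumes lines have already been split without trailing newlines, and it ignores level-0 shared NOTE records that carry an xref.

```python
import html
import re

# Matches a simple HTML tag such as <p>, </p> or <br/>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
# A subordinate note line: level, NOTE/CONT/CONC, optional text.
NOTE_LINE_RE = re.compile(r"^(\d+) (NOTE|CONT|CONC)(?: (.*))?$")

def strip_html_from_notes(lines):
    """Return (cleaned_lines, changed_count); only NOTE/CONT/CONC text is touched."""
    cleaned, changed = [], 0
    for line in lines:
        m = NOTE_LINE_RE.match(line)
        if m and m.group(3):
            # Drop tags, then decode entities such as &amp;.
            text = html.unescape(TAG_RE.sub("", m.group(3)))
            new_line = f"{m.group(1)} {m.group(2)} {text}"
            if new_line != line:
                changed += 1
            line = new_line
        cleaned.append(line)
    return cleaned, changed
```

Returning the number of changed lines lets a caller report what was altered instead of silently rewriting the user's data.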

I'm weighing whether to move towards perfect data or hold onto the glitchy data, which is the way I found it on the Internet.

@D-Jeffrey D-Jeffrey marked this pull request as ready for review May 3, 2026 19:15
@D-Jeffrey D-Jeffrey marked this pull request as draft May 3, 2026 19:18
@D-Jeffrey D-Jeffrey requested review from D-Jeffrey May 3, 2026 19:19
@DavidMStraub

Hi @D-Jeffrey, let me provide some more background. In the process of experimenting with new example trees for Gramps (and Gramps Web - it's the same for desktop and web), @jittymolmathew92 imported the Queen.ged file. Gramps (again, same import code for web and desktop) can import it, but lots of dates are recognized as "text only", which makes them readable but not sortable etc.
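To make the "text only" distinction concrete, here is a rough sketch of a check that separates structured date values from free text. It is only an approximation of the GEDCOM 5.5 Gregorian date forms (it omits dual years, B.C. dates, other calendars, and interpreted dates), and it is not the parser Gramps uses; all names are hypothetical.

```python
import re

MONTH = r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
# A simple date: [[day] month] year, e.g. "705", "MAR 2013", "17 MAR 2013".
DATE = rf"(?:(?:\d{{1,2}} )?{MONTH} )?\d{{1,4}}"
STRUCTURED_RE = re.compile(
    rf"^(?:(?:ABT|CAL|EST|BEF|AFT) )?{DATE}$"   # exact or approximated date
    rf"|^BET {DATE} AND {DATE}$"                 # date range
    rf"|^FROM {DATE}(?: TO {DATE})?$"            # date period
    rf"|^TO {DATE}$"
)

def is_structured_date(value):
    """True if the DATE value fits one of the simplified structured forms."""
    return STRUCTURED_RE.match(value.strip()) is not None
```

Under these assumptions, `705` and `ABT 1066` count as structured, while `1030 or 36`, `abt. 1066 or 1094`, and `935/950` do not.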

This PR contains a file that's normalized to GEDCOM 5.5 standard (I know there isn't really a standard 😉) to avoid such problems. Whether this file is useful for this repo or not I'm not sure, totally up to you. For Gramps, it's just the starting point, a lot more needs to be added manually to make it a useful example database (sources, media, etc.).

@D-Jeffrey
Owner

I hear what the request is. If you have a tool that automatically corrects the GED file and produces the Markdown report, that is really interesting (and needs more testing). If you want to take a copy of queen.ged or other files, that is fine. I did not create most of them, so I look for no credit. I'm not trying to be difficult. I do think copying and recopying records should be done carefully, especially if others are going to access them and use them as an information source. In my own private tree building, I drop brothers and sisters who are not in the generational lineage. I have seen others erroneously merge families together, and it is a nightmare to unwind. When a genealogy line works, it is a great feeling; when the dates or names get merged or miswritten, the great feeling disappears.

My version of the file has 8 fewer lines than the other copies of it on the Internet, here https://duncan.familygenes.ca/tng/members_data/0033ab/gedcom/Queen_Eliz_II.ged and here https://kingscoronation.com/queen-elizabeth-ii-gedcom-download/ because that was the only way I could get my input to work. The other examples in my collection were from SourceForge https://sourceforge.net/projects/godskingsheroes/ which seems to take them from https://famousfamilytrees.blogspot.com/. And I suspect those were assembled from hand-me-down sources.

The PR is for 1957 additions & 14545 deletions. The MD log was very informative on many details and understated others.

The Markdown report states in point 4 "Vendor tags retained (not removed)", which is not true: all "2 RIN" lines were removed. I agree they add no value, but that is not what was in the log.
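A claim like this can be checked mechanically. The following is a minimal sketch, with hypothetical function names, that counts (level, tag) pairs in the original and cleaned files and reports which ones became fewer; it assumes both files have already been read into lists of lines.

```python
from collections import Counter

def tag_counts(lines):
    """Count (level, tag) pairs, e.g. ("2", "RIN"), across GEDCOM lines."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(" ", 3)
        if len(parts) < 2:
            continue  # blank or malformed line
        level = parts[0]
        # Record lines look like "0 @I206@ INDI": the tag follows the xref.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        counts[(level, tag)] += 1
    return counts

def dropped_tags(original_lines, cleaned_lines):
    """Return {(level, tag): number_removed} for tags with fewer occurrences after cleaning."""
    before, after = tag_counts(original_lines), tag_counts(cleaned_lines)
    return {key: n - after[key] for key, n in before.items() if n > after[key]}
```

Comparing this output against the report would show whether entries such as `("2", "RIN")` were dropped even though the log says vendor tags were retained.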

I'm not a fan of fixing the dates in an arbitrary way
2 DATE 1030 or 36 -> 2 DATE 1030
or
2 DATE abt. 1066 or 1094 -> 2 DATE ABT 1066

Those kinds of records could be properly corrected with research, but just picking the first date is not something that should be encouraged.

or
2 DATE 935/950 -> 2 DATE (which I assume is not standard)
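One non-destructive alternative, sketched here under assumptions: GEDCOM 5.5 allows a free-text date phrase enclosed in parentheses, so an ambiguous value can be kept verbatim and flagged for research instead of being reduced to its first alternative. This does not make the date sortable. The function name is hypothetical, and `is_structured` is any caller-supplied check for dates that are already in structured form.

```python
import re

DATE_LINE_RE = re.compile(r"^(\d+) DATE (.+)$")

def preserve_ambiguous_date(line, is_structured):
    """Return (new_line, needs_review) without choosing between alternatives."""
    m = DATE_LINE_RE.match(line)
    if not m:
        return line, False  # not a DATE line, or a DATE line with no value
    level, value = m.group(1), m.group(2).strip()
    if value.startswith("(") or is_structured(value):
        return line, False  # already a date phrase or already structured
    # Keep the full original wording as a parenthesised date phrase.
    return f"{level} DATE ({value})", True

# Toy predicate for the demo only: accept nothing but a bare year.
bare_year = lambda v: v.isdigit()
print(preserve_ambiguous_date("2 DATE 1030 or 36", bare_year))
# prints ('2 DATE (1030 or 36)', True)
```

The `needs_review` flag gives a report something honest to list: values that were preserved but still need a researched correction.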

Whole records were removed

0 @I206@ INDI
1 RIN MH:I206
1 _UID 22FA49DC-F510-4728-A5D1-8BC383379898
1 _UPD 17 MAR 2013 16:54:57 GMT+9.5
1 SEX F
1 BIRT
2 _UID D43205FC-84DF-4820-85A6-F63CFA051894
2 RIN MH:IF1454
2 DATE 705
1 DEAT
2 _UID 5B869AD0-A572-4754-A2C7-BF3448A2D442
2 RIN MH:IF1455
2 DATE 770
1 FAMS @F94@
1 NOTE Birthdate: 	705
2 CONT <p>Birthplace: 	Oppland, Norway</p>
2 CONT <p>Death: 	Died 770 in Norway</p>

0 @I208@ INDI
1 RIN MH:I208
1 _UID A3DD36C9-672F-4765-9C3D-3DDB109B2FA0
1 _UPD 17 MAR 2013 16:56:25 GMT+9.5
1 SEX F
1 BIRT
2 _UID 5515CBB1-0464-479D-90A6-B8A59504F5A9
2 RIN MH:IF1458
2 DATE 605
1 DEAT
2 _UID D3384CFD-4346-44B8-8D55-9726ACC8C1E5
2 RIN MH:IF1459
1 FAMS @F95@
1 NOTE Nicknames: 	"Svidri Heitson's Wife", "Svidre Heitsons kone"
2 CONT <p>Birthdate: 	circa 605</p>
2 CONT <p>Birthplace: 	Of, , , Norway</p>
2 CONT <p>Death: 	(Date and location )</p>
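Removed records like the two above can be listed automatically. This is a minimal sketch with hypothetical names; it assumes the cleaning step did not renumber xref ids, so an id missing from the cleaned file means the record was dropped.

```python
import re

# Level-0 record header, e.g. "0 @I206@ INDI".
RECORD_RE = re.compile(r"^0 (@[^@]+@) (\w+)")

def record_ids(lines):
    """Map xref id -> record type for every level-0 record."""
    ids = {}
    for line in lines:
        m = RECORD_RE.match(line)
        if m:
            ids[m.group(1)] = m.group(2)
    return ids

def removed_records(original_lines, cleaned_lines):
    """Return {xref: record_type} for records present only in the original."""
    before, after = record_ids(original_lines), record_ids(cleaned_lines)
    return {xref: kind for xref, kind in before.items() if xref not in after}
```

A report built from this would name each dropped individual or family explicitly, which makes it easier to judge whether the removal was intended.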

My decision will be to keep the files as I found them so that other developers have example files which are not clean and not test-suite checkbox ready.

As I said, if you want to take the files and clean them in your repo, you are more than welcome to do so, but they will not be representative of what users may be struggling with when using their own data.

Creating a tool to help, using good practices (if that is what you have), or teaching users how to correct such situations (not files this large), may be the opportunity.

@DavidMStraub

I actually agree it wouldn't make sense to replace the original file. I just recommended sharing the cleaned-up file in case it is useful to anyone, but a different repo might be a better place to do so.

@D-Jeffrey D-Jeffrey marked this pull request as ready for review May 8, 2026 02:45
Owner

@D-Jeffrey D-Jeffrey left a comment

Remove this update to keep the original file

Owner

@D-Jeffrey D-Jeffrey left a comment

Keep the addition of Queen_clean.ged (8bd4fc5), but remove the other two commits, 09bfe56 and 021fdae

@D-Jeffrey
Owner

@jittymolmathew92 I'm looking for either a cherry-pick of 8bd4fc5, or a drop of 09bfe56 and 021fdae.

If you make those changes, then you and @DavidMStraub will get queen_clean.ged and everyone will be happy.

3 participants