Rectified file with new data #4
@D-Jeffrey Could you please verify the newly cleaned Queen.ged file? It was a different and exciting experience for me, and I'm waiting for your verification and comments on it.
Reviewing. The analysis is interesting and valid. Any project that imports GED files should be able to handle hand-crafted records: data exported from other systems or maintained by a family member using their own tracking method. A key part of testing is whether the importer fails gracefully and how it reports or handles those failures. I know that GrampsWeb created Notes as part of the import. I'm also aware of the parent issue in this file, and I've seen other files with special-character problems as well. My main question is: are you aiming to support real-world test data, or only “clean”/perfect data, in GrampsWeb? The Queen dataset isn't a great example for demonstrating lineage or family relationships because it includes many unique, disconnected individuals. For my purposes, a less-than-perfect GED file is preferred. When such records are loaded, how does GrampsWeb respond? Does the user need to correct everything and produce a perfect file before the import will work? For example, I've had exports from MyHeritage that incorrectly included HTML tags in the NOTE/CONC fields. My goal was to stress-test my project with complex, problematic data. As I work through these issues, I see several ways we could help users successfully bring their data into the application. I'm considering whether to move toward perfect data or hold on to the glitchy data, which is the way I found it on the Internet.
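As a rough illustration of the HTML-in-notes problem mentioned above, here is a minimal sketch that only *reports* suspicious NOTE/CONC/CONT lines instead of rewriting them. It assumes one GEDCOM line per list item and uses a simple regex heuristic for HTML tags; `find_html_in_notes` is a hypothetical helper, not part of Gramps or GrampsWeb.

```python
import re

# Rough GEDCOM line shape: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")
# Heuristic: anything that looks like an opening or closing HTML tag, e.g. <p>, </b>, <br/>.
HTML_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*\b[^>]*>")

# Tags that carry free text in a note and its continuation lines.
TEXT_TAGS = {"NOTE", "CONC", "CONT"}


def find_html_in_notes(lines):
    """Return (line_number, tag, value) for text lines that appear to contain HTML."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        match = LINE_RE.match(raw)
        if not match:
            continue  # malformed lines are a separate problem; skip them here
        _level, _xref, tag, value = match.groups()
        if tag in TEXT_TAGS and value and HTML_TAG_RE.search(value):
            hits.append((number, tag, value))
    return hits
```

Because it leaves the file untouched, the output can be shown to the user to decide how (or whether) to fix each note, which keeps the original "glitchy" data available for testing.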
Hi @D-Jeffrey, let me provide some more background. In the process of experimenting with new example trees for Gramps (and Gramps Web; it's the same for desktop and web), @jittymolmathew92 imported the Queen.ged file. Gramps (again, the same import code for web and desktop) can import it, but many dates are recognized as "text only", which makes them readable but not sortable, etc. This PR contains a file that's normalized to the GEDCOM 5.5 standard (I know there isn't really a standard 😉) to avoid such problems. Whether this file is useful for this repo or not, I'm not sure; that's totally up to you. For Gramps, it's just the starting point: a lot more needs to be added manually to make it a useful example database (sources, media, etc.).
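To see which DATE values are likely to end up as "text only", one option is to check them against the GEDCOM 5.5 date grammar before import. The sketch below covers only a simplified subset (plain `[day] month year` dates plus `ABT`/`CAL`/`EST`/`BEF`/`AFT`, `BET … AND …`, and `FROM … TO …`). It is an assumption-laden approximation, not the Gramps date parser, and it omits dual years, B.C. dates, calendar escapes, `INT`, and date phrases. Both function names are hypothetical.

```python
import re

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
# One calendar date in a simplified form: [[day] month] year (3- or 4-digit year).
SIMPLE_DATE = rf"(?:(?:\d{{1,2}}\s+)?(?:{MONTHS})\s+)?\d{{3,4}}"

# A subset of GEDCOM 5.5 DATE values: approximated, range, period, and plain dates.
DATE_VALUE_RE = re.compile(
    rf"(?:ABT|CAL|EST|BEF|AFT)\s+{SIMPLE_DATE}"
    rf"|BET\s+{SIMPLE_DATE}\s+AND\s+{SIMPLE_DATE}"
    rf"|FROM\s+{SIMPLE_DATE}(?:\s+TO\s+{SIMPLE_DATE})?"
    rf"|TO\s+{SIMPLE_DATE}"
    rf"|{SIMPLE_DATE}",
    re.IGNORECASE,
)


def is_structured_gedcom_date(value: str) -> bool:
    """True if the whole value fits the simplified grammar above."""
    return DATE_VALUE_RE.fullmatch(value.strip()) is not None


def flag_text_only_dates(lines):
    """List (line_number, value) for DATE lines that fail the simplified check."""
    flagged = []
    for number, raw in enumerate(lines, start=1):
        # GEDCOM delimits level, tag, and value with single spaces.
        parts = raw.strip().split(" ", 2)
        if len(parts) == 3 and parts[1] == "DATE" and not is_structured_gedcom_date(parts[2]):
            flagged.append((number, parts[2]))
    return flagged
```

Flagging the values, rather than replacing them with a guess, leaves the choice of a corrected date to someone who can research it.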
I hear what the request is. If you have a tool that automatically corrects the GED file and produces the Markdown report, that is really interesting (and needs more testing). If you want to take a copy of the queen.ged or other files, that is fine. I did not create most of them, so I'm not looking for credit. I'm not trying to be difficult. I do think copying and recopying records should be done carefully, especially if others are going to access them and use them as their information source. In my own private tree building, I drop brothers and sisters who are not in the generational lineage. I have seen others erroneously merge families together, and it is a nightmare to unwind. When a genealogy line works, it is a great feeling; when the dates or names get merged or miswritten, the great feeling disappears.
My version of the file has 8 fewer lines than the other sources of it on the Internet, here: https://duncan.familygenes.ca/tng/members_data/0033ab/gedcom/Queen_Eliz_II.ged and here: https://kingscoronation.com/queen-elizabeth-ii-gedcom-download/, because that was the only way I could get my input to work. The other examples in my collection were from SourceForge (https://sourceforge.net/projects/godskingsheroes/), which seems to take them from https://famousfamilytrees.blogspot.com/, and I suspect those were assembled from hand-me-down sources.
Regarding the PR and the points stated in the Markdown report: I'm not a fan of fixing the dates in an arbitrary way. Those kinds of records could be properly corrected with research, but just picking the first date is not something that should be encouraged. The same goes for removing whole records. My decision will be to keep the files as I found them, so that other developers have example files which are not clean and not test-suite-checkbox ready. As I said, if you want to take the files and clean them in your repo, you are more than welcome to do so, but they will not be representative of what users may be struggling with when using their own data.
Creating a tool to help, using good practices (if that is what you have), or teaching users how to correct such situations (not in files this large), may be the opportunity.
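One possible shape for such a helper is a report-only linter that never edits the GED file and writes its findings as Markdown for a person to review. The minimal sketch below checks just two things, unparseable lines and individuals with no level-1 `FAMC` (child-of-family) link, which is roughly the "person without father and mother data" case. The category names and the functions `lint_gedcom` and `to_markdown` are hypothetical.

```python
import re
from collections import defaultdict

# Rough GEDCOM line shape: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")
NO_PARENTS = "Individual without parent family (FAMC)"


def lint_gedcom(lines):
    """Collect findings by category; the input lines are never modified."""
    issues = defaultdict(list)
    current_indi = None  # xref of the INDI record currently being read
    has_famc = False     # whether that INDI has a level-1 FAMC link
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue
        match = LINE_RE.match(raw)
        if not match:
            issues["Unparseable line"].append(f"line {number}: {raw.strip()}")
            continue
        level, xref, tag, _value = match.groups()
        if level == "0":
            # A new top-level record closes the previous individual, if any.
            if current_indi and not has_famc:
                issues[NO_PARENTS].append(current_indi)
            current_indi = xref if tag == "INDI" else None
            has_famc = False
        elif level == "1" and tag == "FAMC" and current_indi:
            has_famc = True
    # The last record in the file may also be an individual.
    if current_indi and not has_famc:
        issues[NO_PARENTS].append(current_indi)
    return issues


def to_markdown(issues):
    """Render the findings as a small Markdown report."""
    out = ["# GEDCOM lint report", ""]
    if not issues:
        out.append("No issues found.")
    for category, items in issues.items():
        out.append(f"## {category} ({len(items)})")
        out.extend(f"- {item}" for item in items)
        out.append("")
    return "\n".join(out)
```

In any real tree the oldest known ancestors will legitimately show up in the "without parent family" list. This is one reason to treat the output as a review checklist and not as a list of records to delete.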
I actually agree it wouldn't make sense to replace the original file; I just recommended sharing the cleaned-up file in case it is useful to anyone, but a different repo might be a better place to do so.
D-Jeffrey left a comment:
Remove this update to keep the original file.
@jittymolmathew92 I'm looking for a … If you make those changes, then you and @DavidMStraub will get …
@D-Jeffrey
On behalf of GrampsWeb, I researched the existing Queens.ged file, identified several issues, and updated the file according to the analysis. The cleaned file removes all unformatted date data, persons without father and mother data, special characters, empty notes, etc. The detailed report is attached for your reference, and the new updated and cleaned file is in the PR.
Queen_clean_import_report.md