-
Notifications
You must be signed in to change notification settings - Fork 827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for overlapping tags #441
Check for overlapping tags #441
Conversation
Thanks for the PR, and sorry for the delay (I was on vacation with bad connectivity). Can you add a test for the added check? |
bfd71b9
to
5fc5dcd
Compare
@magro Sorry to take so long. I got burned by this issue again and spent three hours before discovering it. So I remembered that I had this PR waiting for clean-up. Should be ready now. |
5fc5dcd
to
763ce88
Compare
Great, thanks! |
@cypherdare One follow up question: what's the recommendation for users that get this exception for their existing code base after the kryo update (assuming the underlying issue was not noticed before)? We should at least tell this in the release notes, maybe we could add some actionable hint to the exception. |
@magro
We might be able to conceivably recover data for situation 1. If the field names have not been changed, the two fields were probably written in alphabetical order, so when reading them, we could map multiple appearances of the same Tag ID sequentially to their original fields. Maybe someone could pass in a Map<Integer, List> linking old tag values to new tag values as a configuration. This is an awful lot of complexity to add to the TaggedFieldSerializer class for a small possibility. Perhaps a RecoveryTaggedFieldSerializer class could be written and put in a Gist for those that might want to attempt this recovery. But I don't know that it's worth trying to adequately test a solution like this if we don't know if anyone has been affected. For situation 1, if we can assume the fields were written in alphabetical order*, the simpler solution is to give one of the two fields a new unique tag value, and the other should be left alone. The one that keeps the tag value should be the one that is last alphabetically. During reading, it will be written once with the wrong value, and then a second time with the correct value. The class should be updated to be able to withstand the first field's data being lost. *I'm not 100% this can be assumed. It looks to me like fields are naturally in alphabetical order. We started sorting them by tag ID at some point, but fields with the same ID probably didn't switch places during sorting. If we can't assume this, we still need to keep that tag on one of the values to prevent errors during reading (like losing our place in the stream), but the class must be updated to assume that both of the fields might have incorrect values. For situation 2, I think both fields need new IDs, and the original tag ID should be deprecated, which can be done by creating some new unused field and tagging it with that ID and deprecating it. Old data should still be readable, and the application can now safely write the data as part of a larger graph. For situation 3, I think the same should be done as for situation 2, but there should also be error checking on the object graph after reading it if attempting to read old data. |
@cypherdare Wow, thanks for this elaborate analysis! When preparing the release notes I'll add a link to your comment here, ok? |
@magro No problem. There could be errors here but if someone encounters this issue, we can probably reason out a solution. |
Addresses #440.