-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect Schema for different type of Project to Sync #280
Conversation
@vishav1771 it looks like your tests are failing because the docstring you added to your test has an extra trailing quote (four instead of three). I'll look at the tests now. |
@vyasr I have not done any changes in the |
Yes, you've absolutely done the right thing. To make sure I understand these tests:
Is that interpretation correct? One more I might add is where the types don't match but the values match exactly e.g. Reading these tests makes me think that we are perhaps too stringent with our tests of invalid schema. The existence of the |
I agree. The initial intention when I added the schema check was to prevent cases where one would apply a migration to one project data space (project A) and then sync it with a diverged copy (project B), e.g., one where a key was renamed, and now one would end up with a bunch of additional jobs that were originally supposed to be the same. So here is my proposal, I think that we should keep the current behavior where sync really only syncs when the schema is exactly the same. However, we should clarify in the error message that this does not meany anything bad if you intend to sync two data space that are similar, but not identical in terms of schema. We could consider defusing the warning message and to use a different option to override just that check that sounds less menacing then When I look at the original issue, the error message that we see is very confusing:
That just doesn't make any sense since it claims that certain keys were only found in the source or destination schema, but they are actually identical. This is what needs to be addressed. |
I think that's a reasonable approach for now. If we go this route of not changing how schema checking works, long term I would advocate that for 2.0 we change the default behavior to |
@vyasr So, Only need to change the warning message and a different option to override for now? |
Yes, I think that for now we would just want to add a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this PR based on master
and apply discussed modifications to this PR.
Codecov Report
@@ Coverage Diff @@
## master #280 +/- ##
==========================================
+ Coverage 64.81% 64.86% +0.05%
==========================================
Files 40 40
Lines 5644 5644
==========================================
+ Hits 3658 3661 +3
+ Misses 1986 1983 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Approved pending the application of the minor suggestions that I added.
@vishav1771 One final question, have you verified that this will solve the specific issue for the data space posted in #230 ?
Yes, I have verified it with statepoints given in #230 and few other cases. However, while testing I came across a test case shown :
They will generate the same schema and will pass |
Great, thank you.
While the two projects don't have the identical set of jobs, their implicit schema is indeed the same so IMO it is intended behavior that they would sync without problem. Otherwise we would need to redefine that merging means that one syncs two projects that have the exact same set of jobs. I don't think that's what we want. @vyasr Thoughts? |
I agree. I think this question is revisiting the motivation for the original requirement that schema had to be exactly identical, which was to avoid e.g. renaming or adding keys to schema and then trying to sync. If the keys and the types of the values corresponding to each key are the same between two schema, any jobs that exist in the source but not the destination must be truly distinct statepoints, so I think this behavior is the correct one. |
@vyasr Great, please merge at your discretion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a typo, I'm going to apply the suggestion myself then I think it's good to go.
@vishav1771 I don't have push permissions to your repo. Can you commit my suggestion directly? I will merge after that. Also, in general it is helpful for us if you complete the checklist when you make the pull request. That lets us track how far you've gotten. |
Sure
Sorry about that. I'll keep in mind for next time |
@vyasr This is ready, isn't it? |
Yes, thanks for the ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Description
Changed the
detect_schema
function to compare on the bases of data types instead of values.Motivation and Context
Fixes #230
Types of Changes
1The change breaks (or has the potential to break) existing functionality.
Checklist:
If necessary:
Example for a changelog entry:
Fix issue with launching rockets to the moon (#101, #212).