Update to 3.0.0, run backwards compatibility with OpenSearch 2.4.0 #242
Signed-off-by: dblock <dblock@dblock.org>
I mentioned this on the other issue. There are two kinds of compatibility, wire and index. For index compatibility, the next major version is compatible back to the previous major release (e.g., 3.0.0 is index compatible back to 2.0.0). For wire compatibility, the next major version is only compatible back to the last minor of the previous major (e.g., 2.4.0). This is how API deprecation / removals are handled across minor releases to support rolling upgrades.
Opening a PR that runs BWC tests for a 3.0.0 plugin against any arbitrary minor release breaks this deprecation / removal paradigm. It's not supported, so you should see all sorts of BWC test failures. To change this design we'd have to redesign core and introduce an internal API versioning scheme, which would require us to support API deprecation / removals across the entire minor suite of a major line. This has implications that reach deeper, into deprecation / removals within Lucene compatibility.
This exception during the master election process is due to wire incompatibility when trying to send the cluster state. Either the exception for the version compatibility check is getting swallowed somewhere or it's incorrectly passing. Either way, this is due to 3.0.0 not being transport compatible w/ 2.1.0 and the error message is just not very useful at communicating that here. |
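To make the two rules above concrete, here is a minimal Python sketch. It is only an illustration of the rule as stated in this comment, not OpenSearch's actual version-checking code: the `Version` type, the function names, and the `last_prev_minor` parameter are all hypothetical, and treating every release within the same major line as wire compatible is an assumption.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string (no qualifiers)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def index_compatible(current: Version, other: Version) -> bool:
    """Index compatibility reaches back to the first release of the previous major."""
    return Version(current.major - 1, 0, 0) <= other <= current


def wire_compatible(current: Version, other: Version, last_prev_minor: Version) -> bool:
    """Wire compatibility reaches back only to the last minor of the previous major."""
    if other.major == current.major:
        # Assumption: older releases in the same major line are wire compatible.
        return other <= current
    return other.major == current.major - 1 and other >= last_prev_minor
```

With 2.4.0 passed as `last_prev_minor`, 3.0.0 is index compatible with 2.0.0 but wire compatible only with 2.4.0 and later, which is why a mixed 3.0.0 / 2.1.0 cluster is not expected to form.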
@nknize I understand. You're saying that this PR should test against 2.4.0 (whatever is in 2.x). Now I just need to make it work. Do you know of a dynamic way to get the "2.x -> 2.4.0" information? Because I need to fish the job-scheduler plugin out of 2.4.0. Or otherwise maybe try to build the job-scheduler from 2.x as well instead of trying to download it... |
Correct. For BWC testing the plugin repository's branching process and logic needs to be consistent w/ the core branching logic. That is the |
Signed-off-by: dblock <dblock@dblock.org>
a5c551a
to
83d588f
Compare
Putting everything on 2.4.0. I see a failure starting the cluster. So far not able to fix it :(
|
cc:@reta ideas? |
Quick checkin: I poked at this for a short bit and was able to manually form two separate clusters:
- 3 node mixed cluster (2 - 2.4.0, 1 - 3.0.0) w/ clean builds of 2.4.0 and 3.0.0 from the core repo
- 3 node cluster (3 - 2.4.0) w/ job-scheduler's bwc checkout of 2.4.0 (freshly built in .gradle/caches by the bwc scaffolding)
This tells me it's not an issue in core but possibly something up w/ the gradle bwc test configuration (which looks to be a copy of core's bwc config - e.g., RecoveryIT).
It's also odd to me that gradle executes the entire sample-extensions test suite despite passing in the single test using the --tests flag. But that's likely a separate issue. If I have some time I'll poke at this a little deeper today.
I tried to bisect the problem and didn't get anywhere. I tried at cb238aae616d6a0fd8f82e128a1f94c8e4e8b1f7 (when we bumped the version to 3.0) and hit the same problem. Trying to go back further makes a mess. I still can't understand why the test cluster doesn't come up though, which seems to be the key problem. I also tried using out-of-the-box 2.4.0 on the configuration generated by the bwc tests and my cluster doesn't even get into discovery, loops over |
I'll take a stab at this. |
@dblock @nknize I think I found out the cause: it seems not related to cluster manager or election, but the fact that the cluster type was set to
|
Signed-off-by: dblock <dblock@dblock.org>
@reta THANK YOU. I am so dumb. Instead of reading the actual output of the failed test I spent days chasing a red herring on why the cluster isn't coming up. It just wasn't given enough time given the earlier assertion failure. The old version is no longer opendistro, so I merged the OLD and MIXED assertions, getting rid of the opendistro variation that checks for an old plugin name. |
Codecov Report
@@ Coverage Diff @@
## main #242 +/- ##
============================================
- Coverage 53.19% 52.96% -0.23%
+ Complexity 65 64 -1
============================================
Files 8 8
Lines 438 438
Branches 50 50
============================================
- Hits 233 232 -1
Misses 186 186
- Partials 19 20 +1
@@ -12,3 +12,4 @@ build
 bin/
 .classpath
 .vscode
+sample-extension-plugin/src/test/resources/
Is it accidental? Seems like legit non-ignorable target
It's not. The old code was downloading older versions of job-scheduler into src/test/resources/bwc/job-scheduler so I kept it. The folder doesn't exist in the tree. Maybe it should download to /tmp, but I didn't want to change too many things. WDYT?
Oh ... I see, thanks for explaining, let's keep it like that indeed
@nknize I dismissed your review to get us unblocked, feel free to chime in if you think this needs more changes and I can follow up. @saratvemulapalli +1 so we can merge? |
Signed-off-by: dblock dblock@dblock.org
Description
OpenSearch 1.x cannot be upgraded directly to OpenSearch 3.0. This changes the BWC tests to use OpenSearch 2.4.0. The latest version of the job-scheduler plugin is also downloaded from CI.
We'll need to find a way to automate getting "the latest 2.x version" if we don't want to keep updating this number here every time a 2.x version is released.
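One possible shape for that automation, sketched in Python purely as an illustration: given a list of published version strings, pick the highest release in a major line. Where the list comes from (a release manifest, repository tags, or something else) is left open here, and the `latest_in_major` helper and its sample input are hypothetical.

```python
def latest_in_major(versions, major):
    """Return the highest 'major.minor.patch' string in the given major line, or None."""
    candidates = []
    for text in versions:
        parts = text.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # skip snapshots, qualifiers, and malformed entries
        parsed = tuple(int(p) for p in parts)
        if parsed[0] == major:
            candidates.append(parsed)
    if not candidates:
        return None
    # Tuples of ints compare numerically, so 2.10.0 sorts above 2.9.1.
    return ".".join(str(n) for n in max(candidates))


# Made-up input for illustration only.
published = ["1.3.6", "2.3.0", "2.4.0", "3.0.0", "2.5.0-SNAPSHOT"]
bwc_version = latest_in_major(published, 2)
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes once a minor version reaches two digits.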
Issues Resolved
opensearch-project/OpenSearch#3615
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.