Description
Todo:
- implement Version in Java so we can use it in cluster-formation
- rename to testClusters and TestClustersPlugin, ditching ClusterFormation
- proof of concept plugin to check the integration points with Gradle and write integration test
- implement support for setting up a single node cluster and actually starting and using it
- restrict the type of tasks that can use the plugin by default (only configure task extensions on specific tasks)
- start using the new cluster-formation for rest integration tests (modules, plugins)
- start using the new cluster-formation for rest integration tests on x-pack
DSL Glimpse
plugins {
    id 'elasticsearch.clusterformation'
}
testClusters {
    myTestCluster {
        distribution = 'ZIP'
        version = '6.3.0'
    }
}
task user1 {
    useCluster testClusters.myTestCluster
    doLast {
        println "Cluster running @ ${elasticsearchNodes.myTestCluster.httpSocketURI}"
    }
}
task user2 {
    useCluster testClusters.myTestCluster
    doLast {
        println "Cluster running @ ${elasticsearchNodes.myTestCluster.httpSocketURI}"
    }
}
Produces this output:
> Task :syncClusterFormationArtifacts UP-TO-DATE
> Task :user1
Starting `myTestCluster`
Cluster running @ [::1]:37347
Not stopping `myTestCluster`, since node still has 1 claim(s)
> Task :user2
Cluster running @ [::1]:37347
Stopping `myTestCluster`, number of claims is 0
BUILD SUCCESSFUL in 10s
3 actionable tasks: 2 executed, 1 up-to-date
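The claim messages in this output suggest a reference count: each task that uses the cluster registers a claim, and the cluster stops only when the last claim is released. A minimal Java sketch of that idea follows. The class and method names (ClaimedCluster, claim, start, release) are assumptions made for illustration, not the plugin's actual API, and nothing is really started or stopped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reference-counted cluster handle; not the plugin's actual class. */
class ClaimedCluster {
    private final String name;
    private final List<String> log = new ArrayList<>();
    private int claims;      // tasks that still need the cluster
    private boolean running;

    ClaimedCluster(String name) {
        this.name = name;
    }

    /** Registers one task that will use the cluster (assumed to happen at configuration time). */
    synchronized void claim() {
        claims++;
    }

    /** Called before a claiming task executes; only the first call starts the cluster. */
    synchronized void start() {
        if (!running) {
            running = true;
            log.add("Starting `" + name + "`");
        }
    }

    /** Called after a claiming task finishes; stops the cluster once no claims remain. */
    synchronized void release() {
        if (claims == 0) {
            throw new IllegalStateException("release() called without a matching claim()");
        }
        claims--;
        if (claims > 0) {
            log.add("Not stopping `" + name + "`, since node still has " + claims + " claim(s)");
        } else {
            running = false;
            log.add("Stopping `" + name + "`, number of claims is 0");
        }
    }

    synchronized boolean isRunning() {
        return running;
    }

    synchronized List<String> log() {
        return new ArrayList<>(log);
    }
}

class ClaimDemo {
    public static void main(String[] args) {
        ClaimedCluster cluster = new ClaimedCluster("myTestCluster");
        cluster.claim();   // user1 declares useCluster
        cluster.claim();   // user2 declares useCluster
        cluster.start();   // user1 runs
        cluster.release();
        cluster.start();   // user2 runs; the cluster is already up
        cluster.release();
        cluster.log().forEach(System.out::println);
    }
}
```

Because stopping is driven by the count rather than by task ordering, it does not matter which of the two tasks runs first or whether they run in parallel.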
Initial Description
The current cluster formation has the following limitations:
- no straightforward way to create additional clusters or define relationships between them
- does not currently work with --parallel, and as such has no support for parallelism (note that test.jvm doesn't help here, these tests always run in sequence)
- complex tests like rolling upgrade are not readable at all, as they rely on relations between Gradle tasks that are really hard to follow
The main reason --parallel does not work is that Gradle's finalizedBy does not offer any guarantees about when the task will be run. We use this for stopping clusters, but when running in parallel Gradle puts that off, so one can end up running 40+ ES nodes (512 MB * 40 ~ 20 GB) before running out of memory and the build starting to fail because of it. There is no easy fix for this other than setting up a bunch of mustRunAfter rules for the different tasks. Some tests run across clusters, upgrade and restart nodes, etc., so we can't make any assumptions about when the stop task is safe to run, and we can't really enforce a "stop after the test runner for this cluster completed" rule, as the test runners of other clusters might still need this cluster.
Even after doing some hacks to bring down the nodes sooner and not run out of memory, --parallel uncovered some missing ordering relations between tasks that were causing failures.
From some limited testing, I estimate build time could be reduced by at least 30% by being able to run integ tests in parallel (based on running :qa:check on my 6 physical core CPU with 32GB RAM).
From what I can see, this is the only thing preventing us from simply running builds with clean check --parallel without having to pick and choose what works in parallel and what doesn't.
I think we should create a cluster formation DSL that does not rely on Gradle tasks to perform its operations. We would still use Gradle to fetch and set up distributions, but everything else would be externalized. The DSL would provide configuration for the cluster and expose methods to alter its state (start/stop the cluster or individual nodes, change configuration, etc.).
There would be methods for high-level operations like starting and stopping the cluster and running tests, as well as lower-level operations that work at the node level.
No operation would be carried out by default; a task would have to be set up that calls these operations from the task action (or in doLast). We can also provide a task, with an option to control whether it's created, to cover the common case of setting up the cluster, running tests, and terminating.
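To make the proposal more concrete, here is a rough in-memory Java model of what such an API could look like. Everything in it is hypothetical (ClusterModel, NodeModel, rollingUpgrade, and the 6.4.0 target version are made up for the example), and it only tracks state rather than launching processes. The point is that a rolling upgrade becomes a plain loop instead of a chain of task dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical node handle: tracks state only, launches nothing. */
class NodeModel {
    final String name;
    String version;
    boolean running;

    NodeModel(String name, String version) {
        this.name = name;
        this.version = version;
    }

    void start() { running = true; }

    void stop() { running = false; }
}

/** Hypothetical cluster handle offering cluster-level and node-level operations. */
class ClusterModel {
    private final Map<String, NodeModel> nodes = new LinkedHashMap<>();

    ClusterModel(String name, String version, int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            String nodeName = name + "-" + i;
            nodes.put(nodeName, new NodeModel(nodeName, version));
        }
    }

    // High-level operations act on every node.
    void start() { nodes.values().forEach(NodeModel::start); }

    void stop() { nodes.values().forEach(NodeModel::stop); }

    // Lower-level access for tests that manipulate individual nodes.
    Iterable<NodeModel> nodes() { return nodes.values(); }

    long runningCount() {
        return nodes.values().stream().filter(n -> n.running).count();
    }

    /** Rolling upgrade as a plain loop: at most one node is down at any time. */
    void rollingUpgrade(String newVersion) {
        for (NodeModel node : nodes.values()) {
            node.stop();
            node.version = newVersion;
            node.start();
        }
    }
}

class ClusterModelDemo {
    public static void main(String[] args) {
        // The body of a task action (for example doLast) could look like this.
        ClusterModel cluster = new ClusterModel("myTestCluster", "6.3.0", 3);
        cluster.start();
        cluster.rollingUpgrade("6.4.0"); // target version is made up for the example
        System.out.println(cluster.runningCount() + " nodes running");
        cluster.stop();
    }
}
```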
Of course we would need to have a way to run tests outside of Gradle, but since we don't use its infrastructure to do it anyway, it shouldn't be that hard.
The custom DSL can make use of Gradle's NamedDomainObjectCollection so plugins can change defaults for different sections of the build when a new cluster is defined.
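The sketch below illustrates the "defaults for every newly defined cluster" idea in plain Java. It is a simplified stand-in written for this example, not Gradle's NamedDomainObjectCollection: the NamedSpecs and ClusterSpec classes are hypothetical, and the assumption is that default actions run before the cluster's own configuration so the latter can override them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical per-cluster configuration, mirroring the fields in the DSL glimpse. */
class ClusterSpec {
    final String name;
    String distribution;
    String version;

    ClusterSpec(String name) {
        this.name = name;
    }
}

/** Simplified stand-in for a named container; not the Gradle API. */
class NamedSpecs {
    private final Map<String, ClusterSpec> specs = new LinkedHashMap<>();
    private final List<Consumer<ClusterSpec>> defaults = new ArrayList<>();

    /** Applies the action to existing specs and remembers it for future ones. */
    void all(Consumer<ClusterSpec> action) {
        specs.values().forEach(action);
        defaults.add(action);
    }

    /** Creates a named spec: defaults run first, then the caller's configuration. */
    ClusterSpec create(String name, Consumer<ClusterSpec> configure) {
        if (specs.containsKey(name)) {
            throw new IllegalArgumentException("cluster already defined: " + name);
        }
        ClusterSpec spec = new ClusterSpec(name);
        defaults.forEach(action -> action.accept(spec));
        configure.accept(spec);
        specs.put(name, spec);
        return spec;
    }

    ClusterSpec getByName(String name) {
        ClusterSpec spec = specs.get(name);
        if (spec == null) {
            throw new IllegalArgumentException("unknown cluster: " + name);
        }
        return spec;
    }
}

class NamedSpecsDemo {
    public static void main(String[] args) {
        NamedSpecs testClusters = new NamedSpecs();
        // A plugin sets a default for every cluster, present or future.
        testClusters.all(spec -> spec.distribution = "ZIP");
        // A build script then defines a cluster and only sets what differs.
        ClusterSpec created = testClusters.create("myTestCluster", s -> s.version = "6.3.0");
        // prints "myTestCluster: ZIP 6.3.0"
        System.out.println(created.name + ": " + created.distribution + " " + created.version);
    }
}
```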