The structure of these integration tests is heavily influenced by the existing integration test structure. In that previous structure:
- Each test group ran as a separate Maven build.
- Each would build an image, start a cluster, run the test, and shut down the cluster.
- Tests were created using TestNG, a long-obsolete test framework.
- An `IntegrationTestingConfig` is created from system properties (passed in from Maven via `-D<key>=<value>` options).
- A TestNG test runner uses a part of the Druid Guice configuration to inject test objects into the tests.
- The test then runs.
To minimize test changes, we try to keep much of the "interface" while changing the "implementation". Basically:
- The same Docker image is used for all tests.
- Each test defines its own test cluster using Docker Compose.
- Tests are grouped into categories, represented by JUnit categories.
- Maven runs one selected category, starting and stopping the test-specific cluster for each.
- A cluster-specific directory contains the `docker-compose.yaml` file that defines that cluster. Each of these files imports from common definitions.
- Each test is annotated with the `DruidTestRunner` to handle initialization, and JUnit `Category` to group the test into a category.
- Categories can share cluster configuration to reduce redundant definitions.
- A `docker.yaml` file defines the test configuration and creates the `IntegrationTestingConfig` object.
- Tests run as JUnit tests.
The remainder of this section describes the test internals.
The Failsafe Maven plugin looks for ITs with names of the form "IT*.java", so this is the preferred form for Druid ITs. That is, name your test "ITSomething", not "SomethingTest" or "IntegTestSomething", etc. Many tests are called "ITSomethingTest", but this is redundant since "IT" stands for "Integration Test".
A test must have a cluster configuration to define the cluster. There is a many-to-one relationship between test categories and test clusters.
See Test Configuration for details on the `docker.yaml` file
that you create for each test module to tell the tests about the cluster you
have defined.
Test configuration allows inheritance so, as in Docker Compose, we define
standard bits in one place, providing only test-specific information in each
test's `docker.yaml` file.
The test code assumes that the test configuration file is in
`src/test/resources/cluster/<category>/docker.yaml`, where `<category>` is
the test category (or, more specifically, that it is on the class path at
`/yaml/docker.yaml`). The test runner loads the configuration file into
a `ClusterConfig` instance.
The `ClusterConfig` instance provides the backward-compatible
`IntegrationTestingConfig` instance that most existing test cases use.
New tests may want to work with `ClusterConfig` directly as the older interface
is a bit of a muddle in several areas.
Each test is associated with a cluster definition. Maven starts the required
cluster, runs a group of tests, and shuts down the cluster. We use the JUnit
`Category` to identify the category for each test:
@RunWith(DruidTestRunner.class)
@Category(BatchIndex.class)
public class ITIndexerTest extends AbstractITBatchIndexTest
{
...
The category is a trivial class that exists just to provide the category name. It can also hold annotations, which we will use in a moment. When adding tests, use an existing category, or define a new one if you want your tests to run in parallel with other categories.
The `test-cases` module contains all integration tests. However,
Maven can run only one category per Maven run. You specify the category using a
profile of the same name, but with "IT-" prefixed. Thus the Maven profile for the
above `BatchIndex` category is `IT-BatchIndex`.
Test categories may share the same cluster definition. We mark this by adding an annotation to the category (not test) class. The test class itself:
@RunWith(DruidTestRunner.class)
@Category(InputFormat.class)
public class ITLocalInputSourceAllInputFormatTest extends AbstractLocalInputSourceParallelIndexTest
{
...
The test category class:
@Cluster(BatchIndex.class)
public class InputFormat
{
}
This says that the test above is in the `InputFormat` category, and tests in that
category use the same cluster definition as the `BatchIndex` category. Specifically,
the test runner looks for the cluster definition in the `BatchIndex` folders.
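
The category-to-cluster mapping described above can be sketched with plain reflection. This is a minimal, self-contained illustration, not the actual `DruidTestRunner` code: the `Cluster` annotation here is a hypothetical stand-in declared inside the sketch, and the helper names `clusterNameFor` and `mavenProfileFor` are assumptions made for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class CategoryResolver
{
  // Hypothetical stand-in for the @Cluster annotation described above.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface Cluster
  {
    Class<?> value();
  }

  // A category that owns its cluster definition.
  public static class BatchIndex
  {
  }

  // A category that borrows the BatchIndex cluster definition.
  @Cluster(BatchIndex.class)
  public static class InputFormat
  {
  }

  // Returns the name of the category whose cluster definition applies:
  // the annotated target if @Cluster is present, else the category itself.
  public static String clusterNameFor(Class<?> category)
  {
    Cluster cluster = category.getAnnotation(Cluster.class);
    Class<?> owner = cluster == null ? category : cluster.value();
    return owner.getSimpleName();
  }

  // The Maven profile is the category name with "IT-" prefixed.
  public static String mavenProfileFor(Class<?> category)
  {
    return "IT-" + category.getSimpleName();
  }
}
```

With these definitions, `clusterNameFor(InputFormat.class)` yields `BatchIndex`, while the Maven profile for the category is still derived from the category's own name.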
At present, the following test categories are fully or partly converted:
Category | TestNG Group | Description |
---|---|---|
HighAvailability | high-availability | Cluster failover tests |
BatchIndex | batch-index | Batch indexing tests |
InputFormat | input-format | Input format tests |
The new names correspond to class names. The TestNG names were strings.
The ITs are JUnit tests, but use a special test runner to handle configuration.
Test configuration is complex. The easiest way to configure, once the configuration
files are set, is to use the `DruidTestRunner` class:
@RunWith(DruidTestRunner.class)
@Category(MyCategory.class)
public class MyTest
{
@Inject
private SomeObject myObject;
...
@Test
public void myTest()
{
...
The test runner loads the configuration files, configures Guice, starts the Druid lifecycle, and injects the requested values into the class each time a test method runs. For simple tests, this is all you need.
The test runner validates that the test has a category, and handles the above mapping from category to cluster definition.
The `DruidTestRunner` extends `JUnitParamsRunner` to allow parameterized tests.
This class stays discreetly out of the way if you don't care about parameters.
To use parameters, see the `CalciteJoinQueryTest` class for an example.
The JUnit-based integration tests are designed to be as simple as possible
to debug. Each test class uses annotations and configuration files to provide
all the information needed to run a test. Once the cluster is started
(using `cluster.sh` as described here), each test can
be run from the command line or IDE with no additional command-line parameters.
To do that, we use a `docker.yaml` configuration file that defines all needed
parameters, etc.
A test needs both configuration and a Guice setup. The `DruidTestRunner`,
along with a number of support classes, mostly hides the details from the tests.
However, you should know what's being done so you can debug.
- JUnit uses the annotation to notice that we've provided a custom test runner. (When converting tests, remember to add the required annotation.)
- JUnit calls the test class constructor one or more times per test class.
- On the first creation of the test class, `DruidTestRunner` creates an instance of the `Initializer` class, via its `Builder`, to load test configuration, create the Guice injector, inject dependencies into the class instance, and start the Druid lifecycle.
- JUnit calls one of the test methods in the class.
- On the second creation of the test class in the same JVM, `DruidTestRunner` reuses the existing injector to inject dependencies into the test, which avoids the large setup overhead.
- During the first configuration, `DruidTestRunner` causes initialization to check the health of each service prior to starting the tests.
- The test is now configured just as it would be from TestNG, and is ready to run.
- `DruidTestRunner` ends the lifecycle after the last test within this class runs.
See this explanation for the gory details.
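
The "first creation vs. later creations" behavior in the steps above amounts to building expensive state once and reusing it. The following is a minimal sketch of that pattern only, assuming nothing about the real `Initializer` or injector: `SharedState` is a hypothetical placeholder for the injector and lifecycle, and all names are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSetupSketch
{
  // Hypothetical placeholder for the expensive state (injector, lifecycle).
  public static final class SharedState
  {
    public final int setupNumber;

    SharedState(int setupNumber)
    {
      this.setupNumber = setupNumber;
    }
  }

  private static final AtomicInteger SETUP_COUNT = new AtomicInteger();
  private static SharedState shared;

  // First call performs the expensive setup; later calls reuse the result.
  public static synchronized SharedState sharedState()
  {
    if (shared == null) {
      shared = new SharedState(SETUP_COUNT.incrementAndGet());
    }
    return shared;
  }

  // Called after the last test in the class: drops the shared state.
  public static synchronized void shutdown()
  {
    shared = null;
  }

  // Number of times the expensive setup has actually run.
  public static int setupCount()
  {
    return SETUP_COUNT.get();
  }
}
```

Two consecutive calls to `sharedState()` return the same instance, so the setup cost is paid once no matter how many test instances are created.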
`DruidTestRunner` loads the basic set of Druid modules to run the basic client
code. Tests may wish to load additional modules specific to that test.
There are times when a test needs additional Guice modules beyond what the
`Initializer` provides. In such cases, you can add a method to customize
configuration.
If your test requires additional Guice modules, add them as follows:
@Configure
public static void configure(Initializer.Builder builder)
{
builder.modules(
new MyExtraModule(),
new AnotherModule()
);
}
Druid makes heavy use of properties to configure objects via the `JsonConfigProvider` mechanism. Integration tests don't read the usual `runtime.properties` files: there is no such file to read. Instead, properties are set in the test configuration file. There are times, however, when it makes more sense to hard-code a property value. This is done in the `@Configure` method:
builder.property(key, value);
You can also bind a property to an environment variable. This value is used when the environment variable is set. You should also bind a default value:
builder.property("druid.my.property", 42);
builder.propertyEnvVarBinding("druid.my.property", "ULTIMATE_ANSWER");
A property can also be passed in as either a system property or an environment variable of the "Docker property environment variable form":
druid_property_a=foo
./it.sh Category test
Or, directly on the command line:
-Ddruid_property_b=bar
Property precedence, from lowest to highest, is:
- Properties set in code, as above.
- Properties from the configuration file.
- Properties bound to environment variables, when the environment variable is set.
- Properties from the command line.
The test properties can also be seen as default values for properties provided in config files or via the command line.
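
One way to picture the precedence rules is as a layered merge in which later sources overwrite earlier ones. The sketch below is an illustration under two assumptions that the text above implies but does not spell out: that the list runs from lowest to highest priority (properties set in code act as defaults), and that the "Docker property environment variable form" maps to a property name by replacing underscores with dots. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyPrecedenceSketch
{
  // Assumption: "druid_property_a" corresponds to "druid.property.a".
  public static String envVarToProperty(String envVar)
  {
    return envVar.replace('_', '.');
  }

  // Merges property sources given lowest priority first.
  // Later sources overwrite values from earlier ones.
  @SafeVarargs
  public static Map<String, String> merge(Map<String, String>... sourcesLowestFirst)
  {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> source : sourcesLowestFirst) {
      merged.putAll(source);
    }
    return merged;
  }
}
```

Calling `merge(code, configFile, envVars, commandLine)` then gives each property the value from the highest-priority source that defines it.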
If your test gets the dreaded "it doesn't work that way" message, it means that
an injected property in your test is asking Guice to instantiate a lifecycle-managed
class after the lifecycle itself was started. This typically happens if the class
in question is bound via the polymorphic `PolyBind` mechanism, which doesn't support
"eager singletons". (If the class in question is not created via `PolyBind`, change
its Guice binding to include `.asEagerSingleton()` rather than `.in(LazySingleton.class)`.)
See this reference.
A quick workaround is to tell the initializer to create an instance before the lifecycle starts. The easy way to do that is simply to inject the object into a field in your class. Otherwise, give the builder a hint:
builder.eagerInstance(ThePeskyComponent.class);
When working with tests, it is helpful to know a bit more about the "magic"
behind `DruidTestRunner`.
Druid's code is designed to run in a server, not a client. Yet, the tests are clients. This means that tests want to run code in a way that it was not intended to be run. The existing ITs have mostly figured out how to make that happen, but the result is not very clean. This is an opportunity for improvement.
Druid introduced a set of "injector builders" to organize Guice initialization
a bit. The builders normally build the full server Guice setup. For the ITs,
the builders also allow us to pick and choose which modules to use to define
a client. The `Initializer` class in `it-base` uses the injector builders to
define the "client" modules needed to run tests.
Druid uses the `Lifecycle` class to start and stop services. For this to work,
the managed instance must be created before the lifecycle starts. There are
a few items that are lazy singletons. When run in the server, they work fine.
But, when run in tests, we run into a race condition: we want to start the
lifecycle once before the tests start, then inject dependencies into each test
class instance as tests run. But, those injections create the instance we want
the lifecycle to manage, resulting in a muddle. This is why the `DruidTestRunner`
has that odd "first test vs. subsequent test" logic.
The prior ITs would start running tests immediately. But, it can take up to a
minute or more for a Druid cluster to stabilize as all the services start
running simultaneously. The previous ITs would use a generic retry up to 240
times to work around the fact that any given test could fail due to the cluster
not being ready. This version does that startup check as part of `DruidTestRunner`.
By the time the tests run, the cluster is up and has reported itself healthy.
That is, your tests can assume a healthy cluster. If a test fails, it indicates
an actual error or race condition.
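
The idea of a single up-front startup check, as opposed to retries scattered through each test, can be sketched as a bounded polling loop. This is an illustration only and does not reproduce the actual check in `DruidTestRunner`: the `healthCheck` supplier stands in for whatever probe reports a service as healthy, and the attempt limit and sleep interval are arbitrary parameters chosen for the example.

```java
import java.util.function.BooleanSupplier;

public class StartupCheckSketch
{
  // Polls the health check until it passes or the attempts run out.
  // Returns true if the service became healthy, false otherwise.
  public static boolean waitUntilHealthy(
      BooleanSupplier healthCheck,
      int maxAttempts,
      long sleepMillis
  )
  {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (healthCheck.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis);
        }
        catch (InterruptedException e) {
          // Preserve the interrupt flag and give up waiting.
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }
}
```

Because the wait happens once before any test method runs, a test that fails afterwards cannot blame cluster startup.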
Specifically, if tests still randomly fail, those tests are telling you something: something in Druid itself is non-deterministic (such as the delay to see changes to the DB, etc.), or the tests are making invalid assumptions such as assuming an ordering when there is none, using a time delay to try to synchronize actions when there should be some specific synchronization, etc. This means that, in general, you should avoid the use of the generic retry facility: if you have to retry to get your tests to work, then the Druid user has to also retry. Unless we document the need to retry in the API documentation, then having to retry should be considered a bug to be fixed (perhaps by documenting the need to retry, perhaps by fixing a bug, perhaps by adding a synchronization API.)
Another benefit of the startup check is that the startup and health-check costs are paid once per test class. This allows you to structure your tests as a large number of small tests rather than a few big tests.
The `ClusterConfig` class is the Java representation of the
test configuration. The instance is available from the
`Initializer` and by Guice injection.
It is a Jackson-serialized class that handles the "raw" form of configuration.
The `ClusterConfig.resolve()` method expands includes, applies defaults,
validates values, and returns a `ResolvedClusterConfig` instance used
by tests. `ResolvedClusterConfig` is available via Guice injection.
In most cases, however, you'll use it indirectly via the various clients
described below. Each of those uses the `IntegrationTestingConfig` class, an
instance of which is created to read from `ResolvedClusterConfig`.
Remember that each host has two names and two ports:
- The external (or "proxy") host and port, as seen by the machine running the tests.
- The internal host and port, as seen by the service itself running in the Docker cluster.
The various config files provide configurations for
the Docker, K8s and local cluster cases. This means that `resolveProxyHost()`
will resolve to the proxy for Docker, but the actual host for a local cluster.
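
To make the two-names, two-ports idea concrete, here is a small sketch of how a client-side address could fall back from the proxy to the actual host. The `ServiceAddress` class is hypothetical and is not the real `ResolvedClusterConfig` API; only the method name `resolveProxyHost()` is borrowed from the text above, and the host names and ports in the usage note are illustrative values.

```java
public class ServiceAddressSketch
{
  // Hypothetical value class: not part of the Druid test framework.
  public static final class ServiceAddress
  {
    private final String internalHost;
    private final int internalPort;
    private final String proxyHost; // null when there is no proxy (local cluster)
    private final int proxyPort;

    private ServiceAddress(String internalHost, int internalPort, String proxyHost, int proxyPort)
    {
      this.internalHost = internalHost;
      this.internalPort = internalPort;
      this.proxyHost = proxyHost;
      this.proxyPort = proxyPort;
    }

    // Docker case: the test machine reaches the service through a proxy.
    public static ServiceAddress docker(String internalHost, int internalPort, String proxyHost, int proxyPort)
    {
      return new ServiceAddress(internalHost, internalPort, proxyHost, proxyPort);
    }

    // Local case: the test machine talks to the service directly.
    public static ServiceAddress local(String host, int port)
    {
      return new ServiceAddress(host, port, null, port);
    }

    // Host as seen by the machine running the tests.
    public String resolveProxyHost()
    {
      return proxyHost == null ? internalHost : proxyHost;
    }

    // Port as seen by the machine running the tests.
    public int resolveProxyPort()
    {
      return proxyHost == null ? internalPort : proxyPort;
    }

    // URL a test client would use.
    public String clientUrl()
    {
      return "http://" + resolveProxyHost() + ":" + resolveProxyPort();
    }

    // URL the service itself would advertise inside the cluster.
    public String internalUrl()
    {
      return "http://" + internalHost + ":" + internalPort;
    }
  }
}
```

For example, `ServiceAddress.docker("coordinator", 8081, "localhost", 18081)` gives a client URL on `localhost:18081` and an internal URL on `coordinator:8081`, whereas `ServiceAddress.local("localhost", 8081)` uses the same host and port for both.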
The original test setup was designed before Druid introduced the router. A good future improvement is to modify the code to use the router to do the routing rather than doing it "by hand" in the tests. This means that each test would use the router port and router API for things like the Overlord and Coordinator. Then, configuration need only specify the router, not the other services.
It is also possible to use Router APIs to obtain the server list dynamically rather than hard-coding the services and ports. If we find cases where tests must use the APIs directly, then we could either extend the Router API or implement client-side service lookup.
The integration tests make many REST calls to the Druid cluster. The tests
contain much copy/paste code to make these calls. The `ClusterClient` class
is intended to gather up these calls so we have a single implementation
rather than many copies. Add methods as needed for additional APIs.
The cluster client is "test aware": it uses the information in
`ClusterConfig` to know how to send the requested API. The methods handle
JSON deserialization, so tests can focus simply on making a call and
checking the results.
This package in `integration-tests` has clients for most other parts of
Druid. For example, `CoordinatorResourceTestClient` is a
client for Coordinator calls. These clients are also aware of the test
configuration, by way of the `IntegrationTestingConfig` class, an
instance of which is created to read from `ResolvedClusterConfig`.