Conversation

@kumargauravsharma kumargauravsharma commented Jul 10, 2021

Issue #, if available:

Create, Read, Update, Delete integration tests across custom resource scenarios contain test code and fixtures that are repetitive across test files.

Description of changes:

A declarative integration test framework to write integration tests as YAML, with reusable common logic for their execution:

  • Each test YAML represents an integration test scenario with one or more ordered steps.
  • Each step specifies a verb (apply, patch, delete) and corresponding expectations to assert.
  • Scenarios can run in parallel.
  • Scenario input YAML can contain input replacements ($SOME_VARIABLE) which are substituted during the test run.
  • Scenario tear-down logic is automatic.
  • Scenarios can be marked for inclusion/exclusion with custom markers and usecase markers.
  • Allows custom-resource-specific customization through ResourceHelper auto-discovery via the @resource_helper decorator.
  • Uses a session-scoped fixture for service bootstrap resources and input replacements, removing the need to call service_bootstrap.py and service_cleanup.py from test-infra made here and here.

The declarative integration test framework (test/declarative_test_fwk/) is reusable; once this PR is reviewed and merged, it can be moved to test-infra later.
It can be enhanced further as needed.

Refer to the test/e2e/declarative_tests.py file in this PR for an example of using the declarative integration test framework.

  • test/e2e/scenarios/scenario1.yaml represents an integration test scenario as declarative YAML
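The scenario structure described above can be sketched as a minimal Python model; the class and field names here are illustrative, not copied from the PR's actual code:

```python
# Minimal sketch of a declarative scenario model: an ordered list of steps,
# each with a verb, input data, and expectations. Hypothetical names.
from dataclasses import dataclass, field
from typing import List

VERBS = ("apply", "create", "patch", "delete")

@dataclass
class Step:
    id: str
    verb: str          # one of VERBS
    input_data: dict   # resource data passed to the verb
    expectations: dict = field(default_factory=dict)

@dataclass
class Scenario:
    id: str
    steps: List[Step] = field(default_factory=list)

def load_scenario(doc: dict) -> Scenario:
    """Build a Scenario from an already-parsed scenario YAML document."""
    steps = []
    for raw in doc.get("steps", []):
        # the step's verb is whichever verb key appears in the step mapping
        verb = next(v for v in VERBS if v in raw)
        steps.append(Step(id=raw["id"], verb=verb,
                          input_data=raw[verb],
                          expectations=raw.get("expect", {})))
    return Scenario(id=doc["id"], steps=steps)
```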

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

def description(self) -> str:
    return self.config.get("description", "")

def run(self) -> None:
Contributor Author

Run logic can be moved to separate file (example: runner.py) later.

Contributor Author

Added runner.py

@kumargauravsharma kumargauravsharma force-pushed the declarative-test-fwk branch 3 times, most recently from 08fddc2 to 3420ded Compare July 11, 2021 23:38
@RedbackThomson
Contributor

/test elasticache-kind-e2e

@RedbackThomson
Contributor

Super interesting ideas explored here. Love the idea of moving a lot of the testing to declarative, as most tests run in a fairly consistent CRUD manner. Just have some questions about implementation choices:

  • Is there a reason why scenarios are not linked 1:1 with custom resource types? Is this to allow for a scenario to run tests that require multiple CRs?
  • Does this implementation allow for server-side checks as well? That is, asserting that the AWS service has properly registered the resource.
  • What is the intended effect of the usecases marks?
  • When running multiple scenarios, how easy is it to debug the output in PyTest? Currently PyTest is very readable, as each test is a separate class and the assertions are very clearly laid out in static Python code.

@kumargauravsharma kumargauravsharma force-pushed the declarative-test-fwk branch 3 times, most recently from 7417bba to 2cfab5f Compare July 16, 2021 23:13
@kumargauravsharma
Contributor Author

Thanks for your inputs @RedbackThomson

Is there a reason why scenarios are not linked 1:1 with custom resource types?
Is this to allow for a scenario to run tests that require multiple CRs?

In the initial revision of this PR, I considered each step to have custom resource details specified in it. This was for flexibility, so that scenarios that require multiple CRs, for example prerequisite dependency CRs (example: User/UserGroup CR for ReplicationGroup CR), can have steps to create them.
But it also led to repeated resource reference details (metadata, kind, apiVersion, etc.) in each step even when all the steps are for a single resource.
To remove this repetition while keeping the flexibility to specify the resource at each step, the current revision of the PR allows a customResourceReference field at the scenario level.

customResourceReference:
  apiVersion: $CRD_GROUP/$CRD_VERSION
  kind: ReplicationGroup
  metadata:
    name: test$RANDOM_SUFFIX
steps:

This information is merged into each step, with details specified in the step taking precedence.
Thus, if a scenario targets a single resource there is no need to repeat it in every step, but steps can override it if the scenario requires that.
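That merge behavior can be sketched as a recursive dict merge where step-level keys win; this is a hypothetical helper illustrating the described precedence, not the PR's code:

```python
def merge_reference(scenario_ref: dict, step_ref: dict) -> dict:
    """Recursively merge the scenario-level customResourceReference into a
    step's reference; keys present at the step level take precedence."""
    merged = dict(scenario_ref)
    for key, value in step_ref.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_reference(merged[key], value)
        else:
            merged[key] = value
    return merged
```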

@kumargauravsharma
Contributor Author

Does this implementation allow for server-side checks as well? That is, asserting that the AWS service has properly registered the resource.

Yes, the ResourceHelper mechanism allows service controllers to provide resource-specific helper implementations.
By overriding assert_expectations, service controllers can add logic for server-side checks based on the step's verb, the input data for the verb, and the specified expectations for the given custom resource:

@helper.resource_helper("ReplicationGroup")
class ReplicationGroupHelper(helper.ResourceHelper):
    def assert_expectations(self, verb: str, input_data: dict, expectations: dict,
                            reference: k8s.CustomResourceReference):
        # default assertions
        super().assert_expectations(verb, input_data, expectations, reference)
        
        # perform custom server side checks based on:
        # verb, input data to verb, expectations for given resource
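The @resource_helper auto-discovery could work via a simple class registry keyed by kind; the following is a sketch under that assumption (the PR's helper module may differ in detail):

```python
# Hypothetical sketch of decorator-based helper registration and lookup.
_HELPER_REGISTRY = {}

class ResourceHelper:
    """Default helper; kind-specific subclasses override assert_expectations."""
    def assert_expectations(self, verb, input_data, expectations, reference):
        pass  # baseline assertions elided in this sketch

def resource_helper(kind: str):
    """Class decorator that registers a helper for a custom resource kind."""
    def decorator(cls):
        _HELPER_REGISTRY[kind] = cls
        return cls
    return decorator

def get_resource_helper(kind: str) -> ResourceHelper:
    """Look up the registered helper for a kind, falling back to the default."""
    return _HELPER_REGISTRY.get(kind, ResourceHelper)()
```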

@kumargauravsharma
Contributor Author

What is the intended effect of the usecases marks?

I considered it to allowlist specific custom resource types in a test suite run. This is similar to the service marker that we have currently, but its scope is further fine-grained to the custom resource type.

As a single Python file is used for all test scenarios, I expect it to provide input configurability to run a subset of scenarios from the entire test suite.
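The intended selector behavior could be as simple as an intersection check between requested and declared usecases; a sketch (as noted above, this behavior is not yet implemented in the PR, and the names are hypothetical):

```python
def select_scenarios(scenarios, requested_usecases):
    """Keep only scenarios tagged with at least one requested usecase;
    with no filter requested, keep everything. Each scenario here is the
    parsed YAML dict with an optional 'usecases' list."""
    if not requested_usecases:
        return list(scenarios)
    wanted = set(requested_usecases)
    return [s for s in scenarios if wanted & set(s.get("usecases", []))]
```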

@kumargauravsharma
Contributor Author

When running multiple scenarios, how easy is it to debug the output in PyTest? Currently PyTest is very readable, as each test is a separate class and the assertions are very clearly laid out in static Python code.

The following help analyze a test step failure:

  1. Each scenario is executed as an individual test with a unique test id. The test id includes the test scenario file name:
    For example, if the scenarios are "scenario2.yaml" and "scenario1.yaml" then the corresponding test ids are:
declarative_tests.py::TestSuite::test_scenario[scenario2.yaml] 
declarative_tests.py::TestSuite::test_scenario[scenario1.yaml] 
  2. The test framework uses logging; when a log level is configured (for example, INFO using --log-cli-level=INFO), failure details are logged. This provides information about the scenario and the step that failed for the test id:
declarative_tests.py::TestSuite::test_scenario[scenario2.yaml] 
-------------------------------- live log call ---------------------------------
INFO     root:runner.py:31 Execute: Scenario(id='transitEncryptionEnabled_on_CMD_replication_group_causes_Advisory_condition')
INFO     root:runner.py:75 Execute: Step(id='create_CMD_replication_group_with_transitEncryptionEnabled')
INFO     root:runner.py:115 assert:  Step(id='create_CMD_replication_group_with_transitEncryptionEnabled')
ERROR    root:runner.py:124 AssertionError at Step(id='create_CMD_replication_group_with_transitEncryptionEnabled')
FAILED                                                                   [ 50%]
  3. Upon test failure, pytest assert introspection provides information about the scenario that failed:
    It contains information about the scenario, the failed step's expectations, the resource reference, and the assertion values that failed:
declarative_tests.py:90 (TestSuite.test_scenario[scenario2.yaml])
self = <e2e.declarative_tests.TestSuite object at 0x106236eb0>
scenario = Scenario(id='transitEncryptionEnabled_on_CMD_replication_group_causes_Advisory_condition')

    def test_scenario(self, scenario):
>       runner.run(scenario)

declarative_tests.py:92: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../declarative_test_fwk/runner.py:33: in run
    run_step(step)
../declarative_test_fwk/runner.py:82: in run_step
    assert_expectations(step)
../declarative_test_fwk/runner.py:125: in assert_expectations
    raise ae
../declarative_test_fwk/runner.py:122: in assert_expectations
    resource_helper.assert_expectations(step.verb, step.input_data, step.expectations, reference)
../declarative_test_fwk/helper.py:97: in assert_expectations
    self._assert_conditions(expectations, reference)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <e2e.declarative_tests.ReplicationGroupHelper object at 0x10627db50>
expectations = {'status': {'conditions': {'ACK.Advisory': 'True'}}}
reference = CustomResourceReference(group='elasticache.services.k8s.aws', version='v1alpha1', plural='replicationgroups', name='test-5sebujxj5bsa111bdv1touzi2ut52kl', namespace='default')

    def _assert_conditions(self, expectations: dict, reference: k8s.CustomResourceReference):
>       assert True == False
E       AssertionError

../declarative_test_fwk/helper.py:106: AssertionError

@kumargauravsharma kumargauravsharma force-pushed the declarative-test-fwk branch 5 times, most recently from 13d963d to dadc204 Compare July 17, 2021 08:40
Contributor

@jaypipes jaypipes left a comment

@kumargauravsharma once again, I'm diggin' the move to declarative testing. ++ on that!

I have some suggestions inline that I think might make the test scenarios easier to write and read.

"""
Represents a declarative test step
"""
indent = "\t\t"
Contributor

unused?

Contributor Author

Removed the unused variable.

@@ -0,0 +1,40 @@
id: "RG_CMD_basic_create_update"
Contributor

recommend naming the file test/e2e/scenarios/replication_group/cmd_basic_create_update.yaml or something like that instead of scenario1.yaml :)

Contributor Author

Updated file name.

Comment on lines 3 to 8
usecases:
- replication_group
- cmd
#marks:
# - slow
# - blocked
Contributor

What's the difference between usecases and marks?

Contributor Author

I added some details related to it at: #40 (comment)

I think marks are currently used to include tests that are otherwise excluded by default.
The intent of usecases was to have 'selector'/'filter' behavior, such that when a usecase is specified on the pytest run command, only tests that match it are included in the test run.

Note: the behavior is not yet implemented.

Comment on lines 18 to 23
spec:
engine: redis
replicationGroupID: test$RANDOM_SUFFIX
replicationGroupDescription: Basic create and update of CMD RG
cacheNodeType: cache.t3.micro
numNodeGroups: 1
Contributor

This is cool, allowing spec values directly in the scenario YAML definition. I'd recommend also allowing loading from a filepath as well.

Contributor Author

Updated code. create, patch, and delete now support reading the custom resource from a filepath.

The filepath is resolved by taking the resource_directory parameter into account:

loader.load_scenario(scenario_file_path, resource_directory, input_replacements)

Example:

steps:
  - id: "create_CMD_replication_group"
    description: "Initial config"
    create: "replicationgroup_cmd.yaml"

apiVersion: $CRD_GROUP/$CRD_VERSION
kind: ReplicationGroup
metadata:
  name: test$RANDOM_SUFFIX
Contributor

where does $RANDOM_SUFFIX come from?

Contributor Author

Method:
load_scenario(scenario_file: Path, resource_directory: Path = None, replacements: dict = {}) -> model.Scenario

sets RANDOM_SUFFIX in the replacements dictionary before loading the scenario file.

Each scenario gets different random suffix.
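A sketch of that replacement pass, seeding RANDOM_SUFFIX and substituting $KEY tokens in the raw scenario text (illustrative only; the real loader's implementation may differ):

```python
import random
import string

def apply_replacements(text: str, replacements: dict) -> str:
    """Substitute $KEY tokens in the raw scenario YAML text."""
    for key, value in replacements.items():
        text = text.replace(f"${key}", str(value))
    return text

def with_random_suffix(replacements: dict) -> dict:
    """Seed RANDOM_SUFFIX (if absent) so each scenario gets a unique name."""
    out = dict(replacements)
    out.setdefault("RANDOM_SUFFIX", "".join(
        random.choices(string.ascii_lowercase + string.digits, k=12)))
    return out
```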

Comment on lines 96 to 100
# conditions expectations met, now check current resource against expectations
resource = k8s.get_resource(reference)
self.assert_items(expectations.get("status"), resource.get("status"))

# self._assert_state(expectations.get("spec"), resource) # uncomment to support spec assertions
Contributor

This is kinda funky because you are reading the CR from the Kubernetes API server multiple times (hidden to the user) but the test scenario acts like what is being asserted is simply the returned CR from the Create or Patch operation.

I feel like a more robust solution, if slightly more verbose, would be to have an explicit wait part of the Step structure that occurs before the expect stanza.

So, instead of this:

steps:
  - id: "create_CMD_replication_group"
    description: "Initial config"
    create:
      spec:
        engine: redis
        replicationGroupID: test$RANDOM_SUFFIX
        replicationGroupDescription: Basic create and update of CMD RG
        cacheNodeType: cache.t3.micro
        numNodeGroups: 1
      expect:
        status:
          conditions:
            ACK.ResourceSynced: "True"
          status: "available"

You would do something like this:

steps:
  - id: "create_CMD_replication_group"
    description: "Initial config"
    create:
      spec:
        engine: redis
        replicationGroupID: test$RANDOM_SUFFIX
        replicationGroupDescription: Basic create and update of CMD RG
        cacheNodeType: cache.t3.micro
        numNodeGroups: 1
    wait:
      status:
         conditions:
           ACK.ResourceSynced:
             value: "True"
             timeout: 30
    # These assertions are made on the resulting CR from a Kubernetes Describe operation
    # after all waits have successfully completed
    expect:
      status: "available"

That way, it's more explicit that there are actually distinct reads to the Kubernetes API server, with one set of reads having a retry/wait semantic and the other not.

Contributor

+1. If we add a wait field, then assert_expectations just needs one describe call and can perform all the assertions using the results of that.

The way the wait is specified can also potentially be simplified:

    wait:
      status:
         conditions:
           ACK.ResourceSynced:
             value: "True"
             timeout: 30

If we decide that all we will ever wait on is the resource synced condition, then it could be as simple as:

waitResourceSynced: 30

or something in between.

Contributor Author

Updated code.
wait can be specified as:

  1. Wait for 5 minutes
    wait: 5
  2. Wait on condition expected value - waits for default timeout (30 minutes)
    wait:
      status:
        conditions:
          ACK.Terminal: "True"

I kept the wait details open-ended, so that support to wait on other properties of the custom resource can be added later.

  3. Wait on specific condition details with optional timeout:
    wait:
      status:
        conditions:
          ACK.ResourceSynced:
            status: "True"
            timeout: 30     # timeout in minutes, if not specified defaults to 30 minutes

Assert on condition message:

Conditions can be specified with expected message (both in wait as well as expect):

    status:
        conditions:
          ACK.Terminal: "True"
          ACK.ResourceSynced:
            status: "False"
            message: "Expected message ..."
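Such a wait can be implemented as a simple polling loop; a sketch (timeout in seconds here for brevity, with fetch standing in for a Kubernetes describe of the custom resource, and the path given as a list of keys so condition names containing dots, like ACK.ResourceSynced, stay intact):

```python
import time

def wait_for_value(fetch, path, expected, timeout=30.0, interval=1.0):
    """Poll fetch() until the value at `path` equals `expected`, returning
    the last-fetched resource, or raise once the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        resource = fetch()
        value = resource
        for key in path:  # walk nested dicts, e.g. status -> conditions -> ...
            value = value.get(key) if isinstance(value, dict) else None
        if value == expected:
            return resource
        if time.monotonic() >= deadline:
            raise AssertionError(f"timed out waiting for {path} == {expected!r}")
        time.sleep(interval)
```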

Contributor

1. Wait for 5 minutes

Seconds, not minutes :)

@kumargauravsharma
Contributor Author

/test elasticache-kind-e2e

2 similar comments
@kumargauravsharma
Contributor Author

/test elasticache-kind-e2e

@kumargauravsharma
Contributor Author

/test elasticache-kind-e2e

@kumargauravsharma
Contributor Author

/retest

@kumargauravsharma
Contributor Author

/test elasticache-kind-e2e

@kumargauravsharma
Contributor Author

/test elasticache-kind-e2e

@nmvk
Contributor

nmvk commented Oct 4, 2021

/retest

Member

@a-hilaly a-hilaly left a comment

Looks really good @kumargauravsharma ! I left one small question

reference = self.custom_resource_reference(input_data, input_replacements)
_ = k8s.create_custom_resource(reference, input_data)
resource = k8s.wait_resource_consumed_by_controller(reference, wait_periods=10)
assert resource is not None
Member

Is it OK to use assert statements in non-unit-test files? I'm not very experienced with Python, but I always thought it's discouraged to use assert in libraries/functions that may be imported by other projects.

Contributor Author

As the file is used in a test context, I used assert so that it helps with debugging if it fails.
As we use this framework in e2e tests, we can evolve the code to keep or remove it later.

Contributor

@RedbackThomson RedbackThomson left a comment

Yeah this looks like it's solid. I still stand by my comment for the namePrefix part, but I'm happy with everything else.

apiVersion: $CRD_GROUP/$CRD_VERSION
kind: ReplicationGroup
metadata:
  name: test$RANDOM_SUFFIX
Contributor

Any comments on this @kumargauravsharma ? I still feel it leaves room for mistakes as it currently exists. If everyone is just going to have to append $RANDOM_SUFFIX anyway, then let's just do it for them?

@kumargauravsharma
Contributor Author

@RedbackThomson
For $RANDOM_SUFFIX, I considered the scenario where one resource refers to another resource by name in its Spec; the current mechanism allows such configuration, which would be difficult to set up if the framework added the suffix automatically.

For the current PR, I think we can keep the current logic.
We can enhance the framework as we use it to automate test scenarios, and add auto-appending of the suffix if we find that it makes things better.

Contributor

@nmvk nmvk left a comment

/lgtm

@ack-bot ack-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2021
@ack-bot
Collaborator

ack-bot commented Oct 5, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kumargauravsharma, nmvk, RedbackThomson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [RedbackThomson,kumargauravsharma,nmvk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RedbackThomson
Contributor

@kumargauravsharma Alright. I'll take your word for that. Let's ship it

@ack-bot ack-bot merged commit 599c140 into aws-controllers-k8s:main Oct 5, 2021