NET-2397: Add readme.md to upgrade test subdirectory
NiniOak committed Mar 14, 2023
1 parent a915d0c commit 75fbb63
Showing 3 changed files with 126 additions and 0 deletions.
123 changes: 123 additions & 0 deletions test/integration/consul-container/test/README.md
@@ -0,0 +1,123 @@
# Upgrade Integration Tests

The upgrade integration tests ensure that users can upgrade between Consul versions without difficulty. The upgrade tests do the following:
- Ensure that customers can upgrade to a new Consul version, and that new and existing features work as expected.
- Ensure a high degree of interoperability between Consul versions by supporting `n-2` and `n-1` upgrades, so that upgrading to a newer version does not degrade or break functionality.
- Validate that users can perform a direct upgrade of Consul, for example from `1.13 → 1.14`.
- Validate that users can perform a skip-version upgrade of Consul, for example from `1.13 → 1.15`.
- Highlight errors that may occur as users upgrade from their current version to a newer version.
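
The `n-1` and `n-2` upgrade sources for a given target can be derived mechanically. The following is a minimal sketch, assuming versions are written as `major.minor`; the `sourceVersions` helper is hypothetical and is not part of the test framework.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sourceVersions is a hypothetical helper that returns the n-1 and n-2
// minor versions for a target given as "major.minor".
func sourceVersions(target string) ([]string, error) {
	parts := strings.Split(target, ".")
	if len(parts) != 2 {
		return nil, fmt.Errorf("expected major.minor, got %q", target)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return nil, fmt.Errorf("invalid major version in %q: %w", target, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return nil, fmt.Errorf("invalid minor version in %q: %w", target, err)
	}

	var sources []string
	for back := 1; back <= 2; back++ {
		if minor-back < 0 {
			break // no earlier minor release within this major version
		}
		sources = append(sources, fmt.Sprintf("%d.%d", major, minor-back))
	}
	return sources, nil
}

func main() {
	sources, err := sourceVersions("1.15")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, from := range sources {
		fmt.Printf("upgrade %s -> 1.15\n", from)
	}
}
```

For a target of `1.15`, this yields the direct upgrade from `1.14` and the skip-version upgrade from `1.13`.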

Here is an example of how the upgrade tests work:
1. Create a cluster with a specified number of server and client agents, then enable the feature to be tested.
2. Create some workload in the cluster, e.g. register two services: static-server and static-client.
3. Verify connection / disconnection (e.g. deny Action).
4. Upgrade the Consul cluster and restart the Envoy sidecars. The sidecars are restarted to ensure that the upgraded Consul binary can read the state from the previous version and generate the correct Envoy configurations.
5. Verify connection / disconnection again (e.g. deny Action).
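
The five steps above can be sketched as a toy program. This is only an illustration of the control flow: the `cluster` type, its methods, and `runUpgradeFlow` are hypothetical in-memory stand-ins, whereas the real tests drive Docker containers through the test framework.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical in-memory stand-in for the Docker-based Consul cluster.
type cluster struct {
	version string
	deny    map[string]bool // deny rules keyed by "source->destination"
}

// canConnect reports whether traffic from src to dst is allowed.
func (c *cluster) canConnect(src, dst string) bool {
	return !c.deny[src+"->"+dst]
}

// upgrade changes the version and keeps the existing state, which is
// the property the upgrade tests are meant to verify.
func (c *cluster) upgrade(target string) {
	c.version = target
}

// runUpgradeFlow walks through the five steps with a deny rule as the feature under test.
func runUpgradeFlow(from, to string) error {
	// Step 1: create the cluster at the pre-upgrade version.
	c := &cluster{version: from, deny: map[string]bool{}}

	// Step 2: create the workload and the rule to be tested.
	c.deny["static-client->static-server"] = true

	// Step 3: verify the expected disconnection before the upgrade.
	if c.canConnect("static-client", "static-server") {
		return errors.New("before upgrade: expected the connection to be denied")
	}

	// Step 4: upgrade the cluster (the real tests also restart the sidecars).
	c.upgrade(to)
	if c.version != to {
		return fmt.Errorf("expected version %s, got %s", to, c.version)
	}

	// Step 5: verify that the same rule still applies after the upgrade.
	if c.canConnect("static-client", "static-server") {
		return errors.New("after upgrade: expected the connection to be denied")
	}
	return nil
}

func main() {
	if err := runUpgradeFlow("1.15", "local"); err != nil {
		fmt.Println("upgrade flow failed:", err)
		return
	}
	fmt.Println("upgrade flow passed")
}
```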

**Note** that all Consul agents and user workloads, such as application services and mesh gateways, run in Docker containers.


The tests focus on the Consul upgrade lifecycle and emit logs to stdout, which developers can silence. Please reach out to the CTIA Team with any questions or for help getting started.

## Getting Started
### Prerequisites
If you wish to run or add new test cases, the following are required:
- Install [Go](https://go.dev/) (the version should match that of our CI config's Go image).
- Install [`golangci-lint`](https://golangci-lint.run/usage/install/).
- Install [`make`](https://www.gnu.org/software/make/manual/make.html).
- Install [`Docker`](https://docs.docker.com/get-docker/), which is required to run the tests locally.

### Running Upgrade integration tests
In CI, the tests are executed against all test cases defined under `test/integration/consul-container/test/upgrade`. To run a selected test case locally, e.g. to test an upgrade from 1.15 to a locally built version, use the following commands:
```sh
make dev-docker
cd ./test/integration/consul-container
go test -v -timeout 30m -run ^TestACL_Upgrade_Node_Token$ ./.../upgrade/ --target-image consul --target-version local --latest-image consul --latest-version 1.15 -follow-log=false
```

The command above runs a single test. To run the entire upgrade test suite, use the following command:
```sh
go test -v -timeout 30m ./.../upgrade --target-image consul --target-version local --latest-image consul --latest-version 1.15 -follow-log=false
```
Below are the supported CLI options:

| Flag | Default value | Description |
| ----------- | ----------- | ----------- |
| `--latest-image` | `consul` | Name of the Docker image deployed before the upgrade |
| `--latest-version` | `latest` | Version or tag of the latest image deployed before the upgrade |
| `--target-image` | `consul` | Name of the Docker image to upgrade to |
| `--target-version` | `local` | Version of the target image to upgrade to |
| `-follow-log` | `false` | Optional flag that emits event logs |
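
As an illustration of how options like these can be declared with Go's standard `flag` package, here is a minimal sketch. The `options` struct and `parseOptions` function are hypothetical, and the defaults are copied from the table above; the framework's actual flag declarations may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical container for the CLI options listed in the table.
type options struct {
	LatestImage   string
	LatestVersion string
	TargetImage   string
	TargetVersion string
	FollowLog     bool
}

// parseOptions parses args with the defaults from the table.
// The flag package accepts both "-name" and "--name" spellings.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("upgrade", flag.ContinueOnError)
	fs.StringVar(&o.LatestImage, "latest-image", "consul", "Docker image deployed before the upgrade")
	fs.StringVar(&o.LatestVersion, "latest-version", "latest", "version or tag deployed before the upgrade")
	fs.StringVar(&o.TargetImage, "target-image", "consul", "Docker image to upgrade to")
	fs.StringVar(&o.TargetVersion, "target-version", "local", "version of the target image to upgrade to")
	fs.BoolVar(&o.FollowLog, "follow-log", false, "emit event logs")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{
		"--target-version", "local",
		"--latest-version", "1.15",
		"-follow-log=false",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Boolean flags must be written as `-follow-log=false` rather than `-follow-log false`, because the `flag` package treats a bare boolean flag as `true` and stops parsing flags at the first non-flag argument.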



## Adding a new upgrade integration test
Upgrade integration tests are defined in the [test/integration/consul-container/test/upgrade](/test/integration/consul-container/test/upgrade) subdirectory, and new upgrade integration tests should always be added to this location. The test framework uses
[functional table-driven tests in Go](https://yourbasic.org/golang/table-driven-unit-test/), with function types that modify the base value for each test case.

Tests with multiple test cases should always start by defining a test case struct:
```go
type testcase struct {
	name           string
	create         func()
	extraAssertion func()
}
```
See an example [here](./hashicorp/consul/test/integration/consul-container/test/upgrade/l7_traffic_management/resolver_default_subset_test.go). Upgrade tests with a single test case can be written like this:
```go
run := func(t *testing.T, oldVersion, targetVersion string) {
	// insert test
}
t.Run(fmt.Sprintf("Upgrade from %s to %s", utils.LatestVersion, utils.TargetVersion),
	func(t *testing.T) {
		run(t, utils.LatestVersion, utils.TargetVersion)
	})
```
See an example [here](./hashicorp/consul/test/integration/consul-container/test/upgrade/acl_node_test.go).
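
To make the table-driven pattern with function types concrete, here is a minimal standalone sketch. The `resolverConfig` type, the parameters on `create` and `extraAssertion`, and the `cases` and `runCases` helpers are all hypothetical; in the real tests these functions operate on a running cluster and each case runs under `t.Run`.

```go
package main

import (
	"errors"
	"fmt"
)

// resolverConfig is a hypothetical base value that each test case modifies.
type resolverConfig struct {
	DefaultSubset string
	Subsets       []string
}

// testcase mirrors the struct shape above, with illustrative parameters added.
type testcase struct {
	name           string
	create         func(cfg *resolverConfig)      // modifies the base value
	extraAssertion func(cfg resolverConfig) error // case-specific check
}

// cases returns the table of test cases.
func cases() []testcase {
	return []testcase{
		{
			name:   "default subset v1",
			create: func(cfg *resolverConfig) { cfg.DefaultSubset = "v1" },
			extraAssertion: func(cfg resolverConfig) error {
				if cfg.DefaultSubset != "v1" {
					return errors.New("default subset was not applied")
				}
				return nil
			},
		},
		{
			name:   "extra subset v3",
			create: func(cfg *resolverConfig) { cfg.Subsets = append(cfg.Subsets, "v3") },
			extraAssertion: func(cfg resolverConfig) error {
				if len(cfg.Subsets) != 3 {
					return errors.New("expected three subsets")
				}
				return nil
			},
		},
	}
}

// runCases applies each case to a fresh base value and records the assertion result.
func runCases(tcs []testcase) map[string]error {
	results := make(map[string]error, len(tcs))
	for _, tc := range tcs {
		cfg := resolverConfig{Subsets: []string{"v1", "v2"}} // fresh base per case
		tc.create(&cfg)
		results[tc.name] = tc.extraAssertion(cfg)
	}
	return results
}

func main() {
	for name, err := range runCases(cases()) {
		fmt.Printf("%s: %v\n", name, err)
	}
}
```

Because every case starts from a fresh base value, a modification made by one case cannot leak into another.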

### How it works
![Upgrade Tests Workflow](util/upgrade_tests_workflow.png?raw=true)

A Consul cluster is deployed, then a static-server, a static-client, and Envoy sidecars are created in the cluster. An API request is made to the static-client to validate that it is ready.

We then validate traffic between the static-server and the static-client Envoy sidecar. After validation, we take a snapshot, upgrade the Consul cluster to the `target-version`, and restart the sidecars. Finally, we re-validate the client, server, and sidecars to ensure that the data snapshotted from the previous version can be accessed in the upgraded version.


### Error Test Cases
Versions prior to `1.14` need special error handling.
Some features, such as peering, had API changes that return an error when an upgrade is attempted, and upgrade tests must account for this. When running upgrade tests from any version before `1.14`, add the following lines of code to skip the test, or it will not pass.

```go
fromVersion, err := version.NewVersion(utils.LatestVersion)
require.NoError(t, err)
if fromVersion.LessThan(utils.Version_1_14) {
	continue
}
```
See an example [here](https://github.com/hashicorp/consul-enterprise/blob/005a0a92c5f39804cef4ad5c4cd6fd3334b95aa2/test/integration/consul-container/test/upgrade/peering_control_plane_mgw_test.go#L92-L96).

To write tests for bugs found during upgrades, see the example of how to add a test case for those scenarios [here](./hashicorp/consul/test/integration/consul-container/test/upgrade/fullstopupgrade_test.go).

### FAQs

**Q.** To troubleshoot, how can I send an API request or a Consul command to the deployed cluster?
**A.** To send an API request or command to the deployed cluster, first ensure that a cluster, services, and sidecars have been created. See the example below:
```go
cluster, _, _ := topology.NewCluster()
clientService := createServices(t, cluster)
_, port := clientService.GetAddr()
_, adminPort := clientService.GetAdminAddr()
...
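fmt.Println(port, adminPort)  // print the ports first so they are visible while the test is paused
time.Sleep(900 * time.Second) // pause the test so the cluster can be inspected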
```
Then run `docker ps -a | grep consul` in your terminal to list the running services and cluster. Exec into the cluster containers and run commands directly, or make API requests to `localhost:port` for the relevant service or to `localhost:adminPort` for Envoy.
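
The same requests can also be made from Go. Below is a minimal sketch using only the standard library; `adminURL` and `get` are hypothetical helpers, the port `19000` is a made-up example value, and `adminPort` is assumed to be an integer. So that the sketch runs without a cluster, `main` queries a local stub server instead of a real sidecar. `/clusters` is one of Envoy's admin endpoints.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// adminURL is a hypothetical helper that builds a URL for a port published on localhost.
func adminURL(adminPort int, path string) string {
	return fmt.Sprintf("http://localhost:%d%s", adminPort, path)
}

// get fetches a URL with a timeout and returns the response body as a string.
func get(url string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return string(body), nil
}

func main() {
	// Stub server standing in for an Envoy admin listener.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "stub response for %s", r.URL.Path)
	}))
	defer srv.Close()

	body, err := get(srv.URL + "/clusters")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)

	// Against a real sidecar, the URL would be built from the printed adminPort.
	fmt.Println(adminURL(19000, "/clusters"))
}
```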

**Q.** To troubleshoot, how can I access the Envoy admin page?
**A.** To access the Envoy admin page, first ensure that a cluster, services, and sidecars have been created. Then get the `adminPort` for the client or server sidecar, as shown in the example in the previous answer. Finally, open a browser and go to the URL `http://localhost:adminPort/`.

**Q.** My test is stuck with the error "could not start or join all agents: container 0: port not found". What should I do?
**A.** Simply re-run the tests. If the error persists, prune Docker resources with `docker system prune`, run `make dev-docker`, then re-run the tests.

**Q.** How do I clean up the resources created by the upgrade tests?
**A.** Run `docker ps | grep consul` to find all leftover resources, then run `docker stop {CONTAINER_ID} && docker rm {CONTAINER_ID}`.
@@ -3,6 +3,7 @@ package basic
import (
"fmt"
"testing"
"time"

"github.com/stretchr/testify/require"

@@ -40,6 +41,8 @@ func TestBasicConnectService(t *testing.T) {
clientService := createServices(t, cluster)
_, port := clientService.GetAddr()
_, adminPort := clientService.GetAdminAddr()
time.Sleep(900 * time.Second)
fmt.Println(port, adminPort)

libassert.AssertUpstreamEndpointStatus(t, adminPort, "static-server.default", "HEALTHY", 1)
libassert.GetEnvoyListenerTCPFilters(t, adminPort)