Fix and improve canary thrift config and docs (cadence-workflow#4580)
longquanzheng authored Oct 19, 2021
1 parent 93934ab commit 53833a2
Showing 8 changed files with 125 additions and 80 deletions.
8 changes: 3 additions & 5 deletions bench/README.md
@@ -26,10 +26,6 @@ Different ways to start the bench workers:

#### 1. Use docker image `ubercadence/cadence-bench:master`

For now, this image has no release versions, to simplify the release process. Always use the `master` tag for the image.

Similar to server/CLI images, the bench image is built and published automatically by GitHub on every commit to the `master` branch.

You can use the [pre-built docker-compose file](../docker/docker-compose-bench.yml) to run against a local server.
In the `docker/` directory, run:
```
@@ -39,7 +35,9 @@ You can modify [the bench worker config](../docker/config/bench/development.yaml

Or you may run it with Kubernetes, for [example](https://github.com/longquanzheng/cadence-lab/blob/master/eks/bench-deployment.yaml).



NOTE: Similar to server/CLI images, the `master` image is built and published automatically by GitHub on every commit to the `master` branch.
To use an image other than the `master` tag, see [docker hub](https://hub.docker.com/repository/docker/ubercadence/cadence-bench) for all the images.

#### 2. Build & Run the binary

103 changes: 60 additions & 43 deletions canary/README.md
@@ -4,7 +4,7 @@ This README describes how to set up Cadence canary, different types of canary te

Setup
-----------
### Cadence server
## Prerequisite: Cadence server

The canary test suite runs against a Cadence server/cluster. See [documentation](https://cadenceworkflow.io/docs/operation-guide/setup/) for Cadence server cluster setup.

@@ -14,27 +14,24 @@ For local server env you can run it through:
- Docker: Instructions for running Cadence server through docker can be found in `docker/README.md`. Either `docker-compose-es-v7.yml` or `docker-compose-es.yml` can be used to start the server.
- Build from source: Please check [CONTRIBUTING](/CONTRIBUTING.md) for how to build and run Cadence server from source. Please also make sure Kafka and ElasticSearch are running before starting the server with `./cadence-server --zone es start`. If ElasticSearch v7 is used, change the value for `--zone` flag to `es_v7`.

### Start canary
## Run canary

:warning: NOTE: By default, starting this canary worker will not automatically start a canary test. The next two sections cover how to start and configure it.
Different ways to start the canary:

Different ways to start the canary workers:

#### 1. Use docker image `ubercadence/cadence-canary:master`

For now, this image has no release versions, to simplify the release process. Always use the `master` tag for the image.

Similar to server/CLI images, the canary image is built and published automatically by GitHub on every commit to the `master` branch.
### 1. Use docker image `ubercadence/cadence-canary:master`

You can use the [pre-built docker-compose file](../docker/docker-compose-canary.yml) to run against a local server.
In the `docker/` directory, run:
```
docker-compose -f docker-compose-canary.yml up
```
You can modify [the canary worker config](../docker/config/canary/development.yaml) to run against a prod server cluster.
You can modify [the canary worker config](../docker/config/canary/development.yaml) to run against a prod server cluster:
* Use a different mode to start the canary worker only, for testing
* Update the config to use Thrift/gRPC for communication
* Use an image other than the `master` tag. See [docker hub](https://hub.docker.com/repository/docker/ubercadence/cadence-canary) for all the images.
Similar to server/CLI images, the `master` image is built and published automatically by GitHub on every commit to the `master` branch.


#### 2. Build & Run the worker/canary
### 2. Build & Run

In the project root, build cadence canary binary:
```
@@ -52,10 +49,16 @@ This is essentially the same as

By default, it will load [the configuration in `config/canary/development.yaml`](../config/canary/development.yaml).
Run `./cadence-canary -h` for the start options, including how to change the config loading directory if needed.
This will only start the workers.

This will only start the workers. To start both the workers and the cron starter:
```
./cadence-canary start -mode all
```
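
The mode can also come from the `CADENCE_CANARY_MODE` environment variable that this commit wires into the `mode` flag. The sketch below illustrates the usual precedence for such a flag (explicit flag value, then environment variable, then default). It is only an illustration: `resolveMode` is a hypothetical helper, not code from the canary, and the `"worker"` default string is an assumption about the value of `canary.ModeWorker`.

```go
package main

import (
	"fmt"
	"os"
)

// envKeyMode is the environment variable key added in canary/config.go.
const envKeyMode = "CADENCE_CANARY_MODE"

// resolveMode is a hypothetical helper (not part of the canary code base).
// Precedence: explicit flag value, then the environment variable, then the default.
// lookupEnv is injected so the function can be exercised without touching the
// real process environment.
func resolveMode(flagValue string, lookupEnv func(string) (string, bool), defaultMode string) string {
	if flagValue != "" {
		return flagValue
	}
	if v, ok := lookupEnv(envKeyMode); ok && v != "" {
		return v
	}
	return defaultMode
}

func main() {
	// "worker" is an assumed default; the real default is canary.ModeWorker.
	mode := resolveMode("", os.LookupEnv, "worker")
	fmt.Println("canary mode:", mode)
}
```

Running this with `CADENCE_CANARY_MODE=all` set would report `all`, which matches how the docker-compose file in this commit starts both the worker and the cron starter.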

### 3. Monitoring

In production, it's recommended to monitor the result of this canary. You can use [the `workflow_success` metric](https://github.com/uber/cadence/blob/9336ed963ca1b5e0df7206312aa5236433e04fd9/service/history/execution/context_util.go#L138)
emitted by the Cadence history service.
emitted by the Cadence history service. To monitor all the canary test cases, use a `workflowType` of `workflow.sanity`.

Configurations
----------------------
@@ -76,15 +79,16 @@ An exception here is `HistoryArchival` and `VisibilityArchival` test cases will
```yaml
cadence:
service: "cadence-frontend" # frontend service name
host: "127.0.0.1:7933" # frontend address
address: "127.0.0.1:7833" # frontend address
#host: "127.0.0.1:7933" # replace address with host if using Thrift for compatibility
```
- **Metrics**: metrics configuration. Similar to the server metric emitter, only M3/Statsd/Prometheus are supported.
- **Log**: logging configuration. Similar to server logging configuration.
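
The `address`/`host` choice in the `cadence` block can be sketched as follows. This is a minimal, stdlib-only illustration of the selection order used in `canary/runner.go` in this commit (gRPC `address` first, then Thrift `host`, otherwise an error). `pickTransport` and the `"grpc"`/`"thrift"` labels are hypothetical names for illustration; the real code builds YARPC dispatchers instead of returning strings.

```go
package main

import (
	"errors"
	"fmt"
)

// Cadence mirrors the fields of the canary config struct; YAML tags are
// omitted so the sketch needs only the standard library.
type Cadence struct {
	ServiceName           string // yaml key: service
	ThriftHostNameAndPort string // yaml key: host
	GRPCHostNameAndPort   string // yaml key: address
}

// pickTransport is a hypothetical helper: it reports which protocol and
// host:port would be used, preferring gRPC when both are set.
func pickTransport(cfg Cadence) (protocol, hostPort string, err error) {
	switch {
	case cfg.GRPCHostNameAndPort != "":
		return "grpc", cfg.GRPCHostNameAndPort, nil
	case cfg.ThriftHostNameAndPort != "":
		return "thrift", cfg.ThriftHostNameAndPort, nil
	default:
		return "", "", errors.New("must specify either gRPC address (address) or Thrift address (host)")
	}
}

func main() {
	cfg := Cadence{ServiceName: "cadence-frontend", GRPCHostNameAndPort: "127.0.0.1:7833"}
	protocol, hostPort, err := pickTransport(cfg)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%s -> %s\n", protocol, hostPort) // prints "grpc -> 127.0.0.1:7833"
}
```

Note that when both keys are present the Thrift `host` is ignored, so switching to Thrift means replacing `address` with `host` rather than adding it.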

Canary Test Cases
Canary Test Cases & Starter
----------------------

#### Cron Canary: periodically running Sanity test suite
### Cron Canary (periodically running the Sanity/starter suite)

The Cron workflow is not a test case. It's a top-level workflow that kicks off the Sanity suite (described below) periodically.
To start the cron canary:
@@ -103,7 +107,7 @@ It can be [improved](https://github.com/uber/cadence/issues/4469) in the future.

The workflowID is fixed: `"cadence.canary.cron"`

#### Test case starter & Sanity suite
### Sanity suite (Starter for all test cases)
The sanity workflow is a test suite workflow. It kicks off a set of child workflows, one per test, to verify that the Cadence server is operating correctly.

An error result of the sanity workflow indicates that at least one of the test cases failed.
@@ -112,67 +116,80 @@ You can start the sanity workflow as one-off run:
```
cadence --do <the domain you configured> workflow start --tl canary-task-queue --et 1200 --wt workflow.sanity -i 0
```
Note:

Or use the Cron Canary mentioned above to manage it.


Then observe the progress:
```
cadence --do cadence-canary workflow ob -w <...workflowID from the start command output>
```

NOTE 1:
* tasklist (`tl`) is fixed to `canary-task-queue`
* execution timeout (`et`) is recommended to be 20 minutes (`1200` seconds), but you can adjust it
* the only required input is the scheduled unix timestamp, and `0` will use the workflow starting time
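
The `-i 0` convention can be pictured with the short sketch below. It only illustrates the rule stated above (a zero input falls back to the workflow start time); `effectiveScheduledTime` is a hypothetical helper, the timestamps are made up, and the unit of the timestamp is deliberately left unspecified because the README does not state it.

```go
package main

import "fmt"

// effectiveScheduledTime is a hypothetical helper, not taken from the canary
// source. An input of 0 means "use the workflow start time"; any other value
// is used as given.
func effectiveScheduledTime(input, workflowStart int64) int64 {
	if input == 0 {
		return workflowStart
	}
	return input
}

func main() {
	const workflowStart = int64(1634600000) // made-up timestamp for illustration
	fmt.Println(effectiveScheduledTime(0, workflowStart))          // prints 1634600000
	fmt.Println(effectiveScheduledTime(1634500000, workflowStart)) // prints 1634500000
}
```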


Or use a cron job (e.g. every minute):
```
cadence --do <the domain you configured> workflow start --tl canary-task-queue --et 1200 --wt workflow.sanity -i 0 --cron "* * * * *"
```
NOTE 2: This is the workflow that you should monitor for alerting.
You can use [the workflow success metric](https://github.com/uber/cadence/blob/9336ed963ca1b5e0df7206312aa5236433e04fd9/service/history/execution/context_util.go#L138)
emitted by cadence history service `workflow_success`. To monitor all the canary test cases use `workflowType` of `workflow.sanity`.

This is [the list of the test cases](./sanity.go) that it will start. All supported test cases are started by default if no excludes are configured.
NOTE 3: This is [the list of the test cases](./sanity.go) that it will start. All supported test cases are started by default if no excludes are configured.
You can find [the workflow names of the test cases in this file](./const.go) if you want to manually start certain test cases.
For example, manually start an `Echo` test case:


### Echo
Echo workflow tests the very basic workflow functionality. It executes an activity to return some output and verifies it as the workflow result.

To manually start an `Echo` test case:
```
cadence --do <> workflow start --tl canary-task-queue --et 10 --wt workflow.echo
```

Once you start the test cases, you can observe the progress:
Then observe the progress:
```
cadence --do cadence-canary workflow ob -w <...workflowID from the start command output>
```

#### Echo
Echo workflow tests the very basic workflow functionality. It executes an activity to return some output and verifies it as the workflow result.
You can use these commands for all other test cases listed below.

#### Signal
### Signal
Signal workflow tests the signal feature.

#### Visibility
### Visibility
Visibility workflow tests the basic visibility feature. No advanced visibility needed, but advanced visibility should also support it.

#### SearchAttributes
### SearchAttributes
SearchAttributes workflow tests the advanced visibility feature. Make sure advanced visibility feature is configured on the server. Otherwise, it should be excluded from the sanity test suite/case.

#### ConcurrentExec
### ConcurrentExec
ConcurrentExec workflow tests executing activities concurrently.

#### Query
### Query
Query workflow tests the Query feature.

#### Timeout
### Timeout
Timeout workflow makes sure the activity timeout is enforced.

#### LocalActivity
### LocalActivity
LocalActivity workflow tests the local activity feature.

#### Cancellation
### Cancellation
Cancellation workflow tests the cancellation feature.

#### Retry
### Retry
Retry workflow tests activity retry policy.

#### Reset
### Reset
Reset workflow tests reset feature.

#### HistoryArchival
### HistoryArchival
HistoryArchival tests history archival feature. Make sure history archival feature is configured on the server. Otherwise, it should be excluded from the sanity test suite/case.
This test case always uses `canary-archival-domain` domain.

#### VisibilityArchival
### VisibilityArchival
VisibilityArchival tests the visibility archival feature. Make sure the visibility archival feature is configured on the server. Otherwise, it should be excluded from the sanity test suite/case.

#### Batch
### Batch
Batch workflow tests the batch job feature. Make sure advanced visibility feature is configured on the server. Otherwise, it should be excluded from the sanity test suite/case.
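
Several of the test cases above should be excluded when the server lacks the matching feature. The sketch below shows one way an `excludes` list like the one in `config/canary/development.yaml` could be applied to a list of workflow names. It is an assumption-laden illustration: `filterExcluded` is a hypothetical helper, and the real sanity suite may apply excludes differently.

```go
package main

import "fmt"

// filterExcluded is a hypothetical helper: it returns the test cases that are
// not listed in excludes, preserving the original order.
func filterExcluded(all, excludes []string) []string {
	skip := make(map[string]struct{}, len(excludes))
	for _, name := range excludes {
		skip[name] = struct{}{}
	}
	var out []string
	for _, name := range all {
		if _, excluded := skip[name]; !excluded {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Workflow names as they appear in this commit's README and config.
	all := []string{"workflow.echo", "workflow.searchAttributes", "workflow.batch"}
	excludes := []string{"workflow.searchAttributes", "workflow.batch"}
	fmt.Println(filterExcluded(all, excludes)) // prints [workflow.echo]
}
```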
11 changes: 7 additions & 4 deletions canary/config.go
@@ -40,11 +40,11 @@ const (
EnvKeyEnvironment = "CADENCE_CANARY_ENVIRONMENT"
// EnvKeyAvailabilityZone is the environment variable key for AZ
EnvKeyAvailabilityZone = "CADENCE_CANARY_AVAILABILITY_ZONE"
// EnvKeyMode is the environment variable key for Mode
EnvKeyMode = "CADENCE_CANARY_MODE"
)

const (
// CadenceLocalHostPort is the default address for cadence frontend service
CadenceLocalHostPort = "127.0.0.1:7933"
// CadenceServiceName is the default service name for cadence frontend
CadenceServiceName = "cadence-frontend"
// CanaryServiceName is the default service name for cadence canary
@@ -77,8 +77,11 @@ type (

// Cadence contains the configuration for cadence service
Cadence struct {
ServiceName string `yaml:"service"`
HostNameAndPort string `yaml:"host"`
ServiceName string `yaml:"service"`
// support Thrift for backward compatibility. It will be ignored if address (gRPC) is used.
ThriftHostNameAndPort string `yaml:"host"`
// gRPC host name and port
GRPCHostNameAndPort string `yaml:"address"`
}
)

63 changes: 42 additions & 21 deletions canary/runner.go
@@ -25,10 +25,12 @@ import (
"sync"
"time"

"go.uber.org/cadence/.gen/go/cadence/workflowserviceclient"
apiv1 "go.uber.org/cadence/.gen/proto/api/v1"
"go.uber.org/cadence/compatibility"
"go.uber.org/yarpc"
"go.uber.org/yarpc/transport/grpc"
"go.uber.org/yarpc/transport/tchannel"
"go.uber.org/zap"

"github.com/uber/cadence/common/log/loggerimpl"
@@ -53,34 +55,53 @@ func NewCanaryRunner(cfg *Config) (Runnable, error) {
cfg.Cadence.ServiceName = CadenceServiceName
}

if cfg.Cadence.HostNameAndPort == "" {
cfg.Cadence.HostNameAndPort = CadenceLocalHostPort
var dispatcher *yarpc.Dispatcher
var runtimeContext *RuntimeContext
if cfg.Cadence.GRPCHostNameAndPort != "" {
dispatcher = yarpc.NewDispatcher(yarpc.Config{
Name: CanaryServiceName,
Outbounds: yarpc.Outbounds{
cfg.Cadence.ServiceName: {Unary: grpc.NewTransport().NewSingleOutbound(cfg.Cadence.GRPCHostNameAndPort)},
},
})
clientConfig := dispatcher.ClientConfig(cfg.Cadence.ServiceName)
runtimeContext = NewRuntimeContext(
logger,
metricsScope,
compatibility.NewThrift2ProtoAdapter(
apiv1.NewDomainAPIYARPCClient(clientConfig),
apiv1.NewWorkflowAPIYARPCClient(clientConfig),
apiv1.NewWorkerAPIYARPCClient(clientConfig),
apiv1.NewVisibilityAPIYARPCClient(clientConfig),
),
)
} else if cfg.Cadence.ThriftHostNameAndPort != "" {
tch, err := tchannel.NewChannelTransport(
tchannel.ServiceName(CanaryServiceName),
)
if err != nil {
return nil, fmt.Errorf("failed to create transport channel: %v", err)
}
dispatcher = yarpc.NewDispatcher(yarpc.Config{
Name: CanaryServiceName,
Outbounds: yarpc.Outbounds{
cfg.Cadence.ServiceName: {Unary: tch.NewSingleOutbound(cfg.Cadence.ThriftHostNameAndPort)},
},
})
runtimeContext = NewRuntimeContext(
logger,
metricsScope,
workflowserviceclient.New(dispatcher.ClientConfig(cfg.Cadence.ServiceName)),
)
} else {
return nil, fmt.Errorf("must specify either gRPC address(address) or Thrift address (host) in the config")
}

dispatcher := yarpc.NewDispatcher(yarpc.Config{
Name: CanaryServiceName,
Outbounds: yarpc.Outbounds{
cfg.Cadence.ServiceName: {Unary: grpc.NewTransport().NewSingleOutbound(cfg.Cadence.HostNameAndPort)},
},
})

if err := dispatcher.Start(); err != nil {
dispatcher.Stop()
return nil, fmt.Errorf("failed to create outbound transport channel: %v", err)
}

clientConfig := dispatcher.ClientConfig(cfg.Cadence.ServiceName)
runtimeContext := NewRuntimeContext(
logger,
metricsScope,
compatibility.NewThrift2ProtoAdapter(
apiv1.NewDomainAPIYARPCClient(clientConfig),
apiv1.NewWorkflowAPIYARPCClient(clientConfig),
apiv1.NewWorkerAPIYARPCClient(clientConfig),
apiv1.NewVisibilityAPIYARPCClient(clientConfig),
),
)

return &canaryRunner{
RuntimeContext: runtimeContext,
config: &cfg.Canary,
7 changes: 4 additions & 3 deletions cmd/canary/main.go
@@ -124,9 +124,10 @@ func buildCLI() *cli.App {
Usage: "start cadence canary worker or cron, or both",
Flags: []cli.Flag{
cli.StringFlag{
Name: "mode, m",
Value: canary.ModeWorker,
Usage: fmt.Sprintf("%v, %v or %v", canary.ModeWorker, canary.ModeCronCanary, canary.ModeAll),
Name: "mode, m",
Value: canary.ModeWorker,
Usage: fmt.Sprintf("%v, %v or %v", canary.ModeWorker, canary.ModeCronCanary, canary.ModeAll),
EnvVar: canary.EnvKeyMode,
},
},
Action: func(c *cli.Context) {
5 changes: 3 additions & 2 deletions config/canary/development.yaml
@@ -1,7 +1,8 @@
canary:
domains: ["cadence-canary"]
excludes: ["workflow.searchAttributes", "workflow.batch", "workflow.archival.visibility"]
excludes: ["workflow.searchAttributes", "workflow.batch", "workflow.archival.history", "workflow.archival.visibility"]

cadence:
service: "cadence-frontend"
host: "127.0.0.1:7833"
address: "127.0.0.1:7833"
#host: "127.0.0.1:7933" # replace address with host if using Thrift for compatibility
6 changes: 4 additions & 2 deletions docker/config/canary/development.yaml
@@ -4,8 +4,10 @@ log:

canary:
domains: ["cadence-canary"]
excludes: ["workflow.searchAttributes", "workflow.batch", "workflow.archival.visibility"]
excludes: ["workflow.searchAttributes", "workflow.batch", "workflow.archival.history", "workflow.archival.visibility"]

cadence:
service: "cadence-frontend"
host: "host.docker.internal:7933" # see https://docs.docker.com/desktop/mac/networking/
address: "host.docker.internal:7833" # address is for gRPC
#host: "host.docker.internal:7933" # for using thrift, replace address with host

2 changes: 2 additions & 0 deletions docker/docker-compose-canary.yml
@@ -4,3 +4,5 @@ services:
image: ubercadence/cadence-canary:master
volumes:
- ./config/canary:/etc/cadence-canary/config/canary
environment:
- "CADENCE_CANARY_MODE=all" # this will run both worker and cron starter
