OpenTelemetry for Kitex
- Out-of-the-box default OpenTelemetry provider
- Support configuration via environment variables
- Support tracing of Kitex RPC servers and clients
- Support automatic transparent transmission of the peer service through meta info
- Support Kitex RPC metrics (R.E.D)
- Support service topology map metrics (Service Topology Map)
- Support Go runtime metrics
- Extend the Kitex logger based on logrus and zap
- Automatically associate logs with traces
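As an illustration of configuration via environment variables, the sketch below resolves a service name the way the OpenTelemetry specification describes: `OTEL_SERVICE_NAME` takes precedence over a `service.name` entry in `OTEL_RESOURCE_ATTRIBUTES`. It uses only the standard library and assumes this provider follows the standard OpenTelemetry SDK variables; the helper names and the `unknown_service` fallback are placeholders, not part of this package.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resourceAttributes parses the OTEL_RESOURCE_ATTRIBUTES format:
// comma-separated key=value pairs. Malformed entries are skipped.
// (Hypothetical helper; percent-decoding of values is left out.)
func resourceAttributes(raw string) map[string]string {
	attrs := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			continue
		}
		key := strings.TrimSpace(kv[0])
		if key == "" {
			continue
		}
		attrs[key] = strings.TrimSpace(kv[1])
	}
	return attrs
}

// serviceName resolves the service name from the environment.
// OTEL_SERVICE_NAME wins over service.name in OTEL_RESOURCE_ATTRIBUTES;
// fallback is used when neither is set.
func serviceName(getenv func(string) string, fallback string) string {
	if name := getenv("OTEL_SERVICE_NAME"); name != "" {
		return name
	}
	if name := resourceAttributes(getenv("OTEL_RESOURCE_ATTRIBUTES"))["service.name"]; name != "" {
		return name
	}
	return fallback
}

func main() {
	// "unknown_service" is only a placeholder fallback for this sketch.
	fmt.Println(serviceName(os.Getenv, "unknown_service"))
}
```

Passing the lookup function as a parameter keeps the resolution logic testable without touching the process environment.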
import (
	...
	"github.com/kitex-contrib/obs-opentelemetry/provider"
	"github.com/kitex-contrib/obs-opentelemetry/tracing"
)

func main() {
	serviceName := "echo"

	// Set up the OpenTelemetry provider and flush it on exit.
	p := provider.NewOpenTelemetryProvider(
		provider.WithServiceName(serviceName),
		provider.WithExportEndpoint("localhost:4317"),
		provider.WithInsecure(),
	)
	defer p.Shutdown(context.Background())

	svr := echo.NewServer(
		new(EchoImpl),
		server.WithSuite(tracing.NewServerSuite()),
		// Keep this the same as the name passed to provider.WithServiceName.
		server.WithServerBasicInfo(&rpcinfo.EndpointBasicInfo{ServiceName: serviceName}),
	)
	if err := svr.Run(); err != nil {
		klog.Fatalf("server stopped with error: %v", err)
	}
}
import (
	...
	"github.com/kitex-contrib/obs-opentelemetry/provider"
	"github.com/kitex-contrib/obs-opentelemetry/tracing"
)

func main() {
	serviceName := "echo-client"

	// Set up the OpenTelemetry provider and flush it on exit.
	p := provider.NewOpenTelemetryProvider(
		provider.WithServiceName(serviceName),
		provider.WithExportEndpoint("localhost:4317"),
		provider.WithInsecure(),
	)
	defer p.Shutdown(context.Background())

	c, err := echo.NewClient(
		"echo",
		client.WithSuite(tracing.NewClientSuite()),
		// Keep this the same as the name passed to provider.WithServiceName.
		client.WithClientBasicInfo(&rpcinfo.EndpointBasicInfo{ServiceName: serviceName}),
	)
	if err != nil {
		klog.Fatal(err)
	}
	_ = c // issue RPC calls through c here
}
import (
	"github.com/cloudwego/kitex/pkg/klog"
	kitexlogrus "github.com/kitex-contrib/obs-opentelemetry/logging/logrus"
)

func init() {
	// Route klog output through the trace-aware logrus logger.
	klog.SetLogger(kitexlogrus.NewLogger())
	klog.SetLevel(klog.LevelDebug)
}

// Echo implements the Echo interface.
func (s *EchoImpl) Echo(ctx context.Context, req *api.Request) (resp *api.Response, err error) {
	// Logging through ctx attaches the current trace and span IDs to the entry.
	klog.CtxDebugf(ctx, "echo called: %s", req.GetMessage())
	return &api.Response{Message: req.Message}, nil
}
{"level":"debug","msg":"echo called: my request","span_id":"056e0cf9a8b2cec3","time":"2022-03-09T02:47:28+08:00","trace_flags":"01","trace_id":"33bdd3c81c9eb6cbc0fbb59c57ce088b"}
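To show how such a log line can be tied back to a trace, here is a minimal standard-library sketch that decodes the sample line above and extracts its trace context. The `logEntry` type and the helper functions are hypothetical names for this example; the length checks reflect the OpenTelemetry ID sizes (32 hex characters for a trace ID, 16 for a span ID).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// sampleLine is the log line shown above.
const sampleLine = `{"level":"debug","msg":"echo called: my request","span_id":"056e0cf9a8b2cec3","time":"2022-03-09T02:47:28+08:00","trace_flags":"01","trace_id":"33bdd3c81c9eb6cbc0fbb59c57ce088b"}`

// logEntry holds the fields of the log line that this sketch cares about.
type logEntry struct {
	Level      string `json:"level"`
	Msg        string `json:"msg"`
	TraceID    string `json:"trace_id"`
	SpanID     string `json:"span_id"`
	TraceFlags string `json:"trace_flags"`
}

// parseLogLine decodes one JSON log line and checks the ID lengths.
func parseLogLine(line string) (logEntry, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return e, err
	}
	if len(e.TraceID) != 32 || len(e.SpanID) != 16 {
		return e, fmt.Errorf("unexpected ID length: trace_id=%d, span_id=%d", len(e.TraceID), len(e.SpanID))
	}
	return e, nil
}

// sampled reports whether the sampled bit (0x01) is set in trace_flags.
func sampled(e logEntry) (bool, error) {
	flags, err := strconv.ParseUint(e.TraceFlags, 16, 8)
	if err != nil {
		return false, err
	}
	return flags&0x01 == 1, nil
}

func main() {
	e, err := parseLogLine(sampleLine)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	ok, _ := sampled(e)
	fmt.Printf("trace=%s span=%s sampled=%v msg=%q\n", e.TraceID, e.SpanID, ok, e.Msg)
}
```

With the trace ID extracted, the same value can be used to look up the matching trace in whichever tracing backend receives the exported spans.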
Below is a table of RPC server metric instruments.
Name | Instrument | Unit | Unit (UCUM) | Description | Status | Streaming |
---|---|---|---|---|---|---|
rpc.server.duration | Histogram | milliseconds | ms | Measures the duration of inbound RPCs. | Recommended | N/A. While streaming RPCs may record this metric as start-of-batch to end-of-batch, it's hard to interpret in practice. |
Below is a table of RPC client metric instruments. These apply to traditional RPC usage, not streaming RPCs.
Name | Instrument | Unit | Unit (UCUM) | Description | Status | Streaming |
---|---|---|---|---|---|---|
rpc.client.duration | Histogram | milliseconds | ms | Measures the duration of outbound RPCs. | Recommended | N/A. While streaming RPCs may record this metric as start-of-batch to end-of-batch, it's hard to interpret in practice. |
The RED Method defines the three key metrics you should measure for every microservice in your architecture: rate, errors, and duration. All three can be calculated from rpc.server.duration.
Rate: the number of requests per second your services are serving.
e.g. QPS
sum(rate(rpc_server_duration_count{}[5m])) by (service_name, rpc_method)
Errors: the number of failed requests per second.
e.g. error ratio
sum(rate(rpc_server_duration_count{status_code="Error"}[5m])) by (service_name, rpc_method) / sum(rate(rpc_server_duration_count{}[5m])) by (service_name, rpc_method)
Duration: the distribution of the amount of time each request takes.
e.g. P99 latency
histogram_quantile(0.99, sum(rate(rpc_server_duration_bucket{}[5m])) by (le, service_name, rpc_method))
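The arithmetic behind the three queries above can be sketched in plain Go. This is a simplified illustration, not how Prometheus is implemented: `perSecond` ignores counter resets and extrapolation, and `quantile` assumes non-negative durations (the first bucket starts at 0) and interpolates linearly inside the bucket that contains the target rank. All names and the sample bucket values are made up for the example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket, like one *_bucket series.
type bucket struct {
	upperBound float64 // value of the "le" label; math.Inf(1) for "+Inf"
	count      float64 // observations less than or equal to upperBound
}

// sampleBuckets is invented data: 100 requests with bounds in milliseconds.
var sampleBuckets = []bucket{
	{10, 50}, {50, 90}, {100, 99}, {math.Inf(1), 100},
}

// perSecond is a simplified rate(): counter increase divided by the window.
func perSecond(first, last, windowSeconds float64) float64 {
	return (last - first) / windowSeconds
}

// errorRatio is the share of failed requests; it returns 0 when idle.
func errorRatio(failed, total float64) float64 {
	if total == 0 {
		return 0
	}
	return failed / total
}

// quantile estimates the q-quantile from cumulative buckets.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	sorted := append([]bucket(nil), buckets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].upperBound < sorted[j].upperBound })
	last := len(sorted) - 1
	total := sorted[last].count
	if !math.IsInf(sorted[last].upperBound, 1) || total == 0 {
		return math.NaN() // needs a +Inf bucket and at least one observation
	}
	rank := q * total
	// Find the first finite bucket whose cumulative count reaches the rank.
	i := sort.Search(last, func(i int) bool { return sorted[i].count >= rank })
	if i == last {
		// The rank falls into the +Inf bucket: report the highest finite bound.
		return sorted[last-1].upperBound
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = sorted[i-1].upperBound, sorted[i-1].count
	}
	inBucket := sorted[i].count - below
	if inBucket == 0 {
		return lower
	}
	return lower + (sorted[i].upperBound-lower)*(rank-below)/inBucket
}

func main() {
	fmt.Println("QPS:", perSecond(1200, 1500, 300))
	fmt.Println("error ratio:", errorRatio(3, 300))
	fmt.Println("P99 (ms):", quantile(0.99, sampleBuckets))
}
```

Because the estimate is interpolated, its precision depends on how closely the bucket bounds bracket the latencies you care about.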
rpc.server.duration records both the peer service and the current service as dimensions. Aggregating over these dimensions yields the service topology map:
sum(rate(rpc_server_duration_count{}[5m])) by (service_name, peer_service)
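The result of that query can be turned into a call graph by treating each (service_name, peer_service) pair as an edge. The sketch below does this with invented sample data. It assumes that, for a server-side metric, peer_service is the caller and service_name is the callee; the `sample` and `edge` types are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one series from the server-side request rate.
type sample struct {
	serviceName string  // service_name label: the service that recorded the metric
	peerService string  // peer_service label: the service on the other end
	rpcMethod   string  // rpc_method label, collapsed by the aggregation
	rate        float64 // requests per second
}

// edge is one directed call relationship in the topology map.
type edge struct {
	from string // caller (assumed to be peer_service)
	to   string // callee (service_name)
}

// topology sums request rates per (caller, callee) pair, which mirrors
// "sum(...) by (service_name, peer_service)".
func topology(samples []sample) map[edge]float64 {
	edges := make(map[edge]float64)
	for _, s := range samples {
		edges[edge{from: s.peerService, to: s.serviceName}] += s.rate
	}
	return edges
}

func main() {
	edges := topology([]sample{
		{"echo", "echo-client", "Echo", 2},
		{"echo", "echo-client", "Ping", 1},
		{"store", "echo", "Get", 4},
	})

	// Sort the edges so the output order is stable.
	keys := make([]edge, 0, len(edges))
	for e := range edges {
		keys = append(keys, e)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].from != keys[j].from {
			return keys[i].from < keys[j].from
		}
		return keys[i].to < keys[j].to
	})
	for _, e := range keys {
		fmt.Printf("%s -> %s: %.1f req/s\n", e.from, e.to, edges[e])
	}
}
```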
Name | Instrument | Unit | Unit (UCUM) | Description |
---|---|---|---|---|
process.runtime.go.cgo.calls | Sum | - | - | Number of cgo calls made by the current process. |
process.runtime.go.gc.count | Sum | - | - | Number of completed garbage collection cycles. |
process.runtime.go.gc.pause_ns | Histogram | nanosecond | ns | Amount of nanoseconds in GC stop-the-world pauses. |
process.runtime.go.gc.pause_total_ns | Histogram | nanosecond | ns | Cumulative nanoseconds in GC stop-the-world pauses since the program started. |
process.runtime.go.goroutines | Gauge | - | - | Number of goroutines that currently exist. |
process.runtime.go.lookups | Sum | - | - | Number of pointer lookups performed by the runtime. |
process.runtime.go.mem.heap_alloc | Gauge | bytes | bytes | Bytes of allocated heap objects. |
process.runtime.go.mem.heap_idle | Gauge | bytes | bytes | Bytes in idle (unused) spans. |
process.runtime.go.mem.heap_inuse | Gauge | bytes | bytes | Bytes in in-use spans. |
process.runtime.go.mem.heap_objects | Gauge | - | - | Number of allocated heap objects. |
process.runtime.go.mem.live_objects | Gauge | - | - | Number of live objects, calculated as cumulative Mallocs minus Frees. |
process.runtime.go.mem.heap_released | Gauge | bytes | bytes | Bytes of idle spans whose physical memory has been returned to the OS. |
process.runtime.go.mem.heap_sys | Gauge | bytes | bytes | Bytes of heap memory obtained from the OS. |
runtime.uptime | Sum | ms | ms | Milliseconds since application was initialized. |
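Most of the values in this table correspond to data the Go standard library already exposes. The sketch below reads them directly with the `runtime` package. The mapping from metric names to `runtime.MemStats` fields is inferred from the descriptions above and is an assumption of this example, not taken from the instrumentation source; the per-pause histogram and the uptime metric are left out.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
)

// snapshot reads the current runtime statistics and keys them by the
// metric names from the table (mapping assumed from the descriptions).
func snapshot() map[string]uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	return map[string]uint64{
		"process.runtime.go.cgo.calls":         uint64(runtime.NumCgoCall()),
		"process.runtime.go.gc.count":          uint64(m.NumGC),
		"process.runtime.go.gc.pause_total_ns": m.PauseTotalNs,
		"process.runtime.go.goroutines":        uint64(runtime.NumGoroutine()),
		"process.runtime.go.lookups":           m.Lookups,
		"process.runtime.go.mem.heap_alloc":    m.HeapAlloc,
		"process.runtime.go.mem.heap_idle":     m.HeapIdle,
		"process.runtime.go.mem.heap_inuse":    m.HeapInuse,
		"process.runtime.go.mem.heap_objects":  m.HeapObjects,
		"process.runtime.go.mem.live_objects":  m.Mallocs - m.Frees,
		"process.runtime.go.mem.heap_released": m.HeapReleased,
		"process.runtime.go.mem.heap_sys":      m.HeapSys,
	}
}

func main() {
	stats := snapshot()

	// Print in name order so repeated runs are easy to compare.
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-40s %d\n", name, stats[name])
	}
}
```

Note that `runtime.ReadMemStats` briefly stops the world, so it is better suited to periodic collection than to per-request calls.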
The OpenTelemetry SDK used here is fully compatible with opentelemetry-go 1.x.
Maintained by: CoderPoet
Library/Framework | Versions | Notes |
---|---|---|
go.opentelemetry.io/otel | v1.7.0 | |
go.opentelemetry.io/otel/trace | v1.7.0 | |
go.opentelemetry.io/otel/metric | v0.30.0 | |
go.opentelemetry.io/otel/semconv | v1.7.0 | |
go.opentelemetry.io/contrib/instrumentation/runtime | v0.30.0 | |
kitex | v0.3.1 | |