[doc]metrics (vesoft-inc#1406)
* metrics

* comments

* wilson
amber-moe authored and dutor committed Dec 16, 2019
1 parent c2f1836 commit 686d6cf
Showing 8 changed files with 355 additions and 63 deletions.
@@ -0,0 +1,82 @@
# Graph Metrics

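## Introduction

Currently, **Nebula Graph** supports obtaining basic performance metrics for the Graph Service over HTTP.

Each performance metric consists of three parts: the counter name, the statistic type, and the time range.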

| counter\_name | statistic\_type | time_range |
| ---- | ----|-------|

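### Counter Names

Each counter name consists of the service name followed by the module name. The following interfaces are currently supported: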

```cpp
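graph_storageClient // Requests sent via storageClient; a request fanned out to multiple storage servers concurrently is counted once
graph_metaClient // Requests sent via metaClient
graph_graph_all // Requests sent by clients to the graph service; a request containing multiple statements is counted once
graph_insertVertex // Insert a vertex
graph_insertEdge // Insert an edge
graph_deleteVertex // Delete a vertex
graph_deleteEdge // Delete an edge (not supported yet)
graph_updateVertex // Update the properties of a vertex
graph_updateEdge // Update the properties of an edge
graph_go // Execute the GO command
graph_findPath // Find the shortest path or all paths
graph_fetchVertex // Fetch vertex properties; counts the commands executed, not the number of vertices fetched
graph_fetchEdge // Fetch edge properties; counts the commands executed, not the number of edges fetched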
```

Each interface has three metrics: latency (in microseconds), QPS of successful requests, and QPS of failed requests. The suffixes are as follows:

```text
_latency
_qps
_error_qps
```

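Concatenating the interface name with the metric suffix gives the complete counter name. For example, `graph_insertVertex_latency`, `graph_insertVertex_qps`, and `graph_insertVertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.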
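
As a minimal sketch of this naming rule (the helper `counter_names` is illustrative and not part of Nebula Graph), the three counter names for an interface can be derived by appending each suffix:

```python
# The three metric suffixes listed above.
SUFFIXES = ("_latency", "_qps", "_error_qps")


def counter_names(interface):
    """Return the three counter names for one interface name."""
    return [interface + suffix for suffix in SUFFIXES]


print(counter_names("graph_insertVertex"))
# ['graph_insertVertex_latency', 'graph_insertVertex_qps', 'graph_insertVertex_error_qps']
```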

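### Statistics Type

The currently supported statistic types are SUM, COUNT, AVG, RATE, and P quantiles (P99, P999, ..., P999999). Among them:

- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
- Metrics with the `_latency` suffix support SUM, COUNT, AVG, RATE, and P quantiles.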

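### Time Range

Only three time ranges are currently supported: 60, 600, and 3600, which correspond to the last minute, the last ten minutes, and the last hour.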

## Obtain the Corresponding Metrics via HTTP Interface

Based on the description above, you can compose a complete metric name. Here are some examples:

```cpp
graph_insertVertex_latency.avg.60 // average latency of successful insert-vertex commands in the last minute
graph_updateEdge_error_qps.count.3600 // total number of failed update-edge commands in the last hour
```
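
The examples above follow the pattern `<counter_name>.<statistic_type>.<time_range>`. A minimal sketch of composing such a name is shown below; `metric_name` is a hypothetical helper, and lowercasing the statistic type is an assumption based on the examples.

```python
# The three time ranges (in seconds) that the document lists as supported.
TIME_RANGES = (60, 600, 3600)


def metric_name(counter, stat, time_range):
    """Join counter name, statistic type, and time range with dots."""
    if time_range not in TIME_RANGES:
        raise ValueError(f"unsupported time range: {time_range}")
    return f"{counter}.{stat.lower()}.{time_range}"


print(metric_name("graph_insertVertex_latency", "AVG", 60))
# graph_insertVertex_latency.avg.60
```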

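Suppose a nebula graph service is running locally and was started with `ws_http_port` set to 13000. Metrics are requested with an HTTP **GET** to the **get_stats** method, passing the metric names in the `stats` parameter. The following examples fetch metrics over HTTP: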

```bash
# Get a single metric
curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60"
# graph_insertVertex_qps.rate.60=3069

# Get multiple metrics at once
curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60"
# graph_insertVertex_qps.rate.60=3069
# graph_deleteVertex_latency.avg.60=837

# Get multiple metrics at once and return them in JSON format
curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60&returnjson"
# [{"value":2373,"name":"graph_insertVertex_qps.rate.60"},{"value":760,"name":"graph_deleteVertex_latency.avg.60"}]

# Get all metrics
curl -G "http://127.0.0.1:13000/get_stats?stats"
# or
curl -G "http://127.0.0.1:13000/get_stats"
```
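
The same requests can be issued from a script. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the plain-text response contains one `name=value` pair per line with numeric values, as in the sample output above.

```python
from urllib.parse import quote
from urllib.request import urlopen


def stats_url(port, metrics=(), host="127.0.0.1"):
    """Build a get_stats URL; with no metric names, all metrics are requested."""
    url = f"http://{host}:{port}/get_stats"
    if metrics:
        # Names are joined with commas and no spaces.
        url += "?stats=" + ",".join(quote(name, safe="") for name in metrics)
    return url


def parse_stats(body):
    """Parse the plain-text response: one `name=value` pair per line."""
    stats = {}
    for line in body.splitlines():
        name, sep, value = line.strip().rpartition("=")
        if sep:  # skip blank or malformed lines
            stats[name] = float(value)
    return stats


def fetch_stats(port, metrics=(), host="127.0.0.1", timeout=5.0):
    """Query a running service (needs network access to that service)."""
    with urlopen(stats_url(port, metrics, host), timeout=timeout) as resp:
        return parse_stats(resp.read().decode("utf-8"))


print(stats_url(13000, ["graph_insertVertex_qps.rate.60"]))
print(parse_stats("graph_insertVertex_qps.rate.60=3069\n"))
```

With a graph service listening on port 13000, `fetch_stats(13000, ["graph_insertVertex_qps.rate.60"])` would return a dictionary keyed by metric name.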
@@ -0,0 +1,63 @@
# Meta Metrics

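## Introduction

Currently, **Nebula Graph** supports obtaining basic performance metrics for the Meta Service over HTTP.

Each performance metric consists of three parts: the counter name, the statistic type, and the time range.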

| counter\_name | statistic\_type | time_range |
| ---- | ----|-------|

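### Counter Names

Each counter name consists of the service name followed by the module name. The Meta Service only collects heartbeat statistics. The following metrics are currently supported: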

```text
meta_heartbeat_qps
meta_heartbeat_error_qps
meta_heartbeat_latency
```

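### Statistics Type

The currently supported statistic types are SUM, COUNT, AVG, RATE, and P quantiles (P99, P999, ..., P999999). Among them:

- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
- Metrics with the `_latency` suffix support SUM, COUNT, AVG, RATE, and P quantiles.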

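### Time Range

Only three time ranges are currently supported: 60, 600, and 3600, which correspond to the last minute, the last ten minutes, and the last hour.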

## Obtain the Corresponding Metrics via HTTP Interface

Here are some examples:

```cpp
meta_heartbeat_qps.avg.60 // average heartbeat QPS in the last minute
meta_heartbeat_error_qps.count.60 // total number of heartbeat errors in the last minute
meta_heartbeat_latency.avg.60 // average heartbeat latency in the last minute
```

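Suppose a nebula meta service is running locally and was started with `ws_http_port` set to 11000. Metrics are requested with an HTTP **GET** to the **get_stats** method, passing the metric names in the `stats` parameter. The following examples fetch metrics over HTTP: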

```bash
# Get a single metric
curl -G "http://127.0.0.1:11000/get_stats?stats=meta_heartbeat_qps.avg.60"
# meta_heartbeat_qps.avg.60=580

# Get multiple metrics at once
curl -G "http://127.0.0.1:11000/get_stats?stats=meta_heartbeat_qps.avg.60,meta_heartbeat_error_qps.avg.60"
# meta_heartbeat_qps.avg.60=537
# meta_heartbeat_error_qps.avg.60=579

# Get multiple metrics at once and return them in JSON format
curl -G "http://127.0.0.1:11000/get_stats?stats=meta_heartbeat_qps.avg.60,meta_heartbeat_error_qps.avg.60&returnjson"
# [{"value":533,"name":"meta_heartbeat_qps.avg.60"},{"value":574,"name":"meta_heartbeat_error_qps.avg.60"}]

# Get all metrics
curl -G "http://127.0.0.1:11000/get_stats?stats"
# or
curl -G "http://127.0.0.1:11000/get_stats"
```
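
When `returnjson` is used, the response is a JSON array of objects with `name` and `value` fields, as in the sample output above. A minimal sketch of turning that array into a lookup table follows; `parse_json_stats` is a hypothetical helper, and the sample string is copied from the example output.

```python
import json


def parse_json_stats(body):
    """Map each metric name to its value from a returnjson response body."""
    return {item["name"]: item["value"] for item in json.loads(body)}


sample = (
    '[{"value":533,"name":"meta_heartbeat_qps.avg.60"},'
    '{"value":574,"name":"meta_heartbeat_error_qps.avg.60"}]'
)
stats = parse_json_stats(sample)
print(stats["meta_heartbeat_qps.avg.60"])  # 533
```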
@@ -1,10 +1,10 @@
# Nebula Graph Storage Metrics
# Storage Metrics

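## Introduction

Currently, **Nebula Graph** supports obtaining some basic performance metrics for Storage Service layer operations over HTTP.
Currently, **Nebula Graph** supports obtaining basic performance metrics for the Storage Service over HTTP.

Each performance metric consists of three parts: the counter name, the statistic type, and the time range.
Each performance metric consists of three parts: the counter name, the statistic type, and the time range.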

| counter\_name | statistic\_type | time_range |
| ---- | ----|-------|
@@ -13,67 +13,67 @@

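### Counter Names

Each counter name consists of the interface name followed by the metric name. The following interfaces are currently supported:
Each counter name consists of the service name followed by the module name. The following interfaces are currently supported: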

```text
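vertex_props            # Get the properties of a vertex
edge_props              # Get the properties of an edge
add_vertex              # Insert a vertex
add_edge                # Insert an edge
del_vertex              # Delete a vertex
update_vertex           # Update the properties of a vertex
update_edge             # Update the properties of an edge
get_kv                  # Read a key-value pair
put_kv                  # Write a key-value pair
get_bound               # Internal use only
storage_vertex_props    # Get the properties of a vertex
storage_edge_props      # Get the properties of an edge
storage_add_vertex      # Insert a vertex
storage_add_edge        # Insert an edge
storage_del_vertex      # Delete a vertex
storage_update_vertex   # Update the properties of a vertex
storage_update_edge     # Update the properties of an edge
storage_get_kv          # Read a key-value pair
storage_put_kv          # Write a key-value pair
storage_get_bound       # Internal use only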
```

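Each interface has three metrics: latency (in microseconds), QPS, and QPS of failed requests. The suffixes are as follows:
Each interface has three metrics: latency (in microseconds), QPS of successful requests, and QPS of failed requests. The suffixes are as follows: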

```text
_latency
_qps
_error_qps
```

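Concatenating the interface name with the metric suffix gives the complete counter name. For example, `add_vertex_latency`, `add_vertex_qps`, and `add_vertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.
Concatenating the interface name with the metric suffix gives the complete counter name. For example, `storage_add_vertex_latency`, `storage_add_vertex_qps`, and `storage_add_vertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.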

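### Statistics Type

The currently supported statistic types are SUM, COUNT, AVG, RATE, and P quantiles (P99, P999, ..., P999999). Among them:
The currently supported statistic types are SUM, COUNT, AVG, RATE, and P quantiles (P99, P999, ..., P999999). Among them:

- Metrics with the `_latency` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
- Metrics with the `_qps` suffix support SUM, COUNT, AVG, RATE, and P quantiles.
- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
- Metrics with the `_latency` suffix support SUM, COUNT, AVG, RATE, and P quantiles.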

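### Time Range

Only three time ranges are currently supported: 60, 600, and 3600, which correspond to the last minute, the last ten minutes, and the last hour.
Only three time ranges are currently supported: 60, 600, and 3600, which correspond to the last minute, the last ten minutes, and the last hour.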

## Obtain the Corresponding Metrics via HTTP Interface

Based on the description above, you can compose a complete metric name. Here are some examples:

```text
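add_vertex_latency.avg.60 # average latency of inserting a vertex in the last minute
get_bound_qps.rate.600 # QPS of getting neighbors in the last ten minutes
update_edge_error_qps.count.3600 # total number of errors when updating an edge in the last hour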
```cpp
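storage_add_vertex_latency.avg.60 // average latency of inserting a vertex in the last minute
storage_get_bound_qps.rate.600 // QPS of getting neighbors in the last ten minutes
storage_update_edge_error_qps.count.3600 // total number of errors when updating an edge in the last hour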
```

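Suppose a nebula storage service is running locally and was started with `ws_http_port` set to 50005. Metrics are requested with an HTTP GET to the get_stats method, passing the metric names in the `stats` parameter. The following examples fetch metrics over HTTP:
Suppose a nebula storage service is running locally and was started with `ws_http_port` set to 12000. Metrics are requested with an HTTP **GET** to the **get_stats** method, passing the metric names in the `stats` parameter. The following examples fetch metrics over HTTP: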

```bash
# Get a single metric
curl -G "http://127.0.0.1:12000/get_stats?stats=vertex_props_qps.rate.60"
# vertex_props_qps.rate.60=2674
curl -G "http://127.0.0.1:12000/get_stats?stats=storage_vertex_props_qps.rate.60"
# storage_vertex_props_qps.rate.60=2674

# Get multiple metrics at once
curl -G "http://127.0.0.1:12000/get_stats?stats=vertex_props_qps.rate.60,vertex_props_latency.avg.60"
# vertex_props_qps.rate.60=2638
# vertex_props_latency.avg.60=812
curl -G "http://127.0.0.1:12000/get_stats?stats=storage_vertex_props_qps.rate.60,storage_vertex_props_latency.avg.60"
# storage_vertex_props_qps.rate.60=2638
# storage_vertex_props_latency.avg.60=812

# Get multiple metrics at once and return them in JSON format
curl -G "http://127.0.0.1:12000/get_stats?stats=vertex_props_qps.rate.60,vertex_props_latency.avg.60&returnjson"
# [{"value":2723,"name":"vertex_props_qps.rate.60"},{"value":804,"name":"vertex_props_latency.avg.60"}]
curl -G "http://127.0.0.1:12000/get_stats?stats=storage_vertex_props_qps.rate.60,storage_vertex_props_latency.avg.60&returnjson"
# [{"value":2723,"name":"storage_vertex_props_qps.rate.60"},{"value":804,"name":"storage_vertex_props_latency.avg.60"}]

# Get all metrics
curl -G "http://127.0.0.1:12000/get_stats?stats"
8 changes: 6 additions & 2 deletions docs/manual-CN/README.md
@@ -92,21 +92,25 @@
* [Configuration File Description](3.build-develop-and-administration/3.deploy-and-administrations/deployment/configuration-description.md)
* [Deploy with Docker](3.build-develop-and-administration/3.deploy-and-administrations/deployment/deploy-cluster-on-docker.md)
* [Deploy a Cluster](3.build-develop-and-administration/3.deploy-and-administrations/deployment/deploy-cluster.md)
* [Connect to Prometheus](3.build-develop-and-administration/3.deploy-and-administrations/deployment/connect-prometheus.md)
* [Integrate with Prometheus](3.build-develop-and-administration/3.deploy-and-administrations/deployment/connect-prometheus.md)
* Server Administration
* Account Management
* [Drop User](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/account-management-statements/drop-user-syntax.md)
* Server Configuration
* [Server Configuration](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/configuration-statements/configs-syntax.md)
* [RocksDB Compaction and Flush](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/configuration-statements/rocksdb-compaction-flush.md)
* [Logs](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/configuration-statements/log.md)
* Graph Service Administration
* [Graph Layer Runtime Statistics (metrics)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/graph-service-administration/graph-metrics.md)
* Meta Service Administration
* [Meta Layer Runtime Statistics (metrics)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/meta-service-administration/meta-metrics.md)
* Storage Service Administration
* Offline Data Loading
* [Load .sst Files](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/data-import/download-and-ingest-sst-file.md)
* [Read .csv Files](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/data-import/import-csv-file.md)
* [Spark Import Tool](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/data-import/spark-writer.md)
* [Load Balancing and Data Migration](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/storage-balance.md)
* [Storage Layer Runtime Statistics (metric)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/storage-metrics.md)
* [Storage Layer Runtime Statistics (metrics)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/storage-metrics.md)
* [Cluster Snapshots](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/cluster-snapshot.md)

## Community Contributions (Open Source Enthusiasts)
@@ -0,0 +1,79 @@
# Graph Metrics

## Introduction

Currently, **Nebula Graph** supports obtaining basic performance metrics for the Graph Service via HTTP.

Each performance metric consists of three parts, in the form `<counter_name>.<statistic_type>.<time_range>`.

### Counter Names

Each counter name is composed of the service name and the module name. Currently, the supported interfaces are:

```cpp
graph_storageClient // Requests sent via storageClient; a request fanned out to multiple storage servers concurrently is counted once
graph_metaClient // Requests sent via metaClient
graph_graph_all // Requests sent by clients to the graph service; a request containing multiple statements is counted once
graph_insertVertex // Insert a vertex
graph_insertEdge // Insert an edge
graph_deleteVertex // Delete a vertex
graph_deleteEdge // Delete an edge (not supported yet)
graph_updateVertex // Update properties of a vertex
graph_updateEdge // Update properties of an edge
graph_go // Execute the go command
graph_findPath // Find the shortest path or the full path
graph_fetchVertex // Fetch the vertex's properties. Only count the commands executed rather than the total number of fetched vertices.
graph_fetchEdge // Fetch the edge's properties. Only count the commands executed rather than the total number of fetched edges.
```

Each interface has three metrics: latency (in microseconds), QPS of successful requests, and QPS of failed requests. The suffixes are as follows:

```text
_latency
_qps
_error_qps
```

A complete counter name concatenates the interface name with the metric suffix. For example, `graph_insertVertex_latency`, `graph_insertVertex_qps`, and `graph_insertVertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.
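
Going the other way, a counter name can be split back into its interface name and suffix. The sketch below uses a hypothetical helper, `split_counter`. The one subtlety is that `_error_qps` also ends with `_qps`, so the longer suffix has to be tested first.

```python
# Longest suffix first: "_error_qps" also ends with "_qps".
SUFFIXES = ("_error_qps", "_latency", "_qps")


def split_counter(counter_name):
    """Split a counter name into (interface name, metric suffix)."""
    for suffix in SUFFIXES:
        if counter_name.endswith(suffix):
            return counter_name[: -len(suffix)], suffix
    raise ValueError(f"unrecognized metric suffix in {counter_name!r}")


print(split_counter("graph_insertVertex_error_qps"))
# ('graph_insertVertex', '_error_qps')
```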

### Statistics Type

The currently supported statistic types are SUM, COUNT, AVG, RATE, and P quantiles (P99, P999, ..., P999999). Among them:

- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
- Metrics with the `_latency` suffix support SUM, COUNT, AVG, RATE, and P quantiles.
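
For readers unfamiliar with the P notation, the digits after `P` are read here as the decimal digits of the quantile, so P99 is the 0.99 quantile and P999 the 0.999 quantile. This is the conventional reading and is an assumption about how the labels are meant. The helper `quantile_fraction` below is illustrative only.

```python
def quantile_fraction(label):
    """Convert a label such as "P99" or "P999" to the fraction it denotes."""
    digits = label[1:]
    if label[:1].upper() != "P" or not digits.isdigit():
        raise ValueError(f"not a P quantile label: {label!r}")
    return float("0." + digits)


print(quantile_fraction("P99"))   # 0.99
print(quantile_fraction("P999"))  # 0.999
```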

### Time Range

Currently, the supported time ranges are 60s, 600s, and 3600s, which correspond to the last minute, the last ten minutes, and the last hour.

## Obtain the Corresponding Metrics via HTTP Interface

Based on the description above, you can compose a complete metric name. Here are some examples:

```cpp
graph_insertVertex_latency.avg.60 // the average latency of successfully inserting a vertex in the last minute
graph_updateEdge_error_qps.count.3600 // total number of failures in updating an edge in the last hour
```
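
A complete metric name can also be taken apart again, for instance when labeling values in a dashboard. A minimal sketch follows, with `parse_metric` as a hypothetical helper. It splits from the right on the assumption that the last two dot-separated fields are always the statistic type and the time range.

```python
def parse_metric(full_name):
    """Split '<counter_name>.<statistic_type>.<time_range>' into its parts."""
    # Raises ValueError if the name has fewer than three dot-separated parts.
    counter, stat, time_range = full_name.rsplit(".", 2)
    return counter, stat, int(time_range)


print(parse_metric("graph_updateEdge_error_qps.count.3600"))
# ('graph_updateEdge_error_qps', 'count', 3600)
```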

Assume that a graph service is running locally and was started with `ws_http_port` set to 13000. Send an HTTP **GET** request to the **get_stats** method, passing the metric names in the `stats` parameter. Here are some examples of getting metrics via the HTTP interface:

```bash
# obtain a single metric
curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60"
# graph_insertVertex_qps.rate.60=3069

# obtain multiple metrics at the same time
curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60"
# graph_insertVertex_qps.rate.60=3069
# graph_deleteVertex_latency.avg.60=837

# obtain multiple metrics at the same time and return them in JSON format
curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60&returnjson"
# [{"value":2373,"name":"graph_insertVertex_qps.rate.60"},{"value":760,"name":"graph_deleteVertex_latency.avg.60"}]

# obtain all the metrics
curl -G "http://127.0.0.1:13000/get_stats?stats"
# or
curl -G "http://127.0.0.1:13000/get_stats"
```