[doc]metrics (vesoft-inc#1406)

* metrics * comments * wilson
kangshaojie · Dec 16, 2019 · 686d6cf
1 parent c2f1836
commit 686d6cf

Showing 8 changed files with 355 additions and 63 deletions.
diff --git a/...nistrations/server-administration/graph-service-administration/graph-metrics.md b/...nistrations/server-administration/graph-service-administration/graph-metrics.md
@@ -0,0 +1,82 @@
+# Graph Metrics
+
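+## Introduction
+
+Currently, **Nebula Graph** supports obtaining basic performance metrics of the Graph Service layer via HTTP.
+
+Each performance metric consists of three parts: the counter name, the statistic type, and the time range.
+
+| counter\_name | statistic\_type | time_range |
+| ----  |  ----|-------|
+
+### Counter Names
+
+Each counter name consists of the service name followed by the module name. The following interfaces are currently supported: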
+
+```cpp
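+graph_storageClient // Requests sent via storageClient; sending messages to multiple storage servers concurrently is counted as one request
+graph_metaClient // Requests sent via metaClient
+graph_graph_all // Requests sent by clients to graph; a request containing multiple statements is counted as one
+graph_insertVertex // Insert a vertex
+graph_insertEdge // Insert an edge
+graph_deleteVertex // Delete a vertex
+graph_deleteEdge // Delete an edge (not supported yet)
+graph_updateVertex // Update the properties of a vertex
+graph_updateEdge // Update the properties of an edge
+graph_go // Execute the go command
+graph_findPath // Find the shortest path or all paths
+graph_fetchVertex // Fetch vertex properties; counts the commands executed, not the vertices fetched
+graph_fetchEdge // Fetch edge properties; counts the commands executed, not the edges fetched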
+```
+
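+Each interface has three performance metrics: latency (in microseconds), QPS of successful requests, and QPS of failed requests. The suffixes are as follows: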
+
+```text
+_latency
+_qps
+_error_qps
+```
+
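+Concatenate the interface name with a suffix to get the complete counter name. For example, `graph_insertVertex_latency`, `graph_insertVertex_qps`, and `graph_insertVertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.
+
+### Statistic Types
+
+The supported statistic types are SUM, COUNT, AVG, RATE and P quantiles (P99, P999, ..., P999999), where:
+
+- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
+- Metrics with the `_latency` suffix support SUM, COUNT, AVG, and RATE, as well as P quantiles.
+
+### Time Range
+
+Only three time ranges are currently supported: 60, 600, and 3600, representing the last minute, the last ten minutes, and the last hour, respectively.
+
+## Obtaining Metrics via the HTTP Interface
+
+With the parts described above, you can write a complete metric name. Here are some examples: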
+
+```cpp
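+graph_insertVertex_latency.avg.60        // average latency of successful vertex insertions in the last minute
+graph_updateEdge_error_qps.count.3600   // total number of failed edge updates in the last hour
+```
+
+Assume a nebula graph service is running locally with `ws_http_port` set to 13000 at startup. Requests are sent with HTTP **GET**; the method name is **get_stats** and the parameter is `stats` followed by the metric names. The following examples fetch metrics via the HTTP interface: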
+
+```bash
+# Get a single metric
+curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60"
+# graph_insertVertex_qps.rate.60=3069
+
+# Get multiple metrics at once
+curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60"
+# graph_insertVertex_qps.rate.60=3069
+# graph_deleteVertex_latency.avg.60=837
+
+# Get multiple metrics at once and return them in JSON format
+curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60&returnjson"
+# [{"value":2373,"name":"graph_insertVertex_qps.rate.60"},{"value":760,"name":"graph_deleteVertex_latency.avg.60"}]
+
+# Get all metrics
+curl -G "http://127.0.0.1:13000/get_stats?stats"
+# or
+curl -G "http://127.0.0.1:13000/get_stats"
+```
diff --git a/...ministrations/server-administration/meta-service-administration/meta-metrics.md b/...ministrations/server-administration/meta-service-administration/meta-metrics.md
@@ -0,0 +1,63 @@
+# Meta Metrics
+
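+## Introduction
+
+Currently, **Nebula Graph** supports obtaining basic performance metrics of the Meta Service layer via HTTP.
+
+Each performance metric consists of three parts: the counter name, the statistic type, and the time range.
+
+| counter\_name | statistic\_type | time_range |
+| ----  |  ----|-------|
+
+### Counter Names
+
+Each counter name consists of the service name followed by the module name. Meta only collects heartbeat statistics. The following interfaces are currently supported: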
+
+```text
+meta_heartbeat_qps
+meta_heartbeat_error_qps
+meta_heartbeat_latency
+```
+
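+### Statistic Types
+
+The supported statistic types are SUM, COUNT, AVG, RATE and P quantiles (P99, P999, ..., P999999), where:
+
+- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
+- Metrics with the `_latency` suffix support SUM, COUNT, AVG, and RATE, as well as P quantiles.
+
+### Time Range
+
+Only three time ranges are currently supported: 60, 600, and 3600, representing the last minute, the last ten minutes, and the last hour, respectively.
+
+## Obtaining Metrics via the HTTP Interface
+
+Here are some examples: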
+
+```cpp
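+meta_heartbeat_qps.avg.60         // average heartbeat QPS in the last minute
+meta_heartbeat_error_qps.count.60   // total number of heartbeat errors in the last minute
+meta_heartbeat_latency.avg.60     // average heartbeat latency in the last minute
+```
+
+Assume a nebula meta service is running locally with `ws_http_port` set to 11000 at startup. Requests are sent with HTTP **GET**; the method name is **get_stats** and the parameter is `stats` followed by the metric names. The following examples fetch metrics via the HTTP interface: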
+
+```bash
+# Get a single metric
+curl -G "http://127.0.0.1:11000/get_stats?stats=meta_heartbeat_qps.avg.60"
+# meta_heartbeat_qps.avg.60=580
+
+# Get multiple metrics at once
+curl -G "http://127.0.0.1:11000/get_stats?stats=meta_heartbeat_qps.avg.60,meta_heartbeat_error_qps.avg.60"
+# meta_heartbeat_qps.avg.60=537
+# meta_heartbeat_error_qps.avg.60=579
+
+# Get multiple metrics at once and return them in JSON format
+curl -G "http://127.0.0.1:11000/get_stats?stats=meta_heartbeat_qps.avg.60,meta_heartbeat_error_qps.avg.60&returnjson"
+# [{"value":533,"name":"meta_heartbeat_qps.avg.60"},{"value":574,"name":"meta_heartbeat_error_qps.avg.60"}]
+
+# Get all metrics
+curl -G "http://127.0.0.1:11000/get_stats?stats"
+# or
+curl -G "http://127.0.0.1:11000/get_stats"
+```
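
Reviewer note: the sample responses in the meta-metrics examples come in two shapes, plain `name=value` lines and, with `returnjson`, a JSON array of objects carrying `value` and `name`. A minimal Python sketch for reading either shape into a dictionary could look like the one below. The helper name `parse_stats` is hypothetical, and the sketch assumes the raw response body matches the commented sample output exactly, which this diff does not confirm.

```python
import json


def parse_stats(body):
    """Parse a get_stats response body into {metric_name: value}.

    Assumption: the body is either a JSON array of {"value", "name"}
    objects (the returnjson form) or one name=value pair per line.
    """
    body = body.strip()
    if body.startswith("["):
        # JSON form: [{"value": 533, "name": "meta_heartbeat_qps.avg.60"}, ...]
        return {item["name"]: float(item["value"]) for item in json.loads(body)}

    stats = {}
    for line in body.splitlines():
        # Text form: meta_heartbeat_qps.avg.60=537
        name, sep, value = line.strip().partition("=")
        if sep:  # skip lines that carry no name=value pair
            stats[name] = float(value)
    return stats
```

Values are converted to `float` in both branches so that callers get one type regardless of the response format.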
diff --git a/...rations/server-administration/storage-service-administration/storage-metrics.md b/...rations/server-administration/storage-service-administration/storage-metrics.md
@@ -1,10 +1,10 @@
-# Nebula Graph  Storage Metrics
+# Storage Metrics
 
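 ## Introduction
 
-Currently, **Nebula Graph** supports obtaining some basic performance metrics of Storage Service layer operations via HTTP.
+Currently, **Nebula Graph** supports obtaining basic performance metrics of the Storage Service layer via HTTP.
 
-Each performance metric consists of three parts: counter name, statistic type, time range.
+Each performance metric consists of three parts: the counter name, the statistic type, and the time range.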
 
 | counter\_name | statistic\_type | time_range |
 | ----  |  ----|-------|
@@ -13,67 +13,67 @@
 
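 ### Counter Names
 
-Each counter name consists of the interface name plus the counter name. The following interfaces are currently supported
+Each counter name consists of the service name followed by the module name. The following interfaces are currently supported:
 
 ```text
-Get vertex properties vertex_props
-Get edge properties edge_props
-Insert a vertex add_vertex
-Insert an edge add_edge
-Delete a vertex del_vertex
-Update the properties of a vertex update_vertex
-Update the properties of an edge update_edge
-Read a key-value pair get_kv
-Write a key-value pair put_kv
-Internal use only get_bound
+Get vertex properties storage_vertex_props
+Get edge properties storage_edge_props
+Insert a vertex storage_add_vertex
+Insert an edge storage_add_edge
+Delete a vertex storage_del_vertex
+Update the properties of a vertex storage_update_vertex
+Update the properties of an edge storage_update_edge
+Read a key-value pair storage_get_kv
+Write a key-value pair storage_put_kv
+Internal use only storage_get_bound
 ```
 
-Each interface has three performance metrics: latency (in microseconds), QPS, and QPS of failed requests. The suffixes are as follows:
+Each interface has three performance metrics: latency (in microseconds), QPS of successful requests, and QPS of failed requests. The suffixes are as follows: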
 
 ```text
 _latency
 _qps
 _error_qps
 ```
 
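-Concatenate the interface name with a suffix to get the complete counter name. For example, `add_vertex_latency`, `add_vertex_qps`, `add_vertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.
+Concatenate the interface name with a suffix to get the complete counter name. For example, `storage_add_vertex_latency`, `storage_add_vertex_qps`, and `storage_add_vertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.
 
 ### Statistic Types
 
-The supported statistic types are SUM, COUNT, AVG, RATE, and P quantiles (P99, P999, ..., P999999), where:
+The supported statistic types are SUM, COUNT, AVG, RATE and P quantiles (P99, P999, ..., P999999), where:
 
-- Metrics with the `_latency` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
-- Metrics with the `_qps` suffix support SUM, COUNT, AVG, and RATE, as well as P quantiles.
+- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
+- Metrics with the `_latency` suffix support SUM, COUNT, AVG, and RATE, as well as P quantiles.
 
 ### Time Range
 
-Only three time ranges are currently supported: 60, 600, 3600, representing the last minute, the last ten minutes and the last hour.
+Only three time ranges are currently supported: 60, 600, and 3600, representing the last minute, the last ten minutes, and the last hour, respectively.
 
 ## Obtaining Metrics via the HTTP Interface
 
 With the parts described above, you can write a complete metric name. Here are some examples:
 
-```text
-add_vertex_latency.avg.60        # average latency of inserting a vertex in the last minute
-get_bound_qps.rate.600        # QPS of getting neighboring vertices in the last ten minutes
-update_edge_error_qps.count.3600   # total number of errors when updating an edge in the last hour
+```cpp
+storage_add_vertex_latency.avg.60        // average latency of inserting a vertex in the last minute
+storage_get_bound_qps.rate.600        // QPS of getting neighboring vertices in the last ten minutes
+storage_update_edge_error_qps.count.3600   // total number of errors when updating an edge in the last hour
 ```
 
-Assume a nebula storage service is running locally with `ws_http_port` set to 50005 at startup. Requests are sent with HTTP GET; the method name is get_stats and the parameter is stats followed by the metric names. The following examples fetch metrics via the HTTP interface:
+Assume a nebula storage service is running locally with `ws_http_port` set to 12000 at startup. Requests are sent with HTTP **GET**; the method name is **get_stats** and the parameter is `stats` followed by the metric names. The following examples fetch metrics via the HTTP interface: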
 
 ```bash
 # Get a single metric
-curl -G "http://127.0.0.1:12000/get_stats?stats=vertex_props_qps.rate.60"
-# vertex_props_qps.rate.60=2674
+curl -G "http://127.0.0.1:12000/get_stats?stats=storage_vertex_props_qps.rate.60"
+# storage_vertex_props_qps.rate.60=2674
 
 # Get multiple metrics at once
-curl -G "http://127.0.0.1:12000/get_stats?stats=vertex_props_qps.rate.60,vertex_props_latency.avg.60"
-# vertex_props_qps.rate.60=2638
-# vertex_props_latency.avg.60=812
+curl -G "http://127.0.0.1:12000/get_stats?stats=storage_vertex_props_qps.rate.60,storage_vertex_props_latency.avg.60"
+# storage_vertex_props_qps.rate.60=2638
+# storage_vertex_props_latency.avg.60=812
 
 # Get multiple metrics at once and return them in JSON format
-curl -G "http://127.0.0.1:12000/get_stats?stats=vertex_props_qps.rate.60,vertex_props_latency.avg.60&returnjson"
-# [{"value":2723,"name":"vertex_props_qps.rate.60"},{"value":804,"name":"vertex_props_latency.avg.60"}]
+curl -G "http://127.0.0.1:12000/get_stats?stats=storage_vertex_props_qps.rate.60,storage_vertex_props_latency.avg.60&returnjson"
+# [{"value":2723,"name":"storage_vertex_props_qps.rate.60"},{"value":804,"name":"storage_vertex_props_latency.avg.60"}]
 
 # Get all metrics
 curl -G "http://127.0.0.1:12000/get_stats?stats"

diff --git a/docs/manual-CN/README.md b/docs/manual-CN/README.md
@@ -92,21 +92,25 @@
     * [Configuration File Description](3.build-develop-and-administration/3.deploy-and-administrations/deployment/configuration-description.md)
     * [Deploy with Docker](3.build-develop-and-administration/3.deploy-and-administrations/deployment/deploy-cluster-on-docker.md)
     * [Deploy a Cluster](3.build-develop-and-administration/3.deploy-and-administrations/deployment/deploy-cluster.md)
-    * [Connect to Prometheus](3.build-develop-and-administration/3.deploy-and-administrations/deployment/connect-prometheus.md)
+    * [Integrate with Prometheus](3.build-develop-and-administration/3.deploy-and-administrations/deployment/connect-prometheus.md)
   * Server Administration
     * Account Management
       * [Drop User](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/account-management-statements/drop-user-syntax.md)
     * Server Configuration
       * [Server Configuration](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/configuration-statements/configs-syntax.md)
       * [RocksDB Compaction and Flush](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/configuration-statements/rocksdb-compaction-flush.md)
       * [Logs](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/configuration-statements/log.md)
+    * Compute Service Administration
+      * [Compute Layer Runtime Statistics (metrics)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/graph-service-administration/graph-metrics.md)
+    * Meta Service Administration
+      * [Meta Layer Runtime Statistics (metrics)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/meta-service-administration/meta-metrics.md)
     * Storage Service Administration
       * Offline Data Loading
         * [Load .sst Files](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/data-import/download-and-ingest-sst-file.md)
         * [Read .csv Files](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/data-import/import-csv-file.md)
         * [Spark Import Tool](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/data-import/spark-writer.md)
       * [Load Balancing and Data Migration](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/storage-balance.md)
-      * [Storage Layer Runtime Statistics (metric)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/storage-metrics.md)
+      * [Storage Layer Runtime Statistics (metrics)](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/storage-metrics.md)
       * [Cluster Snapshots](3.build-develop-and-administration/3.deploy-and-administrations/server-administration/storage-service-administration/cluster-snapshot.md)
 
 ## Community Contributions (Open Source Enthusiasts)

diff --git a/...nistrations/server-administration/graph-service-administration/graph-metrics.md b/...nistrations/server-administration/graph-service-administration/graph-metrics.md
@@ -0,0 +1,79 @@
+# Graph Metrics
+
+## Introduction
+
+Currently, **Nebula Graph** supports obtaining basic performance metrics for the graph service via HTTP.
+
+Each performance metric consists of three parts, namely `<counter_name>.<statistic_type>.<time_range>`.
+
+### Counter Names
+
+Each counter name is composed of the service name and the module name. Currently, the supported interfaces are:
+
+```cpp
+graph_storageClient // Requests sent via storageClient; sending requests to multiple storages concurrently is counted as one
+graph_metaClient // Requests sent via metaClient
+graph_graph_all // Requests sent by the client to the graph; a request containing multiple queries is counted as one
+graph_insertVertex // Insert a vertex
+graph_insertEdge // Insert an edge
+graph_deleteVertex // Delete a vertex
+graph_deleteEdge // Delete an edge (not supported yet)
+graph_updateVertex // Update properties of a vertex
+graph_updateEdge // Update properties of an edge
+graph_go // Execute the go command
+graph_findPath // Find the shortest path or the full path
+graph_fetchVertex // Fetch the vertex's properties. Only count the commands executed rather than the total number of fetched vertices.
+graph_fetchEdge // Fetch the edge's properties. Only count the commands executed rather than the total number of fetched edges.
+```
+
+Each interface has three metrics, namely latency (in microseconds), QPS of successful requests, and QPS of failed requests. The suffixes are as follows:
+
+```text
+_latency
+_qps
+_error_qps
+```
+
+The complete counter name concatenates the interface name with the corresponding suffix. For example, `graph_insertVertex_latency`, `graph_insertVertex_qps`, and `graph_insertVertex_error_qps` represent the latency, QPS, and error QPS of inserting a vertex, respectively.
+
+### Statistic Types
+
+Currently supported types are SUM, COUNT, AVG, RATE and P quantiles (P99, P999, ..., P999999), where:
+
+- Metrics with the `_qps` or `_error_qps` suffix support SUM, COUNT, AVG, and RATE, but not P quantiles.
+- Metrics with the `_latency` suffix support SUM, COUNT, AVG, and RATE, as well as P quantiles.
+
+### Time Range
+
+Currently, the supported time ranges are 60, 600, and 3600 seconds, which correspond to the last minute, the last ten minutes, and the last hour.
+
+## Obtain the Corresponding Metrics via HTTP Interface
+
+With the parts described above, you can compose a complete metric name. Here are some examples:
+
+```cpp
+graph_insertVertex_latency.avg.60   // the average latency of successfully inserting a vertex in the last minute
+graph_updateEdge_error_qps.count.3600  // total number of failures in updating an edge in the last hour
+```
+
+Assume that a graph service is running locally with `ws_http_port` set to 13000 at startup. Requests are sent with HTTP **GET**; the method name is **get_stats**, and the parameter is `stats` followed by the metric names. Here are some examples of getting metrics via the HTTP interface:
+
+```bash
+# obtain a single metric
+curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60"
+# graph_insertVertex_qps.rate.60=3069
+
+# obtain multiple metrics at the same time
+curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60"
+# graph_insertVertex_qps.rate.60=3069
+# graph_deleteVertex_latency.avg.60=837
+
+# obtain multiple metrics at the same time and return in json format
+curl -G "http://127.0.0.1:13000/get_stats?stats=graph_insertVertex_qps.rate.60,graph_deleteVertex_latency.avg.60&returnjson"
+# [{"value":2373,"name":"graph_insertVertex_qps.rate.60"},{"value":760,"name":"graph_deleteVertex_latency.avg.60"}]
+
+# obtain all the metrics
+curl -G "http://127.0.0.1:13000/get_stats?stats"
+# or
+curl -G "http://127.0.0.1:13000/get_stats"
+```
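
Reviewer note: the naming rules in this file (`<counter_name>.<statistic_type>.<time_range>`, three suffixes, three time ranges, P quantiles only on `_latency`) can be checked mechanically before a request is sent. The Python sketch below is one way to do that. The function names `is_percentile`, `metric_name`, and `stats_url` are hypothetical, and the lowercase spelling `p99` ... `p999999` is an assumption: the doc only shows lowercase `avg`, `rate`, and `count` in its examples and lists quantiles as P99, P999, ..., P999999.

```python
SUFFIXES = ("latency", "qps", "error_qps")
BASIC_TYPES = ("sum", "count", "avg", "rate")
TIME_RANGES = (60, 600, 3600)


def is_percentile(stat_type):
    """Return True for p99, p999, ..., p999999 (lowercase spelling is assumed)."""
    digits = stat_type[1:]
    return stat_type.startswith("p") and 2 <= len(digits) <= 6 and set(digits) == {"9"}


def metric_name(interface, suffix, stat_type, time_range):
    """Compose <counter_name>.<statistic_type>.<time_range> after validating each part."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix}")
    if time_range not in TIME_RANGES:
        raise ValueError(f"unsupported time range: {time_range}")
    if is_percentile(stat_type):
        # Per the doc, only _latency metrics support P quantiles.
        if suffix != "latency":
            raise ValueError("P quantiles are only supported for _latency metrics")
    elif stat_type not in BASIC_TYPES:
        raise ValueError(f"unknown statistic type: {stat_type}")
    return f"{interface}_{suffix}.{stat_type}.{time_range}"


def stats_url(host, port, names, as_json=False):
    """Build a get_stats URL for one or more metric names (comma-separated, no spaces)."""
    url = f"http://{host}:{port}/get_stats?stats={','.join(names)}"
    return url + "&returnjson" if as_json else url
```

For instance, `metric_name("graph_insertVertex", "latency", "avg", 60)` yields `graph_insertVertex_latency.avg.60`, while asking for a P quantile on a `_qps` metric raises `ValueError` instead of producing a name the service would not recognize.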