Change the NGINX world from metrics to insight
The first goal of Mio is to provide powerful API statistics and summaries for NGINX.
Metrics are just the foundation; the final goal is to automatically improve the user's NGINX system with the power of data.
Install the latest OpenResty version.
If your OpenResty version is 1.11.2.1 or later, please use the develop branch instead of the master branch. OpenResty added the optional "init" argument to shdict:incr() in 1.11.2.1, which removes a lot of if conditions and improves performance.
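To illustrate why the "init" argument removes branching, here is a small Python model of a shared dictionary counter. It is not OpenResty code: SharedDictModel and both helper functions are hypothetical stand-ins that only mimic the documented behavior of incr(), where incrementing a missing key fails with "not found" unless init is given.

```python
class SharedDictModel:
    """Toy in-memory model of a shared dictionary counter (not OpenResty code)."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """Store value only if key is absent; report whether it was stored."""
        if key in self._data:
            return False, "exists"
        self._data[key] = value
        return True, None

    def incr(self, key, value, init=None):
        """Increment key; a missing key fails unless init is given."""
        if key not in self._data:
            if init is None:
                return None, "not found"
            self._data[key] = init
        self._data[key] += value
        return self._data[key], None


def count_without_init(d, key):
    """Counter bump when incr() has no init argument: needs a fallback branch."""
    new_value, err = d.incr(key, 1)
    if err == "not found":
        d.add(key, 0)
        new_value, err = d.incr(key, 1)
    return new_value


def count_with_init(d, key):
    """Counter bump with init: a single call and no branching."""
    new_value, _ = d.incr(key, 1, 0)
    return new_value
```

Both helpers end up with the same counter value. The second one does it in a single call, which is the saving the develop branch relies on.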
Please remember to add the --with-http_stub_status_module configuration parameter when you run ./configure.
By default, OpenResty is installed into the prefix /usr/local/openresty/.
Download Mio into your application directory, then run it like this:
sudo openresty -p /opt/my-fancy-app/
If you built OpenResty from source, you may not find the openresty command, which is a symbolic link to OpenResty's nginx executable, /usr/local/openresty/nginx/sbin/nginx. In that case, run it like this:
sudo /usr/local/openresty/nginx/sbin/nginx -p /opt/my-fancy-app/
You can test Mio like this:
- Port 80 is reserved for your own API.
curl -i http://127.0.0.1/
hello! this is Mio.
- Port 9090 serves Mio's statistics and summary API.
curl -i http://127.0.0.1:9090/summary
{"/hello":{"total":1,"4xx":1,"sent":314,"request_time":0}}
curl -i http://127.0.0.1:9090/status
{"load_timestamp":1470384389,"requests":{"current":0,"total":2,"success":1},"worker_count":2,"address":"127.0.0.1:80","ngx_lua_version":"0.10.5","server_zones":[],"nginx_version":"1.9.15","connections":{"active":1,"writing":1,"current":1,"idle":0,"reading":0},"timestamp":1470384409,"generation":0,"upstreams":[]}
- Port 8080 serves Mio's dashboard.
Open http://127.0.0.1:8080 in your browser and you will see a monitoring page similar to NGINX Plus. This page is a preview version and we are still working on it.
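The same checks can be scripted instead of run by hand with curl. The following is a minimal sketch that assumes Mio is listening on 127.0.0.1:9090 as in the walkthrough above. The helper names (fetch_json, failed_requests, main) are illustrative and not part of Mio.

```python
import json
from urllib.request import urlopen

# Base URL of Mio's statistics API; port 9090 is taken from the walkthrough above.
MIO_API = "http://127.0.0.1:9090"


def fetch_json(path, base=MIO_API, timeout=2.0):
    """GET base + path and decode the JSON body."""
    with urlopen(base + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def failed_requests(status):
    """Requests that did not count as successful, from a /status document."""
    requests = status["requests"]
    return requests["total"] - requests["success"]


def main():
    """Print a short report; needs a running Mio instance."""
    status = fetch_json("/status")
    summary = fetch_json("/summary")
    print("failed requests:", failed_requests(status))
    for uri, stats in summary.items():
        print(uri, "total:", stats["total"])
```

Calling main() against a running instance prints the number of failed requests and the per-URI totals. With the sample /status response shown above (total 2, success 1), failed_requests returns 1.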
Congratulations, Mio is running!
If it fails to run, please create a new issue and I will fix it ASAP.
- use shared dict incr() method
- add beautiful UI
The /status and /summary APIs are 100% compatible with NGINX Plus.
curl http://127.0.0.1/status
The data format of the NGINX Plus status module is described in its documentation. The top-level JSON blocks returned by NGINX Plus are:
{
"version": 6,
"nginx_version": "1.9.13",
"address": "206.251.255.64",
"generation": 21,
"load_timestamp": 1462615200247,
"timestamp": 1462870443024,
"pid": 24978,
"processes": {},
"connections": {},
"ssl": {},
"requests": {},
"server_zones": {},
"upstreams": {},
"caches": {},
"stream": {}
}
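Note the indentation of the data below: the indentation reflects how the JSON data is organized.
For example,
- server_zones
  - hg.nginx.org
    - processing
  - trac.nginx.org
    - responses
      - 1xx
corresponds to JSON of the following form: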
"server_zones":{
"processing":0,
"requests":71639,
"responses":{
"1xx":0,
"2xx":66973,
"3xx":3289,
"4xx":941,
"5xx":264,
"total":71467
},
"discarded":172,
"received":21575699,
"sent":2652969417
},
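Reading a nested field then amounts to following the indentation path key by key. Here is a minimal sketch of that idea. The get_path helper is hypothetical, and the small status document is an illustrative fragment built from the zone names and values shown above, not real Mio output.

```python
def get_path(data, path):
    """Follow a list of keys down through nested dicts."""
    node = data
    for key in path:
        node = node[key]
    return node


# Illustrative fragment: each indentation level in the list is one nesting level.
status = {
    "server_zones": {
        "hg.nginx.org": {"processing": 0},
        "trac.nginx.org": {"responses": {"1xx": 0}},
    }
}

one_xx = get_path(status, ["server_zones", "trac.nginx.org", "responses", "1xx"])
print(one_xx)
```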
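- version
  Version number of the JSON data set, currently 1. It exists for compatibility.
- nginx_version
  NGINX version number.
- ngx_lua_version
  ngx_lua (NGINX Lua module) version number.
- address
  Address of the server that accepted the status request.
- generation
  Number of times NGINX has been reloaded. This counts reloads, not stop-start cycles.
- start_timestamp
  Timestamp (ms) of the last NGINX reload. The sample responses above report this field as load_timestamp.
- timestamp
  Current timestamp (ms). The difference between timestamp and start_timestamp is the NGINX uptime.
- worker_count
  Number of NGINX worker processes.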
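- connections
  Not available from the Lua module; obtained through a C module instead.
  - accepted
    Total number of client connections ever accepted (TODO: not available yet).
  - dropped
    Total number of client connections dropped (TODO: not available yet).
  - current
    Number of current connections, including reading, writing, and idle ones.
  - active
    Number of currently active client connections, excluding idle connections.
  - idle
    Number of currently idle client connections. The sum of active and idle equals current.
  - writing
    Number of connections on which NGINX is currently writing the response to the client.
  - reading
    Number of connections on which NGINX is currently reading the client request header.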
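- requests
  - total
    Total number of client requests handled by NGINX. Counting starts at the last stop-start; a reload does not affect this number.
  - qps
    Requests per second.
  - success
    Total number of requests NGINX handled successfully. Internally, a request counts as successful when its HTTP status code is below 400.
  - current
    Number of client requests currently being processed (TODO: of little value, not implemented for now).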
- server_zones
  - server_zone (this is a user-defined string, not a keyword)
    - processing (TODO: requires cooperation from the access phase, to be completed later)
      The number of client requests that are currently being processed.
    - requests
      The total number of client requests received from clients.
    - discarded
      The total number of requests completed without sending a response.
    - received
      The total number of bytes received from clients.
    - receive_per_second
      Amount of data received per second, in KB. Computed as the difference between two received readings divided by the elapsed seconds.
    - sent
      The total number of bytes sent to clients.
    - send_per_second
      Amount of data sent per second, in KB. Computed as the difference between two sent readings divided by the elapsed seconds.
    - responses
      - total
        The total number of responses sent to clients.
      - 1xx, 2xx, 3xx, 4xx, 5xx
        The number of responses with status codes 1xx, 2xx, 3xx, 4xx, and 5xx.
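- upstreams
  - upstream_name (this is a user-defined string, not a keyword)
    - peers
      - id
        ID of the server.
      - server
        IP:port of the server.
      - backup
        Boolean. Indicates whether this is a backup server.
      - weight
        Weight of this server.
      - state (TODO: the logic is fairly complex, to be completed later)
        Current health state. One of "up", "draining", "down", "unavail", "unhealthy".
      - active
        Number of currently active connections.
      - max_conns (TODO: not available yet)
        Maximum connection limit for this server.
      - requests
        Total number of client requests passed through this server.
      - qps
        Requests per second.
      - response
        - total
          Total number of responses returned by this server.
        - 1xx, 2xx, 3xx, 4xx, 5xx
          Total count for each response status class.
      - sent
        Total number of bytes sent to this server.
      - send_per_second
        Per-second rate, computed from the difference between two sent readings.
      - received
        Total number of bytes received from this server.
      - receive_per_second
        Per-second rate, computed from the difference between two received readings.
      - fails
        Total number of unsuccessful attempts to communicate with this server.
      - unavail
        Number of times this server became "unavail". A server becomes unavail when fails exceeds the threshold defined by max_fails.
      - health_checks (TODO: none of this item is available; NGINX does not expose it)
        - checks
          Number of health checks sent.
        - fails
          Number of failed health checks.
        - unhealthy
          Number of times this server became "unhealthy".
        - last_passed
          Boolean. Whether the last health check succeeded and passed the match test.
      - latency
        - mean
          Overall average response time of the server, in ms.
        - per_minute_mean
          Average response time over the last minute, in ms.
        - per_minute_min
          Minimum response time over the last minute, in ms.
        - per_minute_max
          Maximum response time over the last minute, in ms.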
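Several fields in the list above are derived from others: uptime from the two timestamps, the per-second rates from two consecutive readings, and current connections from active plus idle. The sketch below shows that arithmetic under a few assumptions. All function names are hypothetical, the reload timestamp is accepted under either start_timestamp or load_timestamp (the sample responses use the latter), and 1 KB is assumed to be 1024 bytes, which the text does not specify.

```python
def uptime_ms(status):
    """Uptime as timestamp minus the last-reload timestamp."""
    for key in ("start_timestamp", "load_timestamp"):
        if key in status:
            return status["timestamp"] - status[key]
    raise KeyError("no start_timestamp or load_timestamp in status")


def rate_per_second(previous_total, current_total, elapsed_seconds):
    """Difference between two cumulative readings divided by elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (current_total - previous_total) / elapsed_seconds


def kb_per_second(previous_bytes, current_bytes, elapsed_seconds, bytes_per_kb=1024):
    """Byte-counter rate in KB/s; bytes_per_kb=1024 is an assumption."""
    return rate_per_second(previous_bytes, current_bytes, elapsed_seconds) / bytes_per_kb


def connections_consistent(connections):
    """Check the stated invariant: active + idle == current."""
    return connections["active"] + connections["idle"] == connections["current"]
```

For example, two sent readings of 0 and 2048 bytes taken 2 seconds apart give kb_per_second(0, 2048, 2) == 1.0, and the connections block from the sample /status response (active 1, idle 0, current 1) satisfies connections_consistent.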
For performance reasons, summary statistics are first stored in an LRU cache and are periodically synchronized to the shared dict by a timer. The summary API returns JSON, for example:
day:{
"/hello":{
"total":123, -- total number of requests to this endpoint
"avg_time":0.008, -- average response time
"avg_size":10, -- average response body size
"1xx":0,
"2xx":100982,
"3xx":0,
"4xx":222,
"5xx":112
},
"/status":{}
}
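The buffer-then-synchronize pattern described above can be sketched in a few lines. This is a Python model rather than Mio's Lua code: SummaryBuffer stands in for the per-worker LRU cache, a plain dict stands in for the shared dict, and flush() is called by hand where Mio would use a timer. All names, and the choice to store running sums for the averages, are assumptions made for illustration.

```python
from collections import defaultdict

STATUS_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def _empty_entry():
    """Zeroed counters for one URI."""
    entry = {"total": 0, "time_sum": 0.0, "size_sum": 0}
    entry.update({cls: 0 for cls in STATUS_CLASSES})
    return entry


class SummaryBuffer:
    """Per-worker buffer (stand-in for the LRU cache)."""

    def __init__(self):
        self.pending = defaultdict(_empty_entry)

    def record(self, uri, status, request_time, body_size):
        """Account for one finished request locally, without touching shared state."""
        entry = self.pending[uri]
        entry["total"] += 1
        entry["time_sum"] += request_time
        entry["size_sum"] += body_size
        cls = f"{status // 100}xx"
        if cls in entry:  # ignore status codes outside 100-599
            entry[cls] += 1

    def flush(self, shared):
        """Merge pending counters into shared (stand-in for the shared dict), then reset."""
        for uri, entry in self.pending.items():
            target = shared.setdefault(uri, _empty_entry())
            for key, value in entry.items():
                target[key] += value
        self.pending.clear()


def render_summary(shared):
    """Build the summary JSON shape (total, avg_time, avg_size, status classes)."""
    out = {}
    for uri, entry in shared.items():
        total = entry["total"]
        item = {
            "total": total,
            "avg_time": entry["time_sum"] / total if total else 0.0,
            "avg_size": entry["size_sum"] / total if total else 0.0,
        }
        item.update({cls: entry[cls] for cls in STATUS_CLASSES})
        out[uri] = item
    return out
```

Recording stays cheap because it only touches worker-local memory. The shared store is updated once per flush rather than once per request, at the cost of the summary lagging by up to one flush interval.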
- Get today's summary: curl http://127.0.0.1/summary
- Get the historical summaries for yesterday and the day before: curl http://127.0.0.1/summary_history
- Get the summary for the last minute: curl http://127.0.0.1/summary_one_minute