Ceph Luminous monitoring #3387

Closed
jiribroulik opened this issue Oct 25, 2017 · 7 comments · Fixed by #5466

@jiribroulik

jiribroulik commented Oct 25, 2017

Although the following setup works for monitoring Ceph Jewel, in Luminous I see far fewer metrics.

I have the following setup:

[[inputs.ceph]]
gather_admin_socket_stats = true
gather_cluster_stats = true
gather_pool_loads = true

The syslog shows: ERROR in input [inputs.ceph]: error parsing output: WARNING ceph - unable to decode deep health

In Grafana, the health status says "warning", the Mon nodes panel shows N/A, and there are no stats in the Capacity section or for pools.

In Prometheus I do not see ceph_pool_usage_objects or any other pool-related metrics, nor ceph_overall_health, ceph_service_service_health, etc.

If I use the following setup instead:

[[inputs.ceph]]
gather_admin_socket_stats = true
gather_cluster_stats = false
gather_pool_loads = true

The syslog shows: ERROR in input [inputs.ceph]: error running rados df: fork/exec : no such file or directory

Does anyone know how to fix this?
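
For reference, a fuller configuration sketch with the binary path set explicitly; the option names come from the upstream inputs.ceph sample configuration, and the paths are assumptions for a typical package-based install. The empty path in the fork/exec error above suggests whatever command the plugin tries to run resolved to an empty string:

[[inputs.ceph]]
  ## Paths below are assumptions; adjust to your deployment.
  ceph_binary = "/usr/bin/ceph"
  ceph_user = "client.admin"
  ceph_config = "/etc/ceph/ceph.conf"
  socket_dir = "/var/run/ceph"
  socket_prefix = "ceph"
  gather_admin_socket_stats = true
  gather_cluster_stats = true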

@danielnelson
Contributor

I have heard one report that Luminous is working.

What is the full output of the following command:

ceph --conf <ceph_config> --name <ceph_user> --format json status

@yoke88

yoke88 commented Nov 7, 2017

Here is the ceph status --format json output from Luminous:

{
    "fsid": "7b4d7a07-6d7e-4626-8d84-3b681a1d71fc",
    "health": {
        "checks": {},
        "status": "HEALTH_OK",
        "overall_status": "HEALTH_WARN"
    },
    "election_epoch": 46,
    "quorum": [
        0,
        1,
        2
    ],
    "quorum_names": [
        "ceph1",
        "ceph2",
        "ceph3"
    ],
    "monmap": {
        "epoch": 1,
        "fsid": "7b4d7a07-6d7e-4626-8d84-3b681a1d71fc",
        "modified": "2017-10-31 16:07:31.113155",
        "created": "2017-10-31 16:07:31.113155",
        "features": {
            "persistent": [
                "kraken",
                "luminous"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "ceph1",
                "addr": "192.168.235.130:6789/0",
                "public_addr": "192.168.235.130:6789/0"
            },
            {
                "rank": 1,
                "name": "ceph2",
                "addr": "192.168.235.136:6789/0",
                "public_addr": "192.168.235.136:6789/0"
            },
            {
                "rank": 2,
                "name": "ceph3",
                "addr": "192.168.235.191:6789/0",
                "public_addr": "192.168.235.191:6789/0"
            }
        ]
    },
    "osdmap": {
        "osdmap": {
            "epoch": 226,
            "num_osds": 24,
            "num_up_osds": 24,
            "num_in_osds": 24,
            "full": false,
            "nearfull": false,
            "num_remapped_pgs": 0
        }
    },
    "pgmap": {
        "pgs_by_state": [
            {
                "state_name": "active+clean",
                "count": 256
            }
        ],
        "num_pgs": 256,
        "num_pools": 1,
        "num_objects": 33,
        "data_bytes": 21733045,
        "bytes_used": 27678130176,
        "bytes_avail": 3492468080640,
        "bytes_total": 3520146210816
    },
    "fsmap": {
        "epoch": 1,
        "by_rank": []
    },
    "mgrmap": {
        "epoch": 21,
        "active_gid": 4378,
        "active_name": "ceph3",
        "active_addr": "192.168.235.191:6824/1061",
        "available": true,
        "standbys": [
            {
                "gid": 4651,
                "name": "ceph1",
                "available_modules": [
                    "dashboard",
                    "prometheus",
                    "restful",
                    "status",
                    "zabbix"
                ]
            },
            {
                "gid": 54392,
                "name": "ceph2",
                "available_modules": [
                    "dashboard",
                    "prometheus",
                    "restful",
                    "status",
                    "zabbix"
                ]
            }
        ],
        "modules": [
            "restful",
            "status"
        ],
        "available_modules": [
            "dashboard",
            "prometheus",
            "restful",
            "status",
            "zabbix"
        ]
    },
    "servicemap": {
        "epoch": 1,
        "modified": "0.000000",
        "services": {}
    }
}

@jiribroulik
Author

If I deploy Jewel, everything works fine, but if I upgrade or redeploy with Luminous I do not see any pool stats or the ceph_service_service_health metric, and the ceph_overall_health metric incorrectly shows 2 when it should be HEALTH_OK, which is 1. This is the ceph status:

{"fsid":"1fc6c029-51cb-4176-adcb-c1dd54cef41c","health":{"checks":{},"status":"HEALTH_OK","overall_status":"HEALTH_WARN"},"election_epoch":24,"quorum":[0,1,2],"quorum_names":["cmn01","cmn02","cmn03"],"monmap":{"epoch":3,"fsid":"1fc6c029-51cb-4176-adcb-c1dd54cef41c","modified":"2017-11-08 11:18:50.811518","created":"2017-11-08 10:51:09.259525","features":{"persistent":["kraken","luminous"],"optional":[]},"mons":[{"rank":0,"name":"cmn01","addr":"172.16.47.147:6789/0","public_addr":"172.16.47.147:6789/0"},{"rank":1,"name":"cmn02","addr":"172.16.47.148:6789/0","public_addr":"172.16.47.148:6789/0"},{"rank":2,"name":"cmn03","addr":"172.16.47.149:6789/0","public_addr":"172.16.47.149:6789/0"}]},"osdmap":{"osdmap":{"epoch":81,"num_osds":8,"num_up_osds":8,"num_in_osds":8,"full":false,"nearfull":false,"num_remapped_pgs":0}},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":480}],"num_pgs":480,"num_pools":14,"num_objects":275,"data_bytes":106306773,"bytes_used":659144704,"bytes_avail":9508442161152,"bytes_total":9509101305856},"fsmap":{"epoch":1,"by_rank":[]},"mgrmap":{"epoch":25,"active_gid":114221,"active_name":"cmn01","active_addr":"172.16.47.147:6800/3938","available":true,"standbys":[{"gid":104233,"name":"cmn02","available_modules":["dashboard","prometheus","restful","status","zabbix"]},{"gid":104251,"name":"cmn03","available_modules":["dashboard","prometheus","restful","status","zabbix"]}],"modules":["dashboard"],"available_modules":["dashboard","prometheus","restful","status","zabbix"]},"servicemap":{"epoch":3,"modified":"2017-11-08 11:28:43.487739","services":{"rgw":{"daemons":{"summary":"","rgw01":{"start_epoch":2,"start_stamp":"2017-11-08 11:28:20.885032","gid":114314,"addr":"172.16.47.159:0/4011815907","metadata":{"arch":"x86_64","ceph_version":"ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)","cpu":"QEMU Virtual CPU version 2.5+","distro":"ubuntu","distro_description":"Ubuntu 16.04.2 LTS","distro_version":"16.04","frontend_config#0":"civetweb port=172.16.47.159:8080 num_threads=50","frontend_type#0":"civetweb","hostname":"rgw01","kernel_description":"#4416.04.1-Ubuntu SMP Fri Mar 3 17:11:16 UTC 2017","kernel_version":"4.8.0-41-generic","mem_swap_kb":"0","mem_total_kb":"16431428","num_handles":"1","os":"Linux","pid":"23506","zone_id":"abead38d-290d-47d9-bc60-1365b1cd4c93","zone_name":"default","zonegroup_id":"de836aff-2386-4797-94b6-83e31a7ab0fe","zonegroup_name":"default"}},"rgw02":{"start_epoch":3,"start_stamp":"2017-11-08 11:28:43.014412","gid":114320,"addr":"172.16.47.160:0/436073520","metadata":{"arch":"x86_64","ceph_version":"ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)","cpu":"QEMU Virtual CPU version 2.4.0","distro":"ubuntu","distro_description":"Ubuntu 16.04.2 LTS","distro_version":"16.04","frontend_config#0":"civetweb port=172.16.47.160:8080 num_threads=50","frontend_type#0":"civetweb","hostname":"rgw02","kernel_description":"#4416.04.1-Ubuntu SMP Fri Mar 3 17:11:16 UTC 2017","kernel_version":"4.8.0-41-generic","mem_swap_kb":"0","mem_total_kb":"16431444","num_handles":"1","os":"Linux","pid":"23133","zone_id":"abead38d-290d-47d9-bc60-1365b1cd4c93","zone_name":"default","zonegroup_id":"de836aff-2386-4797-94b6-83e31a7ab0fe","zonegroup_name":"default"}}}}}}}

@jiribroulik
Author

The nightly package https://dl.influxdata.com/telegraf/nightlies/telegraf_nightly_amd64.deb works fine for Ceph pools, but it does not return ceph_service_service_health or ceph_overall_health.

@ppetit

ppetit commented Aug 8, 2018

Hello,
I am facing the same problem. When I picked up this plugin I expected that, at a minimum, it would report overall Ceph health metrics, but apparently it doesn't 👎 The monmap would be interesting too. Is that on the contributors' to-do list? Thanks.

@amarao

amarao commented Nov 5, 2018

Any news on an update to stable? 1.8.3 still misses pool_stats for Luminous.

@glinton
Contributor

glinton commented Feb 21, 2019

Regarding ceph_service_service_health and ceph_overall_health, what are you expecting to populate those fields?

If you are missing ceph_pool_stats, can you run:

ceph --conf <ceph_config> --name <ceph_user> --format json osd pool stats

and verify that it is not just an empty array []? If it is not empty and you are still not getting a ceph_pool_stats measurement, please paste the output here. Thanks.
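
For example, with the default config path and admin user (both assumptions; adjust for your deployment):

ceph --conf /etc/ceph/ceph.conf --name client.admin --format json osd pool stats

If the output is just the empty array [], there is nothing for the plugin to turn into a ceph_pool_stats measurement.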
