299 changes: 299 additions & 0 deletions specs/sosreport_integration.adoc
@@ -0,0 +1,299 @@
// vim: tw=79

= SOS Report integration

The purpose of this file is to identify data points and techniques for
SOS Report integration with Tendrl components.

== Problem description
SOS Report currently has no feature or plugin for analyzing Tendrl components. Because the Tendrl components run as individual services, SOS Report plugins can help analyze them when failures occur.

== Use Cases

One way to integrate SOS Report with Tendrl is to create a plugin for each Tendrl component.

* An admin usually runs SOS Report on each node to collect its report. In a multi-cluster environment with a large number of nodes, two situations can arise:

** One or a few nodes fail and the admin runs SOS Report on them.

** There is a multi-node failure. Will it be feasible to let the admin run SOS Report on all of the failed nodes?

* Policies in SOS Report decide how it behaves on a particular distribution. It has to be decided for which distributions policies must be written.
Member Author

Need suggestions on this.

Contributor

If N nodes fail, the admin has to run sosreport on all N nodes.
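
For the multi-node case raised above, the per-node invocations could be generated rather than typed by hand. The following is a minimal sketch under stated assumptions, not part of the proposal: it assumes SSH access to each node, an installed sos version that supports the `--batch` and `-p` (profile) options, and a `tendrl` profile like the one declared in the sample plugins. The helper name `sosreport_commands` and the host names are hypothetical.

```python
import shlex


def sosreport_commands(nodes, profile="tendrl"):
    """Build one ssh command (as an argv list) per failed node.

    Nothing is executed here; the caller decides how to run the commands.
    """
    # Assumption: the remote sos version accepts --batch and -p <profile>.
    remote = "sosreport --batch -p {}".format(shlex.quote(profile))
    return [["ssh", node, remote] for node in nodes]


# Hypothetical host names, for illustration only.
failed_nodes = ["node1.example.com", "node2.example.com"]

for argv in sosreport_commands(failed_nodes):
    print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run` as-is, which keeps the node name out of any local shell parsing.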


== Proposed change

Each Tendrl service needs its own plugin.

* The following data points can be used for the plugins:

** Tendrl-node-agent
*** Rpm versions of commons and node-agent
*** Whether the tendrl-tendrl-epel-7.repo repository is enabled
*** Configurations in /etc/tendrl/node-agent/
*** Status of tendrl-node-agent.socket service
*** SELinux configurations
*** Firewall status and configurations
*** Package requirements
Contributor

We should figure out some way to get logs out of etcd.

Member Author

One way to do this is to ship a small script with Tendrl (or as part of the plugin itself) that sosreport can run to capture those logs.

Contributor

You can start by testing the existing etcd plugin for sosreport: https://github.com/sosreport/sos/blob/master/sos/plugins/etcd.py

I can see that the plugin isn't collecting the actual data store, but I guess we can add that part in the Tendrl-specific sosreport plugin.

Member Author

Which directories are we looking to get out of etcd? @r0h4n

Member Author

@anmolsachan anmolsachan May 9, 2017

We can use a script like this, call it from the plugin if it exists, and store the dump:

import etcd
import json
import os
from ruamel import yaml


config = "/etc/tendrl/node-agent/node-agent.conf.yaml"
dump_location = "etcd_dump"


def get_etcd_ip(fil):
    """Return the etcd host from the node-agent config, or None."""
    if not os.path.isfile(fil):
        return None
    with open(fil, 'r') as confyml:
        # Use the safe loader; the config file holds plain data only.
        cfg = yaml.YAML(typ="safe").load(confyml) or {}
    return cfg.get("etcd_connection", None)


etcd_ip = get_etcd_ip(config)
if etcd_ip:
    keys = ["/queue", "/clusters"]
    data = {}
    # A single client is enough for all reads.
    client = etcd.Client(host=etcd_ip, port=2379)
    for key in keys:
        try:
            data[key] = client.read(key, recursive=True).__dict__
        except etcd.EtcdKeyNotFound:
            print("key %s not found" % key)
    with open(dump_location, "w") as dump:
        dump.write(json.dumps(data))
else:
    with open(dump_location, "w") as dump:
        dump.write("Etcd Ip not found.")
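
A recursive dump like the one above is a nested tree, which can be awkward to read inside a report. As a minimal sketch (not part of the proposed script), the tree could be flattened into plain key/value pairs before it is written out. The example assumes the node layout of the etcd v2 HTTP API, where directories carry `dir` and `nodes` and leaves carry `value`; the function name `flatten_etcd_nodes` and the sample data are hypothetical.

```python
import json


def flatten_etcd_nodes(node, out=None):
    """Collect {key: value} pairs from an etcd v2 style node tree."""
    if out is None:
        out = {}
    if node.get("dir"):
        # Directories have no value of their own; descend into children.
        for child in node.get("nodes", []):
            flatten_etcd_nodes(child, out)
    else:
        out[node["key"]] = node.get("value")
    return out


# Hypothetical sample shaped like a recursive etcd v2 GET response.
sample_response = {
    "action": "get",
    "node": {
        "key": "/clusters",
        "dir": True,
        "nodes": [
            {"key": "/clusters/c1", "dir": True, "nodes": [
                {"key": "/clusters/c1/name", "value": "demo"},
            ]},
            {"key": "/clusters/status", "value": "ok"},
        ],
    },
}

flat = flatten_etcd_nodes(sample_response["node"])
print(json.dumps(flat, indent=2, sort_keys=True))
```

Empty directories produce no entries in this sketch, so a key that exists only as a directory would not appear in the flattened dump.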



** Tendrl-gluster-integration
*** Rpm versions of commons, node-agent and gluster-integration
*** Tendrl-node-agent service status
*** Gdeploy status
*** Configurations in /etc/tendrl/gluster-integration/
*** Package requirements

** Tendrl-ceph-integration
*** Rpm versions of commons, node-agent and ceph-integration
*** Tendrl-node-agent service status
*** Ceph cluster health ("ceph -w" or "ceph status || ceph -w")
*** Configuration in /etc/tendrl/ceph-integration/
*** Package requirements

** Tendrl-performance-monitoring
*** Rpm versions of commons, node-agent and performance-monitoring
*** Tendrl-node-agent service status
*** GraphiteDB status and required permissions
*** Carbon-cache service status
*** Configurations in /etc/tendrl/performance-monitoring/
*** Package requirements

** Tendrl-api
*** Installed ruby version
*** Package requirements
*** Gem dependencies
*** Apache httpd process status and configurations
*** Etcd connection configuration

* Since logging is common to all Tendrl services, the logs can be captured from syslog.
** With the current rsyslog configuration, the log messages are written to /var/log/messages.

* Sample code:

[source, python]
.tendrl-node-agent.py
----
from sos.plugins import Plugin


class TendrlNodeAgent(Plugin):
    """Tendrl Node Agent
    """
    plugin_name = "tendrl_node_agent"
    profiles = ('tendrl',)

    def setup(self):
        # Apply the global log size limit to the rotated syslog files.
        self.limit = self.get_option("log_size")
        self.add_copy_spec_limit("/var/log/messages-*", sizelimit=self.limit)
        self.add_copy_spec("/etc/tendrl/node-agent/")
        self.add_cmd_output("rpm -qa | grep tendrl",
                            suggest_filename="tendrl_rpm_version")
        self.add_cmd_output(["yum repolist | grep tendrl",
                             "systemctl status tendrl-node-agent.socket"
                             ])
----

[source, python]
.tendrl-gluster-integration.py
----
from sos.plugins import Plugin


class TendrlGlusterIntegration(Plugin):
    """Tendrl Gluster Integration
    """
    plugin_name = "tendrl_gluster_integration"
    profiles = ('tendrl',)

    def setup(self):
        # Apply the global log size limit to the rotated syslog files.
        self.limit = self.get_option("log_size")
        self.add_copy_spec_limit("/var/log/messages-*", sizelimit=self.limit)
        self.add_copy_spec("/etc/tendrl/gluster-integration/")
        self.add_cmd_output("rpm -qa | grep tendrl",
                            suggest_filename="tendrl_rpm_version")
        self.add_cmd_output("systemctl status tendrl-node-agent")
----

[source, python]
.tendrl-ceph-integration.py
----
from sos.plugins import Plugin


class TendrlCephIntegration(Plugin):
    """Tendrl Ceph Integration
    """
    plugin_name = "tendrl_ceph_integration"
    profiles = ('tendrl',)

    def setup(self):
        # Apply the global log size limit to the rotated syslog files.
        self.limit = self.get_option("log_size")
        self.add_copy_spec_limit("/var/log/messages-*", sizelimit=self.limit)
        self.add_copy_spec("/etc/tendrl/ceph-integration/")
        self.add_cmd_output("rpm -qa | grep tendrl",
                            suggest_filename="tendrl_rpm_version")
        self.add_cmd_output("systemctl status tendrl-node-agent")
----

[source, python]
.tendrl-performance-monitoring.py
----
from sos.plugins import Plugin


class TendrlPerformanceMonitoring(Plugin):
    """Tendrl Performance Monitoring
    """
    plugin_name = "tendrl_performance_monitoring"
    profiles = ('tendrl',)

    def setup(self):
        # Apply the global log size limit to the rotated syslog files.
        self.limit = self.get_option("log_size")
        self.add_copy_spec_limit("/var/log/messages-*", sizelimit=self.limit)
        self.add_copy_spec("/etc/tendrl/performance-monitoring/")
        self.add_cmd_output("rpm -qa | grep tendrl",
                            suggest_filename="tendrl_rpm_version")
        # Service state plus ownership and permissions of the Graphite DB.
        self.add_cmd_output(["systemctl status tendrl-node-agent.socket",
                             "systemctl status carbon-cache",
                             "ls -la /var/lib/graphite-web/graphite.db"
                             ])
----

[source, python]
.tendrl-api.py
----
from sos.plugins import Plugin


class TendrlApi(Plugin):
    """Tendrl API
    """
    # Use a name distinct from the node-agent plugin.
    plugin_name = "tendrl_api"
    profiles = ('tendrl',)

    def setup(self):
        # Etcd connection configuration used by the API.
        self.add_copy_spec("/etc/tendrl/etcd.yml")
        self.add_cmd_output(["ruby -v",
                             "gem --version"
                             ])
        self.add_cmd_output(["systemctl status httpd.service"])
----

=== Alternatives

* Rather than creating a separate plugin for each Tendrl service, a
single plugin can also be considered.
Member Author

Need review on this.

Contributor

Please raise a question on the sosreport repo about the multi-plugin versus single-plugin approach:
https://github.com/sosreport/sos


[source, python]
.tendrl.py
----
from sos.plugins import Plugin


class Tendrl(Plugin):
    """Tendrl
    """
    plugin_name = "tendrl"
    profiles = ('tendrl', 'storage')

    def setup(self):
        # Apply the global log size limit to the rotated syslog files.
        self.limit = self.get_option("log_size")
        self.add_copy_spec_limit("/var/log/messages-*", sizelimit=self.limit)
        # One copy spec covers the configuration of every Tendrl component.
        self.add_copy_spec("/etc/tendrl/")
        self.add_cmd_output("rpm -qa | grep tendrl",
                            suggest_filename="tendrl_rpm_version")
        self.add_cmd_output(["yum repolist | grep tendrl",
                             "systemctl status tendrl-node-agent.socket",
                             "systemctl status tendrl-node-agent",
                             "systemctl status carbon-cache",
                             "ls -la /var/lib/graphite-web/graphite.db",
                             "ruby -v",
                             "gem --version",
                             "systemctl status httpd.service"
                             ])
----
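
One trade-off between the two approaches is duplication: several per-component plugins run the same commands, while a single plugin runs each command once. The following minimal sketch illustrates the merge without depending on sos. The `COMPONENT_SPECS` table and the `merged_plan` helper are hypothetical; the paths and commands are transcribed from the sample plugins in this spec.

```python
# Hypothetical table transcribed from the sample plugins in this spec;
# the dictionary layout is an assumption and not part of the sos API.
COMPONENT_SPECS = {
    "node-agent": {
        "paths": ["/etc/tendrl/node-agent/"],
        "commands": [
            "yum repolist | grep tendrl",
            "systemctl status tendrl-node-agent.socket",
        ],
    },
    "gluster-integration": {
        "paths": ["/etc/tendrl/gluster-integration/"],
        "commands": ["systemctl status tendrl-node-agent"],
    },
    "ceph-integration": {
        "paths": ["/etc/tendrl/ceph-integration/"],
        "commands": ["systemctl status tendrl-node-agent"],
    },
    "performance-monitoring": {
        "paths": ["/etc/tendrl/performance-monitoring/"],
        "commands": [
            "systemctl status tendrl-node-agent.socket",
            "systemctl status carbon-cache",
            "ls -la /var/lib/graphite-web/graphite.db",
        ],
    },
}


def merged_plan(components):
    """Merge per-component paths and commands, dropping duplicates
    while keeping first-seen order."""
    paths, commands = [], []
    for name in components:
        spec = COMPONENT_SPECS[name]
        for path in spec["paths"]:
            if path not in paths:
                paths.append(path)
        for command in spec["commands"]:
            if command not in commands:
                commands.append(command)
    return paths, commands


paths, commands = merged_plan(sorted(COMPONENT_SPECS))
print(len(paths), "paths,", len(commands), "commands")
```

A single plugin's `setup` could iterate over such a merged plan, whereas per-component plugins would each keep their own entry and accept the repeated commands.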

=== Data model impact:

None

=== Impacted Modules:

None

==== Tendrl API impact:

None

==== Notifications/Monitoring impact:

None

==== Tendrl/common impact:

None

==== Tendrl/node_agent impact:

None

==== Sds integration impact:

None

=== Security impact:

None

=== Other end user impact:

None

=== Performance impact:

None

=== Other deployer impact:

None

=== Developer impact:

None

== Implementation:

None

=== Assignee(s):

Primary assignee:
anmolsachan

=== Work Items:

To be decided.

== Dependencies:

Listed in proposed change section.

== Testing:

None

== Documentation impact:

None

== References:

* https://github.com/Tendrl/documentation/wiki/Tendrl-Package-Installation-Reference
* https://github.com/Tendrl/api#_deployment_requirements