Commit 697746c

author
Shubhendu
committed
Added spec specs/unmanage_cluster.adoc
Signed-off-by: Shubhendu <shtripat@redhat.com>
1 parent da44433 commit 697746c

1 file changed

+213
-0
lines changed

specs/unmanage_cluster.adoc

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
= Introduce an un-manage cluster mechanism in tendrl

The intent of this change is to introduce un-manage cluster functionality in
tendrl. An un-managed cluster remains known to tendrl but is no longer managed,
meaning that monitoring, alerting and management of the cluster are no longer
possible from tendrl. At a later stage (if required) the admin can decide to
re-import the cluster to start managing it again.

The un-manage functionality is helpful in scenarios where the admin wants to
bring down the cluster for critical maintenance and does not want monitoring
etc. to be performed during that period.

== Problem description

There are situations where the admin needs to perform critical maintenance on
the cluster and does not want any monitoring etc. to take place during this
period. Also, if the admin decides to dismantle the cluster at some stage, there
should be a mechanism by which the cluster can be marked as un-managed from the
tendrl side.

Tendrl should also provide a provision to re-import the cluster at a later
stage if the admin wants. The process should be seamless and require little or
no manual intervention.


== Use Cases

This spec addresses un-managing a cluster and re-importing an un-managed
cluster at a later stage. The un-manage functionality in tendrl needs to take
care of the following:

* Un-install any components that were installed as part of tendrl managing the
storage nodes, and disable the services
* Set the cluster state properly so that the cluster is marked and listed as
un-managed in UI dashboards. No operations should be allowed on the un-managed
cluster, and no monitoring, alerting or entity management should be supported
on this cluster anymore
* The user should have an option to re-import the cluster later if needed, and
it should work seamlessly as usual


== Proposed change

* On un-manage cluster, start a flow in the tendrl server node's node-agent
which creates child jobs on the storage nodes to stop tendrl specific services
like collectd and tendrl-gluster-integration

* Mark the cluster flag `is_managed` as `False` so that the cluster is listed
as un-managed in UI dashboards and all possible actions are disabled for it

* Archive the graphite (monitoring) data for the cluster in an archive location
so that the grafana dashboards don't list the cluster and its entities anymore

* Delete the grafana alert dashboards for the cluster and its dependent entities

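As a rough illustration of how the `is_managed` flag could gate actions, here is a minimal Python sketch. Only the flag name `is_managed` comes from this spec. The `Cluster` class, `ClusterNotManagedError`, `mark_unmanaged` and `ensure_action_allowed` are hypothetical names, not tendrl's actual object model, and the sketch assumes that import is the only action left enabled on an un-managed cluster.

```python
from dataclasses import dataclass


class ClusterNotManagedError(Exception):
    """Raised when an action is requested on an un-managed cluster."""


@dataclass
class Cluster:
    # Hypothetical stand-in for the cluster object; only `is_managed`
    # is a name taken from the spec.
    integration_id: str
    is_managed: bool = True


# Assumption: re-import is the only action allowed once un-managed.
ALLOWED_WHEN_UNMANAGED = {"import"}


def mark_unmanaged(cluster: Cluster) -> None:
    """Flip the flag so that dashboards can list the cluster as un-managed."""
    cluster.is_managed = False


def ensure_action_allowed(cluster: Cluster, action: str) -> None:
    """Raise if `action` is not permitted in the cluster's current state."""
    if cluster.is_managed or action in ALLOWED_WHEN_UNMANAGED:
        return
    raise ClusterNotManagedError(
        f"cluster {cluster.integration_id} is un-managed; "
        f"action '{action}' is disabled"
    )
```

A caller would invoke `ensure_action_allowed(cluster, "import")` before dispatching a job, so a disabled action fails early instead of reaching the storage nodes.
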
The logic is as follows:

** Start a flow in node-agent on the tendrl server node for un-manage cluster

** The first atom of the above flow invokes child jobs on the storage nodes'
node-agents to stop tendrl specific services and mark them disabled

** The main atom of the un-manage cluster flow removes any etcd details for
the cluster and then marks the cluster `is_managed` flag as `False`

** One of the atoms in the un-manage cluster flow invokes a flow in
monitoring-integration to archive the graphite data for the cluster

** Finally, another atom invokes a flow in monitoring-integration to remove the
grafana alert dashboards for the cluster and its dependent entities

The structure of the un-manage cluster flow would look something like this:

```
UnmanageCluster:
  tags:
    - "tendrl/monitor"
  atoms:
    - tendrl.objects.Cluster.atoms.StopMonitoringServices
    - tendrl.objects.Cluster.atoms.StopIntegrationServices
    - tendrl.objects.Cluster.atoms.DeleteClusterDetails
    - tendrl.objects.Cluster.atoms.DeleteMonitoringDetails
  help: "Unmanage a Gluster Cluster"
  enabled: true
  inputs:
    mandatory:
      - TendrlContext.integration_id
  run: tendrl.flows.UnmanageCluster
  type: Update
  uuid: 2f94a48a-05d7-408c-b400-e27827f4efed
  version: 1
```

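To make the ordering in the flow definition above concrete, the following Python sketch runs the listed atoms one after another. It is not the tendrl flow engine. `run_flow`, the `registry` mapping and the abort-on-first-failure behaviour are assumptions made for illustration; only the atom names and the mandatory input `TendrlContext.integration_id` are taken from the definition.

```python
from typing import Callable, Dict, List

# Short atom names, in the order given in the flow definition.
UNMANAGE_ATOMS: List[str] = [
    "StopMonitoringServices",
    "StopIntegrationServices",
    "DeleteClusterDetails",
    "DeleteMonitoringDetails",
]

# Hypothetical atom signature: takes the flow parameters, returns success.
Atom = Callable[[Dict[str, str]], bool]


def run_flow(
    atom_names: List[str],
    registry: Dict[str, Atom],
    params: Dict[str, str],
) -> List[str]:
    """Run atoms in order and return the names of the completed atoms."""
    if "TendrlContext.integration_id" not in params:
        raise ValueError("missing mandatory input: TendrlContext.integration_id")
    completed: List[str] = []
    for name in atom_names:
        # Assumption: a failed atom aborts the flow, so cluster details
        # are never deleted while services are still running.
        if not registry[name](params):
            raise RuntimeError(f"atom {name} failed; flow aborted")
        completed.append(name)
    return completed
```

With stub atoms that always succeed, `run_flow(UNMANAGE_ATOMS, registry, {"TendrlContext.integration_id": "abc"})` returns the four atom names in definition order.
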
=== Alternatives

None

=== Data model impact

None

=== Impacted Modules:

==== Tendrl API impact:

* Introduce an API `clusters/{int-id}/unmanage` for triggering an un-manage
cluster flow

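A client call to this endpoint might be prepared as in the Python sketch below. The spec does not state the HTTP method, the base URL or the authentication scheme, so `POST`, the `base_url` argument and the bearer token header are all assumptions. The path follows the `clusters/{int-id}/unmanage` form used in the Testing section, and `build_unmanage_request` is a hypothetical helper name.

```python
import urllib.request


def build_unmanage_request(
    base_url: str, integration_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a request that triggers the un-manage flow."""
    url = f"{base_url.rstrip('/')}/clusters/{integration_id}/unmanage"
    return urllib.request.Request(
        url,
        data=b"",       # empty body; assumes the endpoint needs no payload
        method="POST",  # assumption: the spec does not name the method
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the URL construction easy to check without a running server.
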
==== Notifications/Monitoring impact:

* A flow to archive the cluster specific graphite data

* A flow to remove the grafana alert dashboards for the cluster and its
dependent entities

* Raise an alert once the cluster has been un-managed, with details such as
where to look for the old graphite data

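The archive step could be as simple as moving the cluster's metric directory out of the live graphite tree, as in this Python sketch. The directory layout (`<whisper_root>/clusters/<integration_id>`), the timestamped archive name and the function name `archive_cluster_metrics` are assumptions for illustration and do not describe the actual monitoring-integration layout.

```python
import shutil
import time
from pathlib import Path


def archive_cluster_metrics(
    whisper_root: Path, archive_root: Path, integration_id: str
) -> Path:
    """Move a cluster's metric directory into the archive location.

    Returns the archive path so that it can be reported in the alert
    raised after the cluster is un-managed.
    """
    # Assumed layout: one directory per cluster under "clusters".
    source = whisper_root / "clusters" / integration_id
    if not source.is_dir():
        raise FileNotFoundError(f"no graphite data found at {source}")
    archive_root.mkdir(parents=True, exist_ok=True)
    # A timestamp suffix keeps repeated un-manage/import cycles apart.
    stamp = time.strftime("%Y%m%d%H%M%S")
    target = archive_root / f"{integration_id}_{stamp}"
    shutil.move(str(source), str(target))
    return target
```

Because the data is moved and not deleted, the old metrics stay available for inspection, while dashboards that read from the live tree no longer see the cluster.
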
==== Tendrl/common impact:

* An un-manage cluster flow, to be targeted at the tendrl server node

==== Tendrl/node_agent impact:

None

==== Sds integration impact:

None

==== Tendrl Dashboard impact:

* UX requirements for invoking an un-manage cluster flow for an existing cluster
are captured at https://redhat.invisionapp.com/share/8QCOEVEY9

=== Security impact:

None

=== Other end user impact:

The user gets an option to un-manage an existing cluster and can re-import it
at a later stage

=== Performance impact:

None

=== Other deployer impact:

The tendrl-ansible module needs to provide a mechanism to set up tendrl
components and dependencies on an additional new node in the cluster.

<TBD> Details of the playbooks etc. to be added here.

=== Developer impact:

None


== Implementation:

* https://github.com/Tendrl/commons/issues/797


=== Assignee(s):

Primary assignee:
shtripat
mbukatov

=== Work Items:

* https://github.com/Tendrl/specifications/issues/252


== Dependencies:

None

== Testing:

* Check that the UI dashboard has an option to trigger the un-manage cluster
flow

* Check that the flow completes successfully and verify that the grafana
dashboards show no cluster details for the selected cluster anymore

* Verify that no grafana alert dashboards are available for the un-managed
cluster anymore

* Verify that the clusters list reports the cluster as un-managed and that the
import option is now enabled

* Try to import the cluster back; it should be successful. All grafana
dashboards, grafana alert dashboards and the UI should reflect the cluster
details again

* Invoke the REST end point `clusters/{int-id}/unmanage`; the cluster should
be un-managed successfully


== Documentation impact:

* The new un-manage cluster feature should be documented, with details of
everything that gets disabled or removed when a cluster is un-managed

* The new API end point should be documented with sample input / output
structures

== References:

* https://redhat.invisionapp.com/share/8QCOEVEY9

* https://github.com/Tendrl/commons/pull/798

* https://github.com/Tendrl/monitoring-integration/pull/317
