Conversation

@shtripat
Member

Signed-off-by: Shubhendu <shtripat@redhat.com>

@shtripat
Member Author

@r0h4n @nthomas-redhat @julienlim @mbukatov please check and ack

@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 697746c to 6d013a0 Compare January 22, 2018 03:52

@jjkabrown1 jjkabrown1 left a comment


Very understandable specification. Thanks

@shtripat
Member Author

Very understandable specification. Thanks

Thanks Jeff :)

@julienlim
Member

@shtripat @r0h4n @nthomas-redhat @mbukatov @jjkabrown1

The spec overall looks good. Just a couple things.

Is there a section for Logging impact? I'm guessing this is indirectly assumed right now.

How do we handle error handling? Specifically, if a user is looking at a Grafana dashboard for a cluster that is about to be unmanaged, what happens? I know some of this is out of our control, but it would be good to set expectations on the behavior and, if necessary, document what to expect. It could make for an awkward experience on the Grafana dashboard: you're looking at a dashboard and suddenly the data is archived, and I'm assuming Grafana throws some kind of error message. Do we want to consider a warning message as part of the "Unmanage Cluster" action? This depends on what kind of behavior we get from Grafana. Note: this may be an edge case we choose to handle via documentation and take up later when users provide feedback about it.

Thoughts?

@shtripat
Member Author

shtripat commented Feb 1, 2018

@julienlim maybe I would add details under the acceptance criteria about the expected behaviour here. I would also add a documentation impact section so that the admin guide can set the proper expectations.

@julienlim
Member

@shtripat Sounds good!

@shtripat
Member Author

shtripat commented Feb 5, 2018

@julienlim the final alert raised after un-manage of the cluster may mention that details from the Grafana dashboard and the Grafana alert dashboard would vanish afterwards. This would make sure the user is aware of the expected behavior of this flow.

@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 6d013a0 to 5379797 Compare February 5, 2018 05:54
@shtripat
Member Author

shtripat commented Feb 5, 2018

@julienlim I have added a few more expected behavior details as suggested above. Please check.

@julienlim
Member

@shtripat I reviewed the updates which include a few more expected behavior details. Looks good. Thanks.

@julienlim
Member

@shtripat @r0h4n @nthomas-redhat

Just noting some of our comments and questions that came up earlier:
Unmanage cluster

  • includes cleanup of data and will not stop any agents
  • some of the cluster data is archived, i.e. logs and notifications
  • What happens when you import again (to the notifications and logs tied to the old cluster id)? We will need to maintain the cluster entity so we don't end up with orphaned archived data.
  • What happens with the graphite archived data? Currently there is no easy way to bring it back into the current cluster, so we will leave it archived. The notification should provide a link to where the data is archived (and we can provide manual instructions if needed).

@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 5379797 to 241d52c Compare February 7, 2018 07:16
Shubhendu added 2 commits February 7, 2018 12:47
tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 241d52c to 326ba53 Compare February 7, 2018 07:17
@shtripat
Member Author

shtripat commented Feb 7, 2018

Added UI impact details as well as part of spec.
@julienlim @r0h4n @a2batic @nthomas-redhat @gnehapk please have a look.

tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 326ba53 to 53fdc5d Compare February 7, 2018 08:21

*** The columns `Volume Profiling`, `Volumes` and `Alerts` would be hidden

*** `View Details` link would be available to check the task details
Member

@a2batic The View Details link is used for showing the error list. The link should be named Track Progress, similar to the Import task detail page, and should take the user to the unmanage task detail page. @julienlim @mcarrano Where will the unmanage task detail page be shown in the UI? Will it be similar to the import task detail view, or do we need to create a global task list view to display all the global tasks?

@a2batic @shtripat Will the cluster list API response provide the unmanage task_id? What will be the value of the is_managed property after triggering the Unmanage action and before the unmanage task gets completed? @a2batic Please add the API details too, e.g. API URL, request data, response.

Member

@a2batic a2batic Feb 7, 2018

@gnehapk, yes, it should be Track Progress.
@gnehapk, the value of is_managed should be 'no' after triggering the unmanage action and before the unmanage task gets completed. I will add the API details.
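To make the expected behaviour concrete, here is a minimal sketch of what one cluster entry in the cluster-list response could look like while the unmanage task is in flight. Only the is_managed value and the presence of a task id come from this discussion; the exact field names are assumptions, not the final API.

```python
# Hypothetical shape of a cluster-list entry while unmanage is running.
# Only is_managed == "no" and the task id come from the discussion above;
# the field names themselves are illustrative.
def cluster_entry_during_unmanage(cluster_id, task_id):
    return {
        "cluster_id": cluster_id,
        "is_managed": "no",   # flips as soon as the unmanage task is accepted
        "task_id": task_id,   # lets the UI link to the unmanage task details
    }
```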

Member

@gnehapk @a2batic
@mcarrano

View Details is used to show the error list; it is also used to show the import task details (as it's importing). See https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/244738628.

For the unmanage task details, it will be similar to the import task details view, since there is no active cluster anymore during an unmanage.

@shtripat
Member Author

shtripat commented Feb 7, 2018

@gnehapk regarding jobs running for the cluster at a time, I have given this some thought and find that at any time there would be only one active task for a cluster. I would suggest renaming the field import_job_id to cluster_job_id.

This field would get updated during import/unmanage or any other flow happening for the cluster, and the UI can consume it for any purpose.
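As a sketch of the proposed rename (field and flow names are assumptions, not final), any flow would overwrite the same single field, since only one task can be active per cluster at a time:

```python
# Sketch of the proposed single job field: cluster_job_id (formerly
# import_job_id) is overwritten by whichever flow is currently active,
# because only one task runs per cluster at a time. Names are illustrative.
def set_active_job(cluster, job_id, flow_name):
    cluster["cluster_job_id"] = job_id
    cluster["current_job"] = {
        "job_id": job_id,
        "job_name": flow_name,
        "status": "in_progress",
    }
    return cluster
```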

@r0h4n @nthomas-redhat comments/suggestions?

tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 37b8032 to 08cad58 Compare February 8, 2018 06:33
tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
Member

@julienlim julienlim left a comment

Changes reflect what was discussed in https://github.com/Tendrl/documentation/wiki/Architecture-Meeting-Notes 13 Feb 2018

version: 1
```

* While import flow in progress the values of `current_job` and `statuss`
Contributor

Typo statuss

Member Author

done

available for the same. User can opt to close this dialog and later use context
menus to check the task updates

** Once a cluster is being moved to un-managed state, the changes in properties
Contributor

Is this after the cluster has moved to un-managed state, or just after the task is created and in progress?

Member Author

It's while un-manage is in progress. @a2batic can you ack this?

Member

@shtripat @nthomas-redhat, it's when the unmanage task has been submitted from the UI and the API has acknowledged the submission of the task.

creates child jobs on storage nodes to stop tendrl specific services like
collectd and tendrl-gluster-integration

* Mark the cluster flag `is_managed` as `False` so that the cluster could be
Contributor

We are also deleting the cluster data from the central store; please mention that as well.

Member Author

Will add a line talking about the same
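For illustration, the child job described in the excerpt above could boil down to something like the following on each storage node. This is a sketch, not the actual node-agent flow; the function names are made up.

```python
import subprocess

# Tendrl-specific services the spec says the child jobs should stop.
TENDRL_SERVICES = ["collectd", "tendrl-gluster-integration"]

def stop_disable_cmds(service):
    # Stop the running unit, then disable it so it stays down after reboot.
    return [["systemctl", "stop", service],
            ["systemctl", "disable", service]]

def run_unmanage_child_job():
    # Sketch only: the real flow would also report status back to the job.
    for svc in TENDRL_SERVICES:
        for cmd in stop_disable_cmds(svc):
            subprocess.run(cmd, check=True)
```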

** Once the un-manage cluster task gets completed, a global notification is
received

** If the task was successful, the state of the cluster would be changed to ready to
Contributor

I assume that the cluster data is removed completely from etcd and the cluster detection logic detects it as a fresh cluster

Member Author

A freshly detected cluster is actually a ready-to-import cluster


If the task failed due to some issues, the cluster details would be listed as below in

*** `Import Status` changed to `Unmanage failed`
Contributor

Suppose it failed half-way through, what is the way forward?

Member Author

Ideally the user would be allowed to execute un-manage again for the cluster (as discussed in the 13th Feb 2018 arch call)

Member

@shtripat @nthomas-redhat @julienlim @mcarrano, the user will be able to execute unmanage again, but before the user confirms unmanage, should there be a popup saying "Unmanage of this cluster has failed once with <job_id>; it is recommended to resolve the error and then unmanage again", or something similar?
What are your thoughts?

tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat shtripat force-pushed the spec/unmanage-cluster branch from 5f3a436 to 0887eb1 Compare February 14, 2018 09:18
@a2batic
Member

a2batic commented Feb 14, 2018

@shtripat @nthomas-redhat @r0h4n @gnehapk @cloudbehl, as discussed with @julienlim, we can add the following points:

For milestone 2:-

  1. If cluster import fails or the cluster is misconfigured, the import button and unmanage button will be enabled.
    When the user clicks on the import button, the user lands on the import cluster view/page. If there was a previously failed import job, then a modal shows up and the message should read something like "Import cluster previously failed with <job_id>. Before Import, you need to correct the issue and then Unmanage the cluster.", followed by OK and Cancel buttons.

  2. In case of unmanage failure, provide a tooltip/info tip over the failure message saying, "If unmanage fails, resolve the issue, then try Unmanage cluster again."
    We should show a message saying Unmanage Cluster failed, followed by "View details (hyperlink)", in the cluster list.

Note: The design might change for milestone 3 as the cleanup option will be introduced
@rghatvis, it might be a good idea to add these points as troubleshooting tips in the docs.

@shtripat
Member Author

@a2batic ack. will update the spec accordingly

tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat
Member Author

@a2batic @julienlim I have added the points mentioned in #255 (comment) to the specifications document as well.

* Verify the scenario when a cluster import fails and the user is able to start
an un-manage + reimport cluster option from the UI. The UI should be able to list
details of both the tasks in this scenario


There's one other scenario which should be tested, and that's when un-manage fails.
Moreover, a check of the View Details link should be here too: it should be present during the un-manage task run and if it fails. It should provide almost the same view as for import.

@mcarrano

@julienlim and I have updated the wireframe designs on InVision to address how this should work for Milestone 3.
https://redhat.invisionapp.com/share/8QCOEVEY9

Changes are included in the Cluster list view, Import page, and Import Task Details pages. Let us know if you have any questions.

@julienlim
Member

@a2batic @shtripat @gnehapk @r0h4n @nthomas-redhat @Tendrl/tendrl-qe @mcarrano

Please note that the Task Details design has changed to allow for showing multiple tasks within the same view through the use of a master-detail (table) design. Even if there is a single task, we would also follow this same design for consistency.

@cybernth
Contributor

Please note that the Task Details design has changed to allow for showing multiple tasks within the same view through the use of a master-detail (table) design. Even if there is a single task, we would also follow this same design for consistency.

@a2batic , can you create a separate issue to track this?

previous import failure, and if the user confirms to go ahead with un-manage and
then import the cluster, the UI should submit an un-manage cluster first. If the
un-manage cluster task succeeds, then the UI should submit an import for the same
cluster


I have a question here. If the user chooses to click on the import button again, how will the View Details link work? Will it link first to the task details of the un-manage task and later to the details of the import task?

Member

@ltrilety
If a user chooses to click on the Import button, View Details will show the link to the last task run.

If an unmanage ran successfully, then import ran successfully/unsuccessfully, it will show the import.

If an unmanage did not run successfully (then import does not run), it will show the unmanage.
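The rule above can be summarized as: View Details always points at the most recently submitted task, since a failed unmanage means the import is never submitted. A minimal sketch (the task shape here is assumed, not from the API):

```python
# View Details points at the last task run; tasks is in submission order,
# each task carrying a flow name and a status (shape is illustrative).
def view_details_task(tasks):
    return tasks[-1] if tasks else None
```

So a failed unmanage leaves the unmanage task as the last one, while a successful unmanage followed by an import (whatever its outcome) leaves the import task as the last one.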

@r0h4n r0h4n added this to the Milestone 2 (2018) milestone Feb 17, 2018
* Verify the scenario when a cluster import fails and the user is able to start
an un-manage + reimport cluster option from the UI. The UI should be able to list
details of both the tasks in this scenario


It should also be tested whether the services collectd and tendrl-gluster-integration are stopped and disabled on the hosts of the unmanaged cluster.
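A sketch of that check: on every host of an unmanaged cluster, systemctl should report the tendrl services as inactive and disabled. Here run_on_host is a stand-in for whatever remote-execution helper the test framework provides; it is an assumption, not an existing API.

```python
# Verify that the tendrl-specific services are stopped and disabled on a
# host of an unmanaged cluster. run_on_host(cmd) -> stdout is a stand-in
# for the test framework's remote execution helper (hypothetical).
def services_unmanaged(run_on_host,
                       services=("collectd", "tendrl-gluster-integration")):
    for svc in services:
        active = run_on_host(["systemctl", "is-active", svc]).strip()
        enabled = run_on_host(["systemctl", "is-enabled", svc]).strip()
        if active != "inactive" or enabled != "disabled":
            return False
    return True
```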

*** `View Details` link would be available to check the task details

*** `Dashboard` button would be disabled

Member

@julienlim @mcarrano, we added disabling of the 'Dashboard' button when unmanage is in progress, but the design [1] has the 'Import' button disabled.

[1] https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/244738628

Member

@a2batic @mcarrano

After the Unmanage action is clicked, you will get the modal (https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/273239640) followed by https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/273239844.

Once you close out of the modals, the cluster page now shows, for the cluster being unmanaged, what's shown in https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/244738628 (the row pointed to by annotation #13) or https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/244738625 (last row in the cluster list).

There is no "Dashboard" button because, once you unmanage, there should no longer be Dashboard access.

@r0h4n
Contributor

r0h4n commented Mar 19, 2018

@Tendrl/specs Please approve individually
