Recurring/intermittent issue - user account env reset fails (configmaps fabric8-environments not found) #3500
I just ran into this. I ran the reset again and it cleared it out.
For the most part, a retry resolves this problem. However, I have seen instances where multiple retries are needed. Nothing other than the 500 error is logged.
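Because a retry usually clears the error, a client-side retry with exponential backoff is a plausible stopgap while the root cause is investigated. It masks the problem rather than fixing it. This is a minimal sketch: `withRetry`, the attempt count, and the delays are illustrative assumptions, not part of the fabric8 UI.

```typescript
// Generic retry helper (hypothetical; not fabric8 code).
// Retries `op` up to `maxAttempts` times, waiting longer after each failure.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Since a retry only hides an intermittent server-side failure, it does not help when several retries in a row fail.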
I think this is a duplicate of #2867
Seeing some additional patterns. Before the reset operation that later fails is started, the reset environment page displays only a subset of the user's current spaces, and the horizontal row listing the spaces is truncated. In the following example, there were actually 2 spaces in the user's account: Here's the correct set of spaces: So perhaps the error does not occur in the actual deletion of the spaces, but in collecting the spaces before the reset operation is performed?
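If the reset page really does work from an incomplete list, one way to rule that out is to page through the full list of spaces before the reset starts. This sketch assumes a cursor-style `next` link. The `Page` shape and the injected `fetchPage` function are hypothetical; they are not the actual fabric8 spaces API.

```typescript
// Hypothetical page shape: one page of items plus an optional cursor for the next page.
interface Page<T> {
  items: T[];
  next?: string;
}

// Follows `next` cursors until they run out, so the caller sees every space,
// not just the first page. `maxPages` guards against a cursor that never ends.
async function collectAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 100,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    if (!page.next) return all;
    cursor = page.next;
  }
  throw new Error(`more than ${maxPages} pages; refusing to continue`);
}
```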
Seeing a pattern where additional information is available: the 500 server error returned includes:
This is happening repeatedly; this should be a SEV2.
Could be related to #3556
+1, that is the same error message as in #3556
@aslakknutsen do you have any info or ideas on this?
I think what is happening (with some help from @jiekang) is that the frontend calls the Delete Space API and the Clean Tenant API asynchronously [1]. Previously these two APIs operated on different backend resources, but that is no longer true now that Delete Space also cleans up OpenShift resources. Interleaving these two API calls could trigger a variety of errors. Perhaps we should make cleaning up OpenShift resources optional in the Delete Space API. Then it could still do so when deleting individual spaces, but defer to the more robust and faster cleanup in Clean Tenant when resetting the environment.
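To make the suspected race concrete, here is a toy model, not the fabric8 code: an in-memory namespace, a Delete Space stand-in that reads two resources, and a Clean Tenant stand-in that wipes everything. Each call yields once to mimic a network round trip. All class and function names are hypothetical.

```typescript
// Toy in-memory namespace; names and behavior are illustrative only.
class FakeNamespace {
  private resources = new Set<string>();

  constructor(names: string[]) {
    names.forEach((n) => this.resources.add(n));
  }

  // Yields once, standing in for a network round trip, then looks up the resource.
  async get(name: string): Promise<string> {
    await Promise.resolve();
    if (!this.resources.has(name)) throw new Error(`${name} not found`);
    return name;
  }

  // Yields once, then removes everything (like cleaning the whole tenant).
  async deleteAll(): Promise<void> {
    await Promise.resolve();
    this.resources.clear();
  }
}

// Stand-in for Delete Space: inspects the space's deployment, then reads shared config.
async function deleteSpace(ns: FakeNamespace): Promise<void> {
  await ns.get("deployment/my-space");
  await ns.get("configmap/fabric8-environments");
}

// Stand-in for Clean Tenant: wipes the namespace.
async function cleanTenant(ns: FakeNamespace): Promise<void> {
  await ns.deleteAll();
}

const initialResources = ["deployment/my-space", "configmap/fabric8-environments"];

// Concurrent: Clean Tenant's wipe runs between Delete Space's two reads.
async function runConcurrently(): Promise<string> {
  const ns = new FakeNamespace(initialResources);
  try {
    await Promise.all([deleteSpace(ns), cleanTenant(ns)]);
    return "ok";
  } catch (e) {
    return (e as Error).message;
  }
}

// Sequential: Delete Space finishes before Clean Tenant starts.
async function runSequentially(): Promise<string> {
  const ns = new FakeNamespace(initialResources);
  await deleteSpace(ns);
  await cleanTenant(ns);
  return "ok";
}
```

In this model `runConcurrently()` resolves to `"configmap/fabric8-environments not found"`, an error resembling the one in this issue's title, while `runSequentially()` resolves to `"ok"`.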
@ebaron I wouldn't mind fixing the race condition in cleanup.component.ts. What's the suggested API to use now? Is having it delete the spaces one by one, followed by Clean Tenant, an okay fix?
@jiekang Yes, I think what you described would be the way to fix it in the frontend. We could also add an optional parameter to the Delete Space API that skips deleting OpenShift resources. Then I don't believe we would have to synchronize between deleting spaces and cleaning the tenant.
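The frontend-only fix, deleting spaces one by one and then calling Clean Tenant, amounts to sequencing the calls. A minimal sketch, assuming a hypothetical `ResetApi` interface with `deleteSpace` and `cleanTenant` methods; these names are illustrative, not the actual `cleanup.component.ts` code.

```typescript
// Hypothetical client interface; the real fabric8 UI services may differ.
interface ResetApi {
  deleteSpace(spaceId: string): Promise<void>;
  cleanTenant(): Promise<void>;
}

// Deletes spaces one at a time, then cleans the tenant only after every
// Delete Space call has completed, so no two calls are ever in flight together.
async function resetSequentially(api: ResetApi, spaceIds: string[]): Promise<void> {
  for (const id of spaceIds) {
    await api.deleteSpace(id);
  }
  await api.cleanTenant();
}
```

The cost is latency: total time is roughly the sum of all the calls, which is the inefficiency the backend PR later cites.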
Ah, so the longer path is to provide a parameter in the API and then have the UI use it. Hmm...
@ebaron Were you planning to take a look at providing that parameter?
@jiekang Sure, if there are no objections, I can add this parameter to the Delete Space API. The frontend will have to make sure that this argument is set to true only when resetting the environment.
I've assigned myself here as well. I can alter the frontend cleanup code to set the parameter to true.
Just to confirm: the fix will be to delete spaces one by one, followed by Clean Tenant, correct?
@ldimaggi We're currently going with a different fix: the Delete Space API gets a parameter that, when true, keeps it from clashing with the Clean Tenant API. The frontend is then free to call the Delete Space and Clean Tenant APIs at the same time.
However, I'm open to other opinions.
I am open to suggestions as well. The Clean Tenant API clears a superset of what the Delete Space API clears. When resetting the environment, this makes any OpenShift cleanup done by the Delete Space API redundant, and also prone to racing with the Clean Tenant API.
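With the parameter in place, the reset can issue every call at once. This sketch assumes a hypothetical `ResetApi` whose `deleteSpace` accepts an options object; only the `skipCluster` name comes from this thread.

```typescript
// Hypothetical client interface; `skipCluster` mirrors the parameter discussed
// in this thread, but the method shape is an assumption.
interface ResetApi {
  deleteSpace(spaceId: string, opts: { skipCluster: boolean }): Promise<void>;
  cleanTenant(): Promise<void>;
}

// With skipCluster=true, Delete Space leaves OpenShift resources to Clean Tenant,
// so the calls no longer compete and can all run concurrently.
async function resetConcurrently(api: ResetApi, spaceIds: string[]): Promise<void> {
  await Promise.all([
    ...spaceIds.map((id) => api.deleteSpace(id, { skipCluster: true })),
    api.cleanTenant(),
  ]);
}
```

Individual space deletion elsewhere in the UI would keep passing `skipCluster: false`, so OpenShift resources are still cleaned up when no Clean Tenant call follows.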
I'd prefer a single endpoint for the 'reset environment' action that handles it however it wants, but that might expose too much at a single point. If there are no other opinions, I think we can go ahead with the optional parameter until someone objects :)
@joshuawilson @jiekang Just a heads up, I opened a PR for the backend portion of the fix I proposed above (#3500 (comment)): fabric8-services/fabric8-wit#2121
**commit** fabric8-services/fabric8-wit@cefe36a **Author:** Rohit Kumar Rai <rohitkrai03@gmail.com> **Date:** Wed Jul 11 17:54:40 2018 +0530
Changed SHA checksum for dep-darwin-amd64 and setting UNAME_S variable (fabric8-services/fabric8-wit#2163)
- For macOS, the dep package was updated with a new SHA. This SHA value is checked in `Makefile` against a hardcoded SHA checksum to verify the dep package.
- Updated the SHA value to the new one.
- Added initialization of the `$(UNAME_S)` variable, `UNAME_S=$(shell uname -s)`, since it was being used but never set, which meant the variable had to be exported explicitly from the shell.
Similar PR for fabric8-auth: fabric8-services/fabric8-auth#549
Note: We can probably look into automating the process so that the SHA value is fetched dynamically instead of being hardcoded.

**commit** fabric8-services/fabric8-wit@f13e094 **Author:** Elliott Baron <ebaron@redhat.com> **Date:** Wed Jul 11 13:46:08 2018 -0400
Add parameter to Delete Space to skip deleting OpenShift resources (fabric8-services/fabric8-wit#2121)
When a user resets their environment, the front-end makes calls to the Delete Space API and the Clean Tenant API. It makes these calls asynchronously, and because both APIs act on the same resources, I suspect this is the reason we are seeing a variety of errors in openshiftio/openshift.io#3500. Since the Clean Tenant API cleans out the user's entire namespaces, it is not necessary in this case for Delete Space to delete anything from OpenShift. This PR adds an optional parameter, skipCluster, to the Delete Space API which, if true, will not attempt to delete any deployments from OpenShift. The front-end could then use this parameter only when resetting the user's environment. An alternative would be for the front-end to synchronize between deleting spaces and calling Clean Tenant, but this would be less efficient.
Fixes (partially): openshiftio/openshift.io#3500

**commit** fabric8-services/fabric8-wit@cb85aa7 **Author:** Ibrahim Jarif <jarifibrahim@gmail.com> **Date:** Fri Jul 13 12:57:43 2018 +0530
Refactor search_blackbox_test.go (fabric8-services/fabric8-wit#2148)

**commit** eae146f7d3cdf1d6c40588608e60d688aaf6ad83 **Author:** Dhriti Shikhar <dhriti.shikhar.rokz@gmail.com> **Date:** Fri Jul 13 16:55:40 2018 +0530
Increase paging limit (fabric8-services/fabric8-wit#2166)

**commit** fabric8-services/fabric8-wit@d198813 **Author:** Baiju Muthukadan <baiju.m.mail@gmail.com> **Date:** Fri Jul 13 17:46:12 2018 +0530
Revert "List work items part of child iterations (fabric8-services/fabric8-wit#2146)" (fabric8-services/fabric8-wit#2168)
This reverts commit 68996d1555a29a9ef310403403855b47559d5a71. This is required to address #3974

**commit** fabric8-services/fabric8-wit@a4d9061 **Author:** Michael Kleinhenz <kleinhenz@redhat.com> **Date:** Fri Jul 13 16:24:37 2018 +0200
feat(boardview): Board View for WIT. (fabric8-services/fabric8-wit#2111)
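On the backend, the behavior described in the f13e094 commit message reduces to one branch: skip the OpenShift deletion when `skipCluster` is true. fabric8-wit is written in Go; this sketch uses TypeScript, the language of the frontend `cleanup.component.ts`, and every name other than `skipCluster` is hypothetical.

```typescript
// Hypothetical dependencies; the real fabric8-wit handler internals are not shown in this thread.
interface SpaceBackend {
  deleteSpaceRecord(spaceId: string): Promise<void>;
  deleteClusterResources(spaceId: string): Promise<void>;
}

// Assumption: only the literal string "true" enables skipCluster;
// anything else, including absence, keeps the default cluster cleanup.
function parseSkipCluster(query: Record<string, string | undefined>): boolean {
  return query["skipCluster"] === "true";
}

async function handleDeleteSpace(
  backend: SpaceBackend,
  spaceId: string,
  query: Record<string, string | undefined>,
): Promise<void> {
  if (!parseSkipCluster(query)) {
    // Default path: also remove the space's deployments from OpenShift.
    await backend.deleteClusterResources(spaceId);
  }
  // The space itself is deleted in both cases.
  await backend.deleteSpaceRecord(spaceId);
}
```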
@jiekang there is now a "skipCluster" boolean argument for the Delete Space API in production. By setting this to true (e.g.
Okay, I will look at opening a PR for that.
All PRs that make use of the "skipCluster" argument when deleting spaces during an environment reset have been merged and are now in production. If the issue recurs, feel free to reopen.
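As an illustration only, assuming `skipCluster` is passed as a query parameter, a client could build the Delete Space request URL like this. The base URL and the `/spaces/{id}` path are placeholders, not the confirmed fabric8-wit route.

```typescript
// Builds a Delete Space request URL. The path layout is an assumption;
// only the `skipCluster` parameter name comes from this thread.
function deleteSpaceUrl(apiBase: string, spaceId: string, skipCluster: boolean): string {
  const base = apiBase.replace(/\/+$/, ""); // drop any trailing slashes
  const url = `${base}/spaces/${encodeURIComponent(spaceId)}`;
  return skipCluster ? `${url}?skipCluster=true` : url;
}
```

For example, `deleteSpaceUrl("https://api.example.com/api/", "abc", true)` returns `"https://api.example.com/api/spaces/abc?skipCluster=true"`.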
Resetting a user's environment sometimes fails with this error:
Historically, I have seen this error in fewer than 5% of resets; in debugging tests today, it has been occurring randomly in about 20% of them.