This repository has been archived by the owner on Jul 23, 2020. It is now read-only.

Intermittent issue - application is not available on run environment #4009

Open
ljelinkova opened this issue Jul 17, 2018 · 43 comments

Comments

@ljelinkova
Collaborator

The E2E tests fail intermittently because the application is not available in the Run environment.

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/07-02-run.html

The E2E workflow is:

  1. Create a space
  2. Create a Vert.x application with REST API and the Rollout to Run strategy
  3. Wait until the pipeline is finished and promote to Run
  4. Check the stage application
  5. Check the run application - this step fails

This is the Jenkins log:
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/05-01-jenkins-log.html

This is the output of a script that lists Jenkins pods. We can add any oc command you might need to debug this issue.

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/oc-logs-output.txt

@ljelinkova ljelinkova changed the title Intermittent issue - application is not available to run environment Intermittent issue - application is not available on run environment Jul 17, 2018
@ldimaggi
Collaborator

Is there any possibility that this is a timing issue? I have seen random instances of the deployment to run requiring a long time to complete.

Or, might this be a resources issue? Do you see anything in the logs related to a quota being reached? Maybe the env reset is not removing any existing deployments to run?

@hrishin

hrishin commented Jul 18, 2018

@ljelinkova did you check the resource quota? Sometimes the quota for the target namespace fills up. OpenShift accepts the new DC request successfully but cannot deploy it if the quota has been reached.
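
To make the quota check concrete, here is a minimal sketch that reports which resources in a ResourceQuota object are used up. It assumes the JSON of one quota object (for example from `oc get resourcequota <name> -o json`) has already been loaded into a dict; the function names and the sample values are made up for illustration, and only the common quantity suffixes are handled.

```python
from decimal import Decimal

# Multipliers for the common Kubernetes quantity suffixes (not exhaustive).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(text):
    """Convert a quantity string such as '500m' or '1Gi' to a Decimal."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for another suffix.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def exhausted_resources(quota):
    """Return the names of resources whose usage has reached the hard limit."""
    status = quota.get("status", {})
    hard = status.get("hard", {})
    used = status.get("used", {})
    return sorted(
        name
        for name, limit in hard.items()
        if parse_quantity(used.get(name, "0")) >= parse_quantity(limit)
    )


# Hypothetical quota object, shaped like the `status` block of a ResourceQuota.
sample_quota = {
    "status": {
        "hard": {"pods": "10", "limits.memory": "1Gi", "limits.cpu": "2"},
        "used": {"pods": "3", "limits.memory": "1024Mi", "limits.cpu": "1500m"},
    }
}
print(exhausted_resources(sample_quota))
```

An empty result would suggest the quota is not what blocks the deployment.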

@ljelinkova
Collaborator Author

@hrishin The resource quota should not be the problem since we reset the whole environment after each test.

@ldimaggi It might be a timing issue; maybe the app would start if the tests waited for some time.

But the main question is: how does a random user know that the deployment failed, or that it finished? Shouldn't this be part of the "Pipeline"? I simply assumed that once the Pipeline is finished, everything is set and ready and I can start using the application.
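
As an illustration of the "wait for some time" idea, a test could poll the application URL with an upper bound instead of checking it once. This is only a sketch and not the actual E2E test code; `wait_until_available`, its parameters, and the default timings are assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_until_available(url, timeout_s=300.0, interval_s=5.0, probe=None):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    `probe` can be replaced in tests; by default it issues a real GET.
    Returns True if the app became available, False on timeout.
    """

    def http_probe(target):
        try:
            with urllib.request.urlopen(target, timeout=10) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # urlopen raises for 4xx/5xx responses and for connection errors;
            # both mean "not available yet" here.
            return False

    probe = probe or http_probe
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() + interval_s > deadline:
            # Another wait would overshoot the deadline, so give up.
            return False
        time.sleep(interval_s)
```

A test would call `wait_until_available(run_url)` right after promotion and fail only when it returns False, which separates "slow to start" from "never started".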

@ppitonak
Collaborator

ppitonak commented Jul 26, 2018

I agree that it should be part of the pipeline, i.e. "Rollout to Run" step should be marked as successful only when the app was deployed successfully. WDYT @openshiftio/uxd-team @catrobson

@catrobson
Collaborator

@ppitonak Agree we would only mark that step successful when the app was deployed successfully.

@kwk
Contributor

kwk commented Aug 13, 2018

I consider this to be a P1 issue because if we ignore this failure we cannot push to prod.

@ppitonak
Collaborator

ppitonak commented Aug 14, 2018

We implemented a workaround in e2e tests (fabric8io/fabric8-test#949).

I had a chat with @aslakknutsen @bartoszmajsak @ljelinkova @jiekang @joshuawilson ... the result of the discussion is that we cannot guarantee that the application is deployed and working at any point in time after the pipeline has finished. The dev team is against adding the readiness probe to the pipeline.

@fabric8-ui/uxd I still think that we should signal to the users

  1. that their application was working at some point after pipeline finished (e.g. another step in pipeline) or
  2. that their application is not working at the moment (e.g. status icon next to the link to run env or link to deployments page)

@alexeykazakov
Member

While I see the point of separating pipelines from service/pod readiness, I also believe there is a UX issue.
We have received many reports from users who were confused when they saw finished pipelines with an unavailable app.
For many users it looks like a bug.

@bartoszmajsak
Contributor

"The dev team is against adding the readiness probe to the pipeline."

@ppitonak I might have missed that part of the long discussion - can you shed some more light on why the dev team is against that?

@ljelinkova
Collaborator Author

There are other scenarios where the pipeline is confusing. As @rhopp suggested, imagine this scenario.

  1. You have successfully deployed version 1.0.1 of the application to both stage and run
  2. You trigger a new build, which finishes the Rollout to Stage step for version 1.0.2
  3. You are asked to approve promotion to the Run environment
  4. You click the link to stage on the Pipelines screen to see the application on stage

And now the question: What version is on the stage? Is it possible that you're still looking into version 1.0.1? Or is there already 1.0.2?

@ppitonak
Collaborator

@bartoszmajsak you are right that nobody explicitly said they were against it, but nobody supported my suggestion either. @aslakknutsen argued that we cannot guarantee that the application works at any point in time... While I agree with that, I think that adding a readiness probe to the pipeline itself would reduce users' confusion.

We would still need to solve the problems described by Lucia and Aslak.

@jiekang
Collaborator

jiekang commented Aug 15, 2018

To restate my perspective from the discussion on Mattermost on Aug 13, 2018:

"I do agree that it would be worthwhile making sure we can display probe information if available. I think if the application has a probe, the OSO console sees that and is more clear. Our OSIO pages don't do anything with probes as far as I'm aware."

@bartoszmajsak
Contributor

"And now the question: What version is on the stage? Is it possible that you're still looking into version 1.0.1? Or is there already 1.0.2?"

@ljelinkova you can see that in the OpenShift deployment object. There is a version label which can tell you that. Is that user-facing information? No. Can you test it to see if your assumption is valid? Yes.

Of course, the application itself could also expose this information, but that's up to the application to do or not.
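
For a test, reading that label could look roughly like the sketch below. It assumes the deployment object has been fetched as JSON (for example with `oc get dc <name> -o json`) and that the label sits under `metadata.labels` with the key `version`; both the key and the helper name `deployed_version` are assumptions, so adjust them to what the object really contains.

```python
import json


def deployed_version(dc_json, label="version"):
    """Return the value of the version label, or None if it is missing."""
    dc = json.loads(dc_json)
    return dc.get("metadata", {}).get("labels", {}).get(label)


# Hypothetical, heavily trimmed deployment object.
sample = json.dumps(
    {"kind": "DeploymentConfig", "metadata": {"name": "my-app", "labels": {"version": "1.0.2"}}}
)
print(deployed_version(sample))
```

Comparing the returned value with the version the build just produced tells the test whether stage still serves the old deployment.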

@ljelinkova
Collaborator Author

One of my colleagues from a different team tried OSIO and was also confused that the application was not available after the pipeline finished.

This also seems like a usability issue, so I am adding the UX team label too.

@ljelinkova
Collaborator Author

@serenamarie125 Could somebody from the UX team have a look at this?

The issue here is that some users expect the application to be deployed and ready when the pipeline is finished, and that is not true. The end of the pipeline means that the deployment was triggered, but the application might not be available for several minutes. However, the link to the deployed application is clickable, and the user gets the "Application is not available" page. While this behavior is technically correct, it might be quite confusing.

@hrishin

hrishin commented Sep 27, 2018

The new OSIO-pipeline library has a verify deployment check, which fails the job if the deployment is not up and running.
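
To show what such a check might test, here is a rough sketch. It is not the OSIO-pipeline library's code; it only compares the standard DeploymentConfig fields `metadata.generation`, `spec.replicas`, `status.observedGeneration`, `status.updatedReplicas`, and `status.availableReplicas`, and the function names are made up.

```python
def rollout_complete(dc):
    """Return True when the deployment object reports a finished rollout."""
    meta = dc.get("metadata", {})
    desired = dc.get("spec", {}).get("replicas", 0)
    status = dc.get("status", {})
    # The controller must have processed the latest spec change.
    caught_up = status.get("observedGeneration", 0) >= meta.get("generation", 0)
    return (
        desired > 0  # a deployment scaled to zero is not "up and running"
        and caught_up
        and status.get("updatedReplicas", 0) == desired
        and status.get("availableReplicas", 0) >= desired
    )


def verify_deployment(dc):
    """Raise, and thereby fail the job, if the rollout is not complete."""
    if not rollout_complete(dc):
        name = dc.get("metadata", {}).get("name", "<unknown>")
        raise RuntimeError(f"deployment {name} is not up and running")


# Hypothetical object in which the new pod is not available yet.
pending = {
    "metadata": {"name": "my-app", "generation": 3},
    "spec": {"replicas": 1},
    "status": {"observedGeneration": 3, "updatedReplicas": 1, "availableReplicas": 0},
}
print(rollout_complete(pending))
```

In a real job this would run inside a bounded retry loop, so a slow rollout is given time before the step is marked as failed.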

@sthaha @rupalibehera

@ldimaggi
Collaborator

So - the UI would show "in progress" until the app endpoint was available, at which point, the checkbox/arrow icon would be displayed, yes?

What about the scenario that Pavel mentioned where a new version of an app is deployed? When do we disable the link to the previously deployed version of the app? When the user starts the build for the new version of the app?

@muruGanesan
Collaborator

@ppitonak, thanks for the screenshot.

@ppitonak
Collaborator

"What about the scenario that Pavel mentioned where a new version of an app is deployed? When do we disable the link to the previously deployed version of the app? When the user starts the build for the new version of the app?"

When Build 2 starts, Build 1 is hidden, so the issue doesn't exist until the link is displayed. In other words, if the first run of the pipeline is implemented correctly, there is no issue with the second run of the pipeline.

@joshuawilson
Member

One of the problems is that if it goes green and the link is still inactive and they have the page open, they will just go and refresh the page. If the pipeline is not green till it is ready then we are giving the user a clue that they should not try.
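
The rule discussed in this part of the thread (the step stays "in progress" and the link stays inactive until the app endpoint responds) can be written down as a tiny sketch. The names `StepStatus`, `step_status`, and `link_enabled` are hypothetical and do not come from the UI code.

```python
from enum import Enum


class StepStatus(Enum):
    IN_PROGRESS = "in progress"
    SUCCESS = "success"


def step_status(rollout_triggered, endpoint_ready):
    """Show the step as green only once the endpoint actually responds."""
    if rollout_triggered and endpoint_ready:
        return StepStatus.SUCCESS
    return StepStatus.IN_PROGRESS


def link_enabled(status):
    """Keep the application link inactive until the step is green."""
    return status is StepStatus.SUCCESS


print(step_status(rollout_triggered=True, endpoint_ready=False))
```

Tying the link state to the same status keeps the green icon and the clickable link from ever disagreeing.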

@muruGanesan
Collaborator

@ldimaggi, @ppitonak, @joshuawilson,

Please find the first draft here and provide your feedback:
https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness

@joshuawilson
Member

lgtm

@muruGanesan
Collaborator

@ldimaggi, @ppitonak, @alexeykazakov, @hrishin, @bartoszmajsak, @kwk, @ljelinkova, @sthaha, @piyush-garg
Please look at the UX recommendation:
https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness

Note: @alexeykazakov provided his feedback, and I responded with details in the InVision file above.
Feel free to add your review comments. If everyone is fine with the UX recommendation, please give it a thumbs-up.

@muruGanesan
Collaborator

muruGanesan commented Oct 24, 2018

< Iteration -3>
@ldimaggi, @ppitonak, @alexeykazakov, @hrishin, @bartoszmajsak, @kwk, @ljelinkova, @sthaha, @piyush-garg
I discussed the following use cases with @sthaha:

  1. case-1: Only one application (1 URL)
  2. case-2: There is no application (0 URL) e.g. bot deployment
  3. case-3: Multi-clusters ( > 1 URLs) - this is a future requirement

I covered cases 1 and 2 and updated the flow. Please review it and provide your feedback, if any.
https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness

NOTE: @alexeykazakov, please verify 'use case-2' and provide your inputs.

@ljelinkova
Collaborator Author

@muruGanesan I like the proposal.

@muruGanesan
Collaborator

@ljelinkova, thanks.
If the stakeholders are fine with the UX recommendation, I request that @sthaha and @joshuawilson assign this to an appropriate team.

@joshuawilson
Member

When @sthaha confirms the backend supports it and there is a decision on the design, the UI team can pick up the changes to the pipeline page.

@muruGanesan
Collaborator

@ljelinkova, @joshuawilson, @sthaha,
Since the UXD proposal is accepted, I am removing the 'UX label' and my name from the assignee list.
Please include me if anyone needs any clarification from the UX side.

CC: @serenamarie125

@muruGanesan muruGanesan removed their assignment Nov 10, 2018
@serenamarie125
Collaborator

@muruGanesan should we also remove the area/ux label?

@muruGanesan
Collaborator

@serenamarie125, no, we don't have to remove the 'area/UX' label, because the issue touches on UX.
In addition, the UX team is responsible when the label is 'team/ux', which I have already removed.

@joshuawilson
Member

@sthaha when will the new OSIO-pipelines be ready (with the verify deployment check)?

@piyush-garg
Collaborator

@ppitonak
Collaborator

@piyush-garg is it in prod already?

@piyush-garg
Collaborator

@ppitonak The new pipeline library is not in prod yet. We are working on moving the Java booster to the new pipeline. Apart from that, there are 2-3 other things that need to be resolved before it can go to production, such as new pipeline support for analytics and updating the upstream boosters' application.yaml.

@ppitonak
Collaborator

Does it make sense to deploy it to prod as an experimental feature and improve it step-by-step instead of doing a big-bang release?

@christianvogt
Collaborator

When the updates are available in prod, please assign this to the UI team.
