[Bug] [Operator] ShuffleServer cannot be deleted even though there are no more applications. #1629

Closed
3 tasks done
zhengchenyu opened this issue Apr 9, 2024 · 4 comments · Fixed by #1630

Comments

@zhengchenyu
Collaborator

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

Even though no applications remain, the shuffle server pod cannot be deleted; the deletion fails with an error saying that some apps are still running in the shuffle server.
The webhook log shows: get last app number of (xx.xx.xx.xx) failed: json: cannot unmarshal string into Go struct field MetricItem.metrics.value of type float32
The cause is that since #1286 added Summary metrics, a metric value may be returned as "NaN", which is a string rather than a number.
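The failure can be reproduced in isolation. The following is a minimal sketch, not the operator's actual code: the `Metrics` wrapper, the `decodeStrict` helper, the metric names, and the payload shape are all assumptions made for illustration. Only the `MetricItem` type name and its `float32` value field are taken from the error message above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MetricItem mirrors the field named in the error message.
// The Name field and the JSON tags are assumptions.
type MetricItem struct {
	Name  string  `json:"name"`
	Value float32 `json:"value"`
}

// Metrics is a hypothetical wrapper for the metrics payload.
type Metrics struct {
	Metrics []MetricItem `json:"metrics"`
}

// decodeStrict decodes a payload using the default encoding/json rules,
// which reject a JSON string where a float32 is expected.
func decodeStrict(payload string) (*Metrics, error) {
	var m Metrics
	if err := json.Unmarshal([]byte(payload), &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	// Hypothetical payload: one numeric metric and one summary metric
	// whose value is serialized as the string "NaN".
	payload := `{"metrics":[{"name":"app_count","value":0},{"name":"latency_summary","value":"NaN"}]}`

	_, err := decodeStrict(payload)

	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		// typeErr.Value describes the offending JSON value kind ("string").
		fmt.Println("decode failed on a", typeErr.Value, "value:", err)
	}
}
```

Because a single string-valued metric fails the whole decode, the caller never gets to read the application count, even when that count itself is a valid number.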

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@rickyma
Contributor

rickyma commented Apr 9, 2024

To digress a bit, are you running the operator based on the latest version of the code? Have you encountered any other issues during the process? Is Uniffle currently stable on k8s? Because we are going to deploy on K8S too.

@zhengchenyu
Collaborator Author

zhengchenyu commented Apr 9, 2024

> To digress a bit, are you running the operator based on the latest version of the code? Have you encountered any other issues during the process? Is Uniffle currently stable on k8s? Because we are going to deploy on K8S too.

Yes, I am currently developing on master, which is almost the latest version.
I think k8s only starts the shuffle server, so there was no problem deploying on k8s; the stability problems mainly come from the shuffle server itself.

@rickyma
Contributor

rickyma commented Apr 9, 2024

> > To digress a bit, are you running the operator based on the latest version of the code? Have you encountered any other issues during the process? Is Uniffle currently stable on k8s? Because we are going to deploy on K8S too.
>
> Yes, now I develop on master, almost the latest version. I think the k8s just start shuffle server, there was no problem deploying on the k8s, the stability problem mainly comes from the shuffle server

Sounds good. We will use it soon, maybe this month.

@jerqi
Contributor

jerqi commented Apr 10, 2024

> > > To digress a bit, are you running the operator based on the latest version of the code? Have you encountered any other issues during the process? Is Uniffle currently stable on k8s? Because we are going to deploy on K8S too.
> >
> > Yes, now I develop on master, almost the latest version. I think the k8s just start shuffle server, there was no problem deploying on the k8s, the stability problem mainly comes from the shuffle server
>
> Sounds good. We will use it soon, maybe in this month.

The author of the operator is wangao of Tencent. You can communicate with him.

advancedxy pushed a commit that referenced this issue Apr 16, 2024
)

### What changes were proposed in this pull request?
When parsing JSON, handle the special case where a metric value might be NaN.

### Why are the changes needed?
Fix: #1629 

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested in a real cluster and added a unit test.
jerqi pushed a commit that referenced this issue Apr 30, 2024