Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve copy promises #2038

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

sjawhar
Copy link

@sjawhar sjawhar commented Nov 20, 2024

The promise returned by cpToPod never resolves. Followed the example from here to fix: #982 (comment)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sjawhar
Once this PR has been reviewed and has the lgtm label, please assign brendandburns for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

linux-foundation-easycla bot commented Nov 20, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Copy link
Contributor

Welcome @sjawhar!

It looks like this is your first PR to kubernetes-client/javascript 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-client/javascript has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 20, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 20, 2024
sjawhar added a commit to METR/vivaria that referenced this pull request Nov 21, 2024
We gotta stop using `echo` and the like for score logs, it can get too
big and need to use stdin and files.

Details:
* Well, k8s javascript client [has a
bug](kubernetes-client/javascript#2038), so I
re-implemented a fixed version of it.
* Tested it out locally using kind. So I had to make the k8s setup work
with non-EKS clusters.
* Also documented how to set up local k8s development environment while
I was at it

Testing:
* the automated tests honestly aren't great here. Would feel safer
having integration tests against an actual k8s cluster
* But here's a screenshot showing a working run, which requires copying
`settings.json` into the pod
<img width="1231" alt="image"
src="https://github.com/user-attachments/assets/07512016-fa9e-4d7a-953a-c6a0445c32fb">

* I also tested that I was able to copy a large score log that broke the
previous version of the function
* Here's a task test

![image](https://github.com/user-attachments/assets/cc3dd29d-a266-4de8-b29a-1d85a37c147b)

* Test of a big score log
<img width="1851" alt="image"
src="https://github.com/user-attachments/assets/a450fdf1-a375-40fd-bcc1-20c4c438698b">
@mstruebing
Copy link
Member

Hey @sjawhar thanks for your pr but I think the tests are broken, could you have a look?
You can run them locally via npm test

src/cp.ts Outdated
@@ -66,6 +66,11 @@ export class Cp {
}
},
)
.then((conn) => {
conn.onclose = (event) => {
resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a resolve above at line 63. I don't think that this resolve is required. In fact I'm not sure it's what we want, because this will resolve when the websocket is returned. We want to resolve when the copy completes.

Copy link
Author

@sjawhar sjawhar Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That callback only gets called if status != null:

javascript/src/exec.ts

Lines 53 to 62 in 8db906a

const conn = await this.handler.connect(path, null, (streamNum: number, buff: Buffer): boolean => {
const status = WebSocketHandler.handleStandardStreams(streamNum, buff, stdout, stderr);
if (status != null) {
if (statusCallback) {
statusCallback(status);
}
return false;
}
return true;
});

For copies (or maybe for all exec calls that use stdin), status == null, so the callback never gets called, so the promise never resolves. awaiting calls to cpToPod hang forever.

Copy link
Author

@sjawhar sjawhar Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the copy isn't complete when the websocket connection is closed, how can the caller ever know that it has completed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that you are doing this on the 'close' event. My mistake. But we should probably make sure that we don't resolve the promise twice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, added a check for that in a way that shouldn't cause things to break if changes elsewhere cause this logic to change.

@sjawhar
Copy link
Author

sjawhar commented Nov 23, 2024

Hey @sjawhar thanks for your pr but I think the tests are broken, could you have a look? You can run them locally via npm test

Yeah, sorry about that, I had quickly made the edits in the browser to make sure I didn't forget. Now I've actually, like, tested it and stuff 😅

@cjihrig
Copy link
Contributor

cjihrig commented Nov 23, 2024

Is there a test that can be added for this?

@sjawhar
Copy link
Author

sjawhar commented Nov 23, 2024

Is there a test that can be added for this?

I've spend some time on it but didn't figure out the different mocks well enough to understand why the tests were passing before instead of timing out as expected. The lines in the test that send a V1Status seem not to match actual behavior. Removing them causes the cp promise to never resolve (even after the fix), so some other change is also needed.

If it helps at all, I've read through the code and set breakpoints to interactively debug and this the issue largely comes down to this:

public static handleStandardInput(
ws: WebSocket.WebSocket,
stdin: stream.Readable | any,
streamNum: number = 0,
): boolean {
stdin.on('data', (data) => {
const buff = Buffer.alloc(data.length + 1);
buff.writeInt8(streamNum, 0);
if (data instanceof Buffer) {
data.copy(buff, 1);
} else {
buff.write(data, 1);
}
ws.send(buff);
});
stdin.on('end', () => {
ws.close();
});

It looks to me like when stdin is provided, the websocket is closed without waiting for a StatusStream response, so the status callback never gets called. Maybe that'll help you fix the tests, or maybe the right fix is actually upstream of this logic (maybe #2037 ?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cp.cpToPod() does not work
5 participants