Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: apply a workload Service instead of using juju created one #173

Merged
merged 7 commits into from
Jul 1, 2024

Conversation

DnPlas
Copy link
Contributor

@DnPlas DnPlas commented Jun 26, 2024

To avoid inconsistent behaviours, it is preferrable to apply and use a Service owned by the charm so it can be rendered as needed by the controller.

To avoid inconsistent behaviours, it is preferrable to apply and use a Service
owned by the charm so it can be rendered as needed by the controller.
@DnPlas DnPlas requested a review from a team as a code owner June 26, 2024 19:44
Copy link
Contributor

@NohaIhab NohaIhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @DnPlas
first, left a small comment
second, can you explain what was the issue you were seeing? what is the problem this PR is trying to solve? I'm struggling to interpret that. Would be better if it is documented in an issue so we have a reference, wdyt?

src/templates/secret.yaml.j2 Outdated Show resolved Hide resolved
@DnPlas
Copy link
Contributor Author

DnPlas commented Jun 28, 2024

hi @DnPlas first, left a small comment second, can you explain what was the issue you were seeing? what is the problem this PR is trying to solve? I'm struggling to interpret that. Would be better if it is documented in an issue so we have a reference, wdyt?

The problem is that the Service created by juju will try to be used by two different Pods (the training operator charm and the training operator workload). This will cause inconsistencies like when we try to reach the metrics endpoint. We know that the training operator workload has it, but the charm doesn't, so sometimes trying to reach this endpoint will result in:

curl -v 10.152.183.20:8080/metrics
*   Trying 10.152.183.20:8080...
* connect to 10.152.183.20 port 8080 failed: Connection refused
* Failed to connect to 10.152.183.20 port 8080 after 0 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 10.152.183.20 port 8080 after 0 ms: Connection refused

To avoid having both Pods "fighting" over the same Service (which btw is not at all recommended, it goes against the logic of services), I'm proposing that the workload has its own.

Copy link
Contributor

@NohaIhab NohaIhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx @DnPlas, tested and I was able to curl the metrics endpoint successfully.
Small nit for a typo then I'm approving, feel free to ping me to approve.

src/charm.py Outdated Show resolved Hide resolved
Copy link
Contributor

@NohaIhab NohaIhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DnPlas DnPlas merged commit d93ea52 into KF-5692-training-1.8-dev-branch Jul 1, 2024
8 checks passed
@DnPlas DnPlas deleted the KF-5692-add-missing-service branch July 1, 2024 14:10
DnPlas added a commit that referenced this pull request Jul 4, 2024
…173)

* refactor: apply a workload Service instead of using juju created one

To avoid inconsistent behaviours, it is preferrable to apply and use a Service
owned by the charm so it can be rendered as needed by the controller.
DnPlas added a commit that referenced this pull request Jul 9, 2024
…` for workload, also bumps training-operator 1.7->1.8 (#167)

* pin integration test dependencies, refactor constants in tests (#164)
* refactor: deploy the training-operator with kubernetes resources (#161)
* chore: bump training-operator v1.7 -> v1.8 (#162)
* refactor: apply a workload Service instead of using juju created one (#173)
* tests: skip test_upgrade due to #170 (#171)
* build, tests: bump charmed-kubeflow-chisme 0.4.0 -> 0.4.1 (#172)

Fixes #159
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants