[pipeline-ui] Retrieve pod logs from argo archive #2081
Conversation
Hi @eterna2. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 1b7fb57 to 01340b6
frontend/server/server.ts (Outdated)
```typescript
/** helper function to retrieve pod logs from argo artifactory. */
const getPodLogsFromArtifactory = _as_bool(ARGO_ARCHIVE_LOGS) ? k8sHelper.getPodLogsFromArtifactoryHelper(
  ARGO_ARCHIVE_ARTIFACTORY === 'minio' ? minioClient : s3Client,
  ARGO_ARCHIVE_BUCKETNAME,
```
Can the actual log location URL be taken from the Workflow status object?
Good idea.
Just a question on which client to use.
The workflow status does provide the output as well as the artifactory configs.
Do you think it is better to try to infer which pre-created client to use, or to retrieve the secrets and create a new client each time?
In the latter case, should we do it for the artifacts retrieval also? i.e. skip the configuration for minio outright.
> Just a question on which client to use.

AFAIK, in the Frontend the s3Client is actually an instance of minioClient. There is some weird logic where s3 reads data uncompressed while minio expects tar.gz.
Generally, I feel that the UI handling of artifacts could use some improvement. If you see ways to streamline and improve that part, I hope @jingzhang36 and other people working on UX will be glad to see it.
Sure.
My other issue is IAM role support for minio-js. Minio-js does not support IAM roles (unlike minio-go). I made a PR to add this capability to minio-js, but they are asking for a relatively big rewrite (which I don't think I have time for) instead of the monkey patch I provided. Wondering if you are open to me adding a short routine outside of minio-js to support this (retrieving and updating AWS IAM temporary credentials) here. Because of this limitation, our current deployment relies on minio-gateway as a proxy to our s3 buckets. The crux of the issue is that in some places in kf we use s3://, while for the UI specifically we have to use minio://. This is quite confusing for our data scientists.
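For context, the kind of routine being proposed boils down to two calls against the EC2 instance metadata endpoint. Here is a minimal sketch, assuming the standard IMDSv1 contract; the function names are illustrative and not the PR's actual aws-helper:

```typescript
import * as http from 'http';

// Shape of the temporary credentials returned by the metadata service.
interface TempCredentials {
  AccessKeyId: string;
  SecretAccessKey: string;
  Token: string;
  Expiration: string; // ISO timestamp; re-fetch before this passes
}

const METADATA_URL =
  'http://169.254.169.254/latest/meta-data/iam/security-credentials/';

function httpGet(url: string): Promise<string> {
  return new Promise((resolve, reject) => {
    http.get(url, res => {
      let data = '';
      res.on('data', (chunk: Buffer) => (data += chunk));
      res.on('end', () => resolve(data));
    }).on('error', reject);
  });
}

async function getInstanceProfileCredentials(): Promise<TempCredentials> {
  // First call returns the role name attached to the instance (or pod, via kube2iam).
  const role = (await httpGet(METADATA_URL)).trim();
  // Second call returns the temporary credentials for that role.
  return JSON.parse(await httpGet(METADATA_URL + role)) as TempCredentials;
}
```

Since the credentials carry an Expiration timestamp, the "updating" half of the routine is a matter of re-fetching them before they lapse.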
/ok-to-test
/retest
@Ark-kun The workflow status does provide the archive information, but only if the job completes without errors. So we might still need the user-provided env variables as an optional fallback. So the logic now is: try the k8s API for the pod logs first, then the archive location from the argo workflow status, and finally the artifactory configured via the env variables.
AWS instance profile
A wrapper over minio-js to mimic the behavior of minio-go, so that the pipeline ui can retrieve objects from s3 without an access key/secret, using the ec2 instance profile (or kube2iam credentials) instead. This means the IAM role for the pod/ec2 instance running the pipeline-ui can be used to interact with s3 instead of an access key/secret.

k8s role/service account
Updated the manifest so that the ui role is able to get secrets as well as workflows.
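The wrapper idea reduces to building a minio-js client from those fetched credentials. A hypothetical sketch (not the PR's actual code), reusing getInstanceProfileCredentials() from above and assuming minio-js accepts a sessionToken option in its Client constructor:

```typescript
import * as Minio from 'minio';

// Build an s3 client from instance-profile credentials instead of a static
// access key/secret. A real wrapper would also recreate the client (or swap
// its credentials) shortly before the Expiration timestamp.
async function createS3ClientFromInstanceProfile(): Promise<Minio.Client> {
  const creds = await getInstanceProfileCredentials();
  return new Minio.Client({
    endPoint: 's3.amazonaws.com',
    useSSL: true,
    accessKey: creds.AccessKeyId,
    secretKey: creds.SecretAccessKey,
    sessionToken: creds.Token,
  });
}
```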
```typescript
  });
case 'minio':
  try {
    res.send(await getTarObjectAsString({bucket, key, client: minioClient}));
```
In the next PR, can you please make a function that auto-detects whether the data is gzipped and unpacks if needed?
I am thinking whether to change from node-tar to tar-stream + maybe-gzip.
But it probably needs more testing to ensure it works.
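The auto-detection asked for above is cheap, since gzip streams always start with the magic bytes 0x1f 0x8b. A minimal sketch (the function name is illustrative):

```typescript
import * as zlib from 'zlib';

// Peek at the first two bytes and only gunzip when they match the gzip
// magic number; otherwise return the buffer untouched. A tar archive may
// still need to be unpacked afterwards.
function maybeGunzip(buf: Buffer): Buffer {
  const isGzipped = buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
  return isGzipped ? zlib.gunzipSync(buf) : buf;
}
```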
I think this is a good PR that also improves the handling of data access by the Frontend.
@eterna2 Thanks for the contribution! This is a great PR. I briefly went through the change, and there are no obvious problems. I think you can start on some testing. Our integration test infra doesn't run tests on AWS, so you cannot rely on that. I recommend unit tests for the node server that cover the end-to-end flow of your use case. Please keep in mind we can only rely on your tests to make sure we don't break your feature, so prefer tests that cover the whole flow rather than implementation details. I need some time to gather enough context about minio, argo workflow and the exact approach you took. I will add more comments when I have time for a more detailed review.
@Bobgy Let's try to prioritize this PR a bit since there are some UX improvements that might depend on some of the refactorings introduced here.
I'm still concerned with 0 test coverage. @Ark-kun what's your opinion? Do you think follow-up tests in a separate PR, or no test coverage for now, is okay?
I'd prefer having at least tests that cover the happy path for new features.
I need some time to write the tests. I don't see anything specific currently in this repo though. I do see some tests further down the line, but those are more end-to-end tests. How do you want me to include the tests? Would standard jest tests that mock some of the AWS and K8s requests to test the various fallbacks work? @Ark-kun
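A test in the spirit described might look like the sketch below: mock the k8s pod-log call to fail so the handler has to fall back to the archive path. All module paths and helper names here are hypothetical, not the repo's actual layout:

```typescript
import * as k8sHelper from './k8s-helper';
import * as workflowHelper from './workflow-helper';
import { getPodLogs } from './handlers';

// Auto-mock both helper modules so their functions become jest mocks.
jest.mock('./k8s-helper');
jest.mock('./workflow-helper');

test('falls back to the argo archive when the pod is gone', async () => {
  // Simulate a GC-ed pod: the direct k8s API lookup fails...
  (k8sHelper.getPodLogs as jest.Mock).mockRejectedValue(new Error('pod not found'));
  // ...and the archive path succeeds.
  (workflowHelper.getPodLogsFromArchive as jest.Mock).mockResolvedValue('archived logs');

  await expect(getPodLogs('some-pod')).resolves.toBe('archived logs');
});
```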
…ead workflow status for argo archive location for pod logs.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Bobgy, neuromage. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
@eterna2 I have a general question - how does the argo pod log get persisted in the artifactory in the first place? Where did you specify env vars such as ARGO_ARCHIVE_LOGS? Did you set them in the workflow controller?
Reading closer, it seems the log persistence is set via argo's output artifact repository API. Does our DSL expose this for now? @Ark-kun
It is not done by kfp. It is currently set by the cluster admin (i.e. via the Argo manifest). Argo allows you to configure a default artifactory, and there is an archive-logs flag you can set. When unable to retrieve the logs from the pod directly, my PR handles this by reading the archive location from the workflow status and, failing that, falling back to the artifactory configured via the env variables.
The Argo artifact configmap is an implementation detail that we do not intend to expose, either to the cluster user or the admin. @Ark-kun thoughts?
I think it's important for cluster administrators to control the artifact storage location. P.S. I consider archiving the logs to be beneficial. We should enable it by default. Otherwise the logs are lost in non-GCP environments.
/hold cancel
Can anyone give a bit more details on how to enable this? From a master branch installation of Kubeflow, if I set the below env vars on
...and
Is the above setup supposed to work? And is there a way of enabling this for e.g. KF 0.7?
Hmmm. It works on my cluster though - but I am on AWS EKS with my custom manifest rather than kf's manifest. Can you check what version of pipeline is deployed? And what is the returned error msg? You might need to update the pipeline-ui service account to have access to k8s secrets and the Argo workflow crd. Under the hood, it will first try the k8s API to get the pod logs, then fall back to the Argo workflow crd status (which will tell you where the artifacts are stored; it will load the secrets and try to retrieve the logs). And if that fails, it will just try to read the logs based on the env vars you set. I have set it to return a different error msg depending on where it failed.
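In code, the fallback order described here is just a chain of try/catch blocks. A sketch under that description, with hypothetical helper names standing in for the PR's actual modules:

```typescript
// Hypothetical helpers standing in for the PR's k8s-helper / workflow-helper
// modules and the env-var-configured getPodLogsFromArtifactory shown earlier.
declare function getPodLogsFromK8s(podName: string): Promise<string>;
declare function getPodLogsFromWorkflowStatus(podName: string): Promise<string>;
declare function getPodLogsFromArtifactory(podName: string): Promise<string>;

async function getPodLogsWithFallback(podName: string): Promise<string> {
  try {
    // 1. Ask the k8s API directly (works while the pod still exists).
    return await getPodLogsFromK8s(podName);
  } catch (errK8s) {
    try {
      // 2. Read the archive location from the Argo workflow status,
      //    loading any secrets it references.
      return await getPodLogsFromWorkflowStatus(podName);
    } catch (errWorkflow) {
      // 3. Last resort: the artifactory configured via env vars.
      return await getPodLogsFromArtifactory(podName);
    }
  }
}
```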
I just checked. KF 0.7 uses 0.1.31. This PR is only merged in for versions > 0.1.34. You can try using 0.1.34 or 0.1.35. You should also update the pipeline-ui service account to have access to k8s secrets and the Argo workflow crd. You can look at the manifest changes in this PR. This is not updated for the main kubeflow manifest repo.
Thanks @eterna2! 0.1.34 doesn't have the change but 0.1.35 works. I used your kubeflow-aws as a basis and created a non-AWS-specific version (since I'm using Minio not S3): https://github.com/mattiasarro/kubeflow-0-7-argo-minio-logs |
@eterna2 Is this feature supposed to work for non-AWS installations (e.g. GCP) as well? There are some reports that it might not be the case.
Yes, it should work with both minio and s3. But there are a few configurations that need to be set; by default it is off. What it does under the hood is just get the workflow status and retrieve the log artifacts referenced in the status. There are a few ways it can fail. When the workflow errors out, sometimes the status will be incomplete - no artifact info. In this case, it will not work. And if a secret is required to retrieve the artifacts, the UI service account must have permission to retrieve these secrets. I don't think I have updated the main kubeflow manifest for this feature.
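Concretely, "retrieving the log artifacts from the status" means looking up the pod's node in the workflow status and reading the s3 location of its main-logs output artifact (the name Argo gives archived container logs). A sketch, with types trimmed to just what the lookup needs:

```typescript
// Trimmed view of an Argo workflow status node; in Argo v2 the pod name
// doubles as the node ID in status.nodes.
interface WorkflowStatusNode {
  outputs?: {
    artifacts?: Array<{
      name: string;
      s3?: { bucket: string; key: string; endpoint: string };
    }>;
  };
}

function getLogArchiveLocation(
  nodes: { [podName: string]: WorkflowStatusNode },
  podName: string,
): { bucket: string; key: string; endpoint: string } | undefined {
  const artifacts = nodes[podName]?.outputs?.artifacts || [];
  // Argo names the archived log artifact "main-logs".
  return artifacts.find(a => a.name === 'main-logs')?.s3;
}
```

If the workflow failed before the node finished, this lookup returns undefined, which is exactly the incomplete-status case mentioned above.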
Is there a good place for us to document some of this configuration?
A possible workaround for #1803
Currently, I persist pod logs to an s3 bucket with argo's archiveLogs config.
This PR allows the UI to fall back to retrieving GC-ed pod logs from the argo archive artifactory - this can be either a minio store or an s3 bucket.
This feature can be enabled by setting the following env variables: ARGO_ARCHIVE_LOGS, ARGO_ARCHIVE_ARTIFACTORY, and ARGO_ARCHIVE_BUCKETNAME.
Updates 20 sep 2019:
Investigated the feasibility of getting the pod logs archive location from the argo workflow status:
Changes
- Updated @types/minio to match the minio package (useSSL instead of insecure <- old)
- Created a few additional helper modules:
  - aws-helper provides utils to query and handle AWS instance profile session credentials (aka kube2iam or ec2 profiles can be used instead of providing an access key and secret to the minio client)
  - workflow-helper provides utils to retrieve pod log archive info from the argo workflow status
  - minio-helper provides utils to retrieve objects from minio or s3 backends, including the capability to use AWS ec2 credentials to access s3
- Replaced the existing handlers for s3 and minio objects with utils from minio-helper
- Updated the get pod logs handler to try getting the pod logs in the following order: the k8s API first, then the archive location from the argo workflow status, then the env-var-configured artifactory
I have built the image and tested it on my own cluster in AWS. However, I have only tested the pod logs; I have not really tested the ui-metadata yet.
TODO