
Download large file from S3 cause memory rise in the init container #1322

Closed
ghost opened this issue Apr 12, 2019 · 19 comments
Labels
area/artifacts (S3/GCP/OSS/Git/HDFS etc), type/bug

Comments

@ghost

ghost commented Apr 12, 2019

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
Demo Argo YAML:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: demo-
spec:
  entrypoint: start

  templates:
    - name: start
      dag:
        tasks:
          - name: demo
            template: demo

    - name: demo
      inputs:
        artifacts:
        # models is 48G
        - name: models
          path: /s3/models
          s3:
            endpoint: ***
            bucket: ***
            key: ##
      container:
        image: demo:beta
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "15"
            memory: "60G"
          limits:
            cpu: "15"
            memory: "65G"

We download a large directory (/s3/models) of 48 GB from S3 and find that the memory usage of the init container rises to 50 GB!

What you expected to happen:
The memory usage of the init container should not be this high.

Anything else we need to know?:
No

Environment:

  • Argo version: v2.2.0
  • Kubernetes version : Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
@ghost ghost changed the title Download large file from s3 cause memory create in init container Download large file from s3 cause memory rise in init container Apr 12, 2019
@jessesuen
Member

S3 is implemented using the MinIO Go client. We can explore whether there's a way to curtail memory usage with that client. This issue seems to indicate it's possible (at least with Put):

minio/minio-go#1081

@ghost
Author

ghost commented Apr 12, 2019

I found a method to reduce memory use with the Go client; I will try it and submit a PR 😄

@jessesuen
Member

Thank you!

@alexec
Contributor

alexec commented Jul 10, 2020

See #3376

@alexec
Contributor

alexec commented Jul 10, 2020

We could try upgrading the Minio Go client and see if that helps.

@alexec
Contributor

alexec commented Jul 16, 2020

See #2748

@alexec alexec self-assigned this Jul 31, 2020
@alexec alexec changed the title Download large file from s3 cause memory rise in init container Download large file from S3 cause memory rise in the init container Jul 31, 2020
@alexec
Contributor

alexec commented Jul 31, 2020

argoproj/pkg#24

@alexec alexec removed their assignment Aug 3, 2020
@alexec alexec added wontfix and removed backlog labels Aug 3, 2020
@stale stale bot closed this as completed Aug 10, 2020
@akloss-cibo

Can we un-stale this? Probably just removing the memory overcommit in the init container would be enough to mostly fix it.

@sirakav

sirakav commented Sep 16, 2021

This is causing a lot of issues for us when dealing with large artifacts, and it forces us to set really high memory limits on init containers even though 99% of the time they don't need them.

@sarabala1979
Member

@sirakav Which Argo version are you using? Can you describe the issues you are facing?

@sirakav

sirakav commented Sep 16, 2021

Currently, I am using the v3.2.0-rc2 version, but this issue was also present when I used v3.1.9.

The main issue is high init container memory usage which is caused by large artifacts.

For example, today I had a workflow that generated more than 200 GB of output artifacts across 4096 files.
All of these files had to end up in a reduce-type task that simply joins them together.
This task immediately fails because of the init container's high memory usage when downloading input artifacts from S3.
In fact, the init container used more than 2Gi of RAM and was OOM-killed even with a 3Gi limit.

This is a deal-breaker for me when using Argo Workflows for certain DAGs, because I feel uncomfortable giving huge amounts of resources to init containers just to download files from S3.

I know this could be solved by using volume mounts (a rough sketch of that approach is below), but this should also work with the whole artifact system backed by S3 without using up a lot of resources.

This original issue sums up everything pretty well, but if needed I can provide more specific details such as monitoring data or manifests.
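A rough sketch of the volume-mount workaround I mean, assuming a pre-populated PVC (the claim name models-pvc is hypothetical); the data is mounted straight into the main container, so the init container never downloads or buffers anything:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: demo-
spec:
  entrypoint: demo
  volumes:
    # hypothetical, pre-populated claim holding the models; replaces the S3 input artifact
    - name: models
      persistentVolumeClaim:
        claimName: models-pvc
  templates:
    - name: demo
      container:
        image: demo:beta
        volumeMounts:
          - name: models
            mountPath: /s3/models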

@akloss-cibo

FWIW, as I think of it, as long as the sum of the resources of the init containers is less than the sum of the resources of the regular containers, you're not wasting anything.

@sirakav

sirakav commented Sep 16, 2021

@akloss-cibo I would agree but in my case, the init container resources would be a lot higher than the main container's.

This would also waste resources because the executor resource configuration is global and I run a lot of small workflows (small artifacts) in the same namespace.

@alexec
Contributor

alexec commented Sep 16, 2021

There is an open ticket to stream the data to/from artifacts.
Aside: Argo Dataflow streams data by default, so it does not have this problem.

@jessesuen
Member

> This is a deal-breaker for me when using Argo Workflows for certain DAGs, because I feel uncomfortable giving huge amounts of resources to init containers just to download files from S3.

Isn't an easy workaround for this to just limit the memory of the argo exec container? See the executor: setting in the controller config:

https://argoproj.github.io/argo-workflows/workflow-controller-configmap.yaml
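
For example, a minimal sketch of that setting in the workflow-controller-configmap (the resource values are illustrative, not recommendations):

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  # applies to the init and wait (executor) containers spawned for each workflow pod
  executor: |
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
      limits:
        cpu: 500m
        memory: 512Mi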

@sarabala1979
Member

sarabala1979 commented Sep 17, 2021 via email

@hbrewster-splunk

Is it possible to override the argo wait container resource limits on a per-workflow basis? This would remove the wastage. +1 for this issue
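
Something like the following is what I have in mind; a rough sketch assuming podSpecPatch can target the wait container by name (not verified):

spec:
  # podSpecPatch is a strategic merge patch of the pod spec, so in principle it can
  # raise the limit for this workflow only; the init container might need a similar
  # entry under initContainers
  podSpecPatch: |
    containers:
      - name: wait
        resources:
          limits:
            memory: 2Gi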

@sarabala1979
Member

sarabala1979 commented Jun 1, 2022 via email

@tooptoop4
Contributor

I think this is similar to #9525, which I'm facing.
