
high memory usage when opening a largish file #134

Closed
wkloucek opened this issue Nov 20, 2023 · 12 comments · Fixed by #136

Comments

@wkloucek (Contributor) commented Nov 20, 2023

We're running the wopiserver in Kubernetes with the wopiserver Helm Chart.

We're using resource limits for memory (https://github.com/cs3org/charts/blob/1f79697713223984ad5ca78d81de838f8cf053f0/wopiserver/values.yaml#L77-L85).

We noticed that the wopiserver gets OOM-killed when someone opens a largish file.

I empirically found the following memory limits to be sufficient:

269 MB PDF file -> needs 600Mi memory limit
532 MB PDF file -> needs 1200Mi memory limit

In general it seems we need a memory limit of roughly 2x the file size.

Background:
wopiserver v10.1.0 using the CS3 API with ownCloud Infinite Scale 4.0.1

@dj4oC commented Nov 21, 2023

Dear @glpatcern, could you perhaps take a look at this? That would be a great help to us. Thank you very much!

@glpatcern (Member)

Hi @dj4oC and @wkloucek, I'll take a look, but I have a question: why would you open a PDF through a WOPI application? (which app do you use?)

I must say that in our experience (that is, with MS Office and Collabora), the application engine typically fails to open such large files in the first place. The largest I have tried is a 180 MB pptx file with videos, which did not blow up memory usage, but I'd have to check with even larger files.

@glpatcern (Member)

Actually, I have a strong suspicion. The xrootd interface properly streams the file's content, whereas the cs3 interface reads it all (into the requests buffer) and then streams it to the WOPI application, likely duplicating that buffer, which would explain the 2x factor you observed.

The solution would then be to stream the content at the HTTP level, as detailed in https://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow

I can look into that but not immediately. Of course a PR is welcome!
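For illustration, a minimal sketch of what streaming at the HTTP level could look like with the requests library; the URL, chunk size and the Flask Response wrapping are assumptions for this sketch, not the actual wopiserver code:

```python
import requests
from flask import Response

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; arbitrary choice for this sketch


def readfile_streamed(download_url, headers):
    """Proxy a file from the storage to the WOPI client without buffering it all in memory."""
    # stream=True defers downloading the body; iter_content() then yields it chunk by chunk,
    # so peak memory stays around one chunk instead of the whole file
    storage_resp = requests.get(download_url, headers=headers, stream=True)
    storage_resp.raise_for_status()
    return Response(storage_resp.iter_content(chunk_size=CHUNK_SIZE),
                    mimetype='application/octet-stream')
```

With stream=True the body is only read as the generator is consumed, which is exactly the behavior described in the linked body-content-workflow section.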

@dj4oC commented Nov 21, 2023

Thank you!
OnlyOffice supports PDF annotations as of version 7.5; that's why we open PDFs via WOPI.
Let me check whether I can find a Python guru for a first PR.

@DeepDiver1975 (Contributor)

> Let me check whether I can find a Python guru for a first PR.

👋 🤣

@DeepDiver1975 (Contributor)

With writefile() we have the same problem, as far as I can tell from the source code. A PR will follow on Monday. 👋

@glpatcern (Member)

Yep, I saw that too, but in that case I think the WOPI client (MS) does not stream, does it? How do you handle it in your PHP implementation?

@DeepDiver1975 (Contributor)

From my perspective a network resource is always a stream of data, regardless of what the WOPI client is doing.

The writefile() declaration has to be changed, as it currently expects all the data at once. This will be a bigger change.
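To make that concrete, a hypothetical before/after of the writefile() signature; the names writefile_streamed and upload_chunk_to_storage are illustrative only, not the actual cs3 interface:

```python
def writefile(endpoint, filepath, userid, content):
    """Current shape: 'content' is the whole file as bytes, so it must fit in memory."""
    ...


def writefile_streamed(endpoint, filepath, userid, chunks):
    """Possible streamed shape: 'chunks' is any iterable of byte chunks
    (e.g. a generator over flask.request.stream), so the full file is never materialized."""
    for chunk in chunks:
        upload_chunk_to_storage(endpoint, filepath, userid, chunk)  # hypothetical helper
```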

@glpatcern (Member)

Correct, the signature of writefile() would have to be changed. But the question remains: if WOPI clients send all the data at once, it seems moot to stream it chunk by chunk to the storage (which in itself would require Content-Range headers over HTTP, a server capable of supporting them, and a rewrite of that logic including failure scenarios) when the buffer already sits in the wopiserver.

It turns out that WOPI clients MAY support chunked uploads, but with a different set of APIs that belong to the CSPP Plus program: https://learn.microsoft.com/en-us/microsoft-365/cloud-storage-partner-program/plus/file-transfer/multi-request-file-upload-overview

I guess we leave things as they are...

@DeepDiver1975 (Contributor)

WOPI clients perform POST requests. The POST headers include a Content-Length header, which tells us the total length. The content is a stream by its technical nature, and the Flask request object lets us access it as a stream via https://flask.palletsprojects.com/en/3.0.x/api/#flask.Request.stream

Looking at the implementation of get_data(), it reads from that stream as well.

All we need to do (same as within readfile()) is iterate over the stream.
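A minimal sketch of that idea, assuming a bare Flask view; the route, chunk size and the write_chunk stub are illustrative only, not the wopiserver's actual endpoint:

```python
from flask import Flask, request

app = Flask(__name__)
CHUNK_SIZE = 4 * 1024 * 1024  # arbitrary choice for this sketch


def write_chunk(fileid, chunk):
    """Stub standing in for the storage interface; a streamed writefile() would be called here."""
    pass


@app.route('/wopi/files/<fileid>/contents', methods=['POST'])
def put_file(fileid):
    while True:
        # request.stream is a file-like view on the raw body, so the payload is
        # never loaded into memory in one piece (unlike request.get_data())
        chunk = request.stream.read(CHUNK_SIZE)
        if not chunk:
            break
        write_chunk(fileid, chunk)
    return '', 200
```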

@glpatcern (Member)

Reopening until we test #141

glpatcern reopened this Jan 9, 2024

@glpatcern (Member)

#141 was tested and merged, closing
