
Seeing memory leak under normal usage. #92

Closed
SkPhilipp opened this issue Jun 2, 2017 · 8 comments

@SkPhilipp

SkPhilipp commented Jun 2, 2017

Hi, we're using imageproxy in a Docker container, running in our on-premise Kubernetes environment.

When load-testing with a single image, we see memory use climbing until our system eventually kills the process for using too much memory. This happens very quickly even under moderate load. Our Docker image is practically the same as https://hub.docker.com/r/willnorris/imageproxy/~/dockerfile/ except that we start it with a domain whitelist. It is started with CMD /go/bin/imageproxy -addr 0.0.0.0:80 -whitelist (... list of our domains )

We simply GET /400x,q80/https://cmgtcontent.ahold.com.kpnis.nl/cmgtcontent/media//001746500/000/001746560_001_superhero_BBQ_170523_(1).jpg. After about 20,000 requests the Docker instance reaches its memory limit and is killed.
When idle, it stays at its normal ~30 MB.

@willnorris
Owner

Interesting. And just to ask a dumb question which you sort of already answered, but I'll ask anyway... you're not using the in-memory cache right? (by passing -cache memory) 😄

@SkPhilipp
Author

We didn't pass any additional caching options; memory is the default, right? Anyway, we're only pulling the same image with the same parameters. I'd expect the in-memory cache to fill up if I were requesting many different images, but this is just the same image over and over.

@Shimmi

Shimmi commented Jun 14, 2017

Hi,
we're facing the same problem. We tried three different cache providers (in-memory, S3, and the file cache), but none of them worked well for us and the machine kept getting killed. In our tests, even a single URL with the same parameters appears to be served as a (we think) non-cached image: with each request the memory climbed higher and higher, and requesting the same URL roughly 30 times in a row was enough to get the proxy killed.

We saw the image being properly saved to the cache (tested with the S3 and file caches), but it was never loaded from it. Every reload of the image caused it to be re-saved to the cache.

After those tests, we put an nginx proxy cache in front of it, and the server went from being constantly overwhelmed and killed twice a day to almost zero utilization. Weird... :)

@willnorris
Owner

“Memory is default right?”

Default is no caching at all.

Based on @Shimmi's additional info, this definitely sounds like an issue. I'll try to look into it.

@willnorris
Owner

okay, I'm pretty sure I've figured out what is going on here. The short answer is that transformation was being performed on already cached images, resulting in a bunch of extra memory allocations and CPU usage. A fix will be pushed shortly.

A slightly longer explanation is below.

HTTP Caching

First, some background which you probably already know. There are two main ways of doing caching in HTTP. The Expires and Cache-Control headers allow the server to specify the lifetime of a resource such that a client can continue to use a cached copy for a specified amount of time. If the client already has the resource cached and it hasn't expired, then no additional round trips are needed at all to use the resource.

The Etag and Last-Modified headers allow a client that has a cached version of a resource to check with the server to see if a new version is available. This always requires an additional round trip to the server, but assuming there have been no changes, the server can simply respond with a 304 status code, which instructs the client to use their cached copy. There's no need to send the resource over the wire again.
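As a concrete (made-up) example, a client-side revalidation in Go looks roughly like this; the URL and validator values are hypothetical, and the point is just that a matching validator gets back a 304 with no body:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical values: a client that cached the image earlier would have
	// remembered the validators the server sent along with it.
	cachedETag := `"abc123"`
	cachedLastModified := "Fri, 02 Jun 2017 10:00:00 GMT"

	req, err := http.NewRequest("GET", "https://example.com/image.jpg", nil)
	if err != nil {
		panic(err)
	}
	// Ask the server to skip the body if the resource hasn't changed.
	req.Header.Set("If-None-Match", cachedETag)
	req.Header.Set("If-Modified-Since", cachedLastModified)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		fmt.Println("304: nothing changed, keep using the cached copy")
		return
	}
	fmt.Println("200: resource changed, read the new body and refresh the cache")
}
```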

httpcache

Imageproxy uses the gregjones/httpcache package to handle caching. For connections between imageproxy and remote servers, httpcache takes care of everything for us, including both flavors of caching mentioned above. It enforces all the right validation checks, sends etags to the server when needed, etc. If the server responds with a 304, then httpcache will return its cached copy with a 200 status. Unless you inspect the X-From-Cache header, there's no way to know if the response came from the cache or the remote server.
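For reference, wiring httpcache in looks roughly like this (a simplified sketch with a made-up URL, not imageproxy's actual setup):

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/gregjones/httpcache"
)

func main() {
	// In-memory cache; the same Transport can be backed by disk, S3, etc.
	transport := httpcache.NewMemoryCacheTransport()
	client := transport.Client()

	for i := 0; i < 2; i++ {
		resp, err := client.Get("https://example.com/image.jpg")
		if err != nil {
			panic(err)
		}
		resp.Body.Close()

		// httpcache revalidates with Etag/Last-Modified when needed and turns an
		// upstream 304 into a 200 served from its cache; this header is the only
		// hint that the body came out of the cache.
		fmt.Println("X-From-Cache:", resp.Header.Get(httpcache.XFromCache))
	}
}
```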

For "downstream" connections (those between the browser and imageproxy), we reuse the same caching headers as the upstream resource. That is, we use the same values for Expires, Cache-Control, Etag, and Last-Modified. For Expires and Cache-Control, this means that clients can just keep using their cached versions without talking to us again. But for Etag and Last-Modified, we need to handle those requests and respond with a 304 status when appropriate. Currently, this is happening here.

So, for a remote image (like this one) that serves Etag and Last-Modified headers but not Expires or Cache-Control headers, the flow looks something like this:

  1. Check for transformed image in cache
    • found it in the cache, but it doesn't have an Expires or Cache-Control, so we need to check to see if the original image has been updated
  2. Check for original (non-transformed) image in cache
    • found it in the cache, but it doesn't have an Expires or Cache-Control, so we need to check with the server to see if it's been updated
  3. Refetch the remote image, including the etag and last-modified values in the request
    • remote image hasn't been updated (got a 304 response from remote server), so use the cached versions
  4. Transform the image as requested
  5. Check if the original request contained etag or last-modified data
    • one or both match, so we can return a 304 response

Steps 1-3 still need to happen to make sure the remote image hasn't changed, but steps 4 and 5 should be flipped. That was just an oversight on my part when I originally implemented this and never noticed. We were still returning a 304 response, so from the client's perspective everything looks fine. But inside the server, we're doing a transformation in step 4 that isn't needed.
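In outline, the fix just moves the conditional check ahead of the transformation. A hypothetical sketch of the corrected ordering (not the actual patch, and only checking If-None-Match for brevity):

```go
package proxyexample

import "net/http"

// handleProxy is a hypothetical outline of the corrected flow, not the actual
// patch: the conditional check now runs *before* the transformation, so a 304
// costs no decode/resize work.
func handleProxy(w http.ResponseWriter, r *http.Request, cachedETag string) {
	// Steps 1-3: fetch or revalidate the original image (elided here).

	// Former step 5, now first: if the client's validator still matches,
	// answer 304 and skip the transformation entirely.
	if inm := r.Header.Get("If-None-Match"); inm != "" && inm == cachedETag {
		w.WriteHeader(http.StatusNotModified)
		return
	}

	// Former step 4: only transform when a body actually has to be sent.
	// transformAndServe(w, r) // elided
}
```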

Anyway, like I said... fix will be pushed shortly. I'll leave this bug open until you confirm that the new version seems to have fixed the problem for you.

@willnorris
Owner

Oh, I also meant to add that I was able to easily reproduce this, with imageproxy using hundreds of megabytes of memory when requesting the same cached image thousands of times (rakyll/hey is great for testing this, by the way). After the forthcoming fix, memory stayed constant at 65 MB even with 20,000 requests.
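For anyone who wants to reproduce this without installing hey, a crude Go loop along these lines generates the same kind of repeated load; the URL, request count, and concurrency are made up:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	// All three values are hypothetical; point this at a local imageproxy instance.
	const (
		url     = "http://localhost:8080/400x,q80/https://example.com/photo.jpg"
		total   = 20000
		workers = 50
	)

	jobs := make(chan struct{}, total)
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				resp, err := http.Get(url)
				if err != nil {
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Println("done:", total, "requests")
}
```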

willnorris added a commit that referenced this issue Jun 15, 2017
If the caching headers in the request are valid, return a 304 response
instead of doing the transformation.

Ref #92
@Shimmi

Shimmi commented Jun 15, 2017

@willnorris Many thanks! That was fast :)

I've tested it and can confirm the memory consumption is OK for me.

But we are still seeing high CPU load, and the images are being re-processed (tested with the file cache).

Hitting F5 on the same image URL about 30 times resulted in high CPU load:
[screenshot: imageproxy_htop]

Also, the file in the file cache is being modified with every new request:
[screenshot: imageproxy_filecache]

The behaviour I would expect is that if neither the source image nor the URL parameters change, imageproxy should simply serve the image from the cache and not touch it again. Not sure what causes this...

@willnorris
Owner

okay, I'm going to close this as having fixed the originally reported problem, the memory leak. Please open a new issue to focus on the high CPU load, and I'll try to investigate.

As for the cached file being rewritten, that will end up needing to be fixed in the httpcache package. I'd suggest opening a bug there, and I'll try and take a look at what would be involved in fixing it.
