-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse connections for S3 Get requests #247
Comments
For what it's worth, this is the commit I used to fix it. There is an option to turn on Note that the connection will close based on the ReadTimeout, which seems to be 2 seconds if left as 0. Then next request made to the client after this timeout will fail and need to be retried. Still working on a fix for this. |
@paulmach thanks, |
I just created a PR, let the discussion start :) https://github.com/crowdmob/goamz/pull/248 It looks like this issue has been attacked before (https://github.com/crowdmob/goamz/pull/171, https://github.com/crowdmob/goamz/issues/108) but the changes were reverted. I don't know why. For starters, the golang net/http documentation states "Clients and Transports are safe for concurrent use by multiple goroutines and for efficiency should only be created once and re-used." In my experience, using this library for a long time, s3 can have many issues. You can end up waiting forever, odd errors etc. so I've been setting low timeouts and implementing retries in my code. I think this change will only make that better as problems during connection creation will happen less often as there will less connections created overall. The net/http Client maintains it's own connection pool with transport.MaxIdleConnsPerHost in the pool at anytime (default is 2). So every time you make a request, it checks the pool, if there isn't a connection is makes a new one. When you're done, if it's not closed, it puts it back, unless there are already too many waiting, then it just closes it. The problem you can run into is you get a connection from the pool that originally had a ReadTimeout of 60 seconds, but now it only has 100ms left (it's been 59.9 seconds). During your 200ms request to s3 the connection will close and you'll get a io timeout error bubble up to the library user. However, if it's been 61 seconds since the connection was created, the http.Client will create a new connection. Anyway, I've been digging into this quite a bit the last few days and that's the way I understand. |
Maybe among more recent contributors to /s3 also @cfstras , @jtblin , @drombosky , @dylanmei , @jigish sorry if I missed others. |
So I've been running this code over the weekend. First, for bulk usage, this has fixed my socket issues. I'm doing 1000qps with my script using this branch no problem. Before I would run out of local TCP ports. So connection reuse is working. For my other workload that is probably more typical (web request requires 5ish 10kb concurrent calls from s3), the results are mixed. Connection errors are down, obviously, since I'm making less of them. I still get a few errors, but they seem to fail fast and my retry code handles them fine (and the old/current version of this lib was getting these too, probably an s3 thing). The bigger issue is sometimes the read can take 2+ seconds vs the average 100-200 milliseconds to complete the get. I think this due to the ReadTimeout I'm setting. I'm setting it to 15 minutes, so the s3 connections lives that long. However, when making a "bad" s3 request there is virtually no timeout. Previously I had the ReadTimeout set to 1 second and then my code would retry, bailing on these bad requests. I'm not sure how to get a ReadTimeout, ConnectTime and a ConnectionTimeout. The I'm going to implement a timeout in my code to handle this, but I'm not sure how to deal with this issue in the context of this library :( |
Thanks for sharing your results, I wish I had something to say or help, but I don't. Please keep up the good work. |
@paulmach So, at this moment, whats the eventual solution for this ? |
I don't know but I suggest having a look at https://github.com/goamz/goamz/blob/master/aws/client.go [ this is not the crowdmob/goamz it is goamz/goamz ]. |
@alimoeeny Thanks for the pointer. But, I think I am missing something here. |
Hi @meson10 sorry, I don't know, I must have confused two set of conversations and the threads got mixed in my mind. I hope I have not wasted much of your time. |
For each bucket.Get() a new http.Client is created (https://github.com/crowdmob/goamz/blob/94ebb8df2d498469ee950fbc4e2ab3c901dea007/s3/s3.go#L1023) Also, the request is explicitly told to close after finish (https://github.com/crowdmob/goamz/blob/94ebb8df2d498469ee950fbc4e2ab3c901dea007/s3/s3.go#L1004). So each request requires a new TCP connection and this creates overhead as well as leaving many sockets in a TIME_WAIT state if you're doing a lot of connections.
I've "hacked in a fix" where the http.Client is only created once for each Bucket object. and then the req.Close is set to false. This fixes the issue and it does reuse connections properly. However, I'm not totally sure how to fix this in the context of this library, specifically the timeouts would not be fixed.
This may apply to all S3 requests, but I'm currently only interested in downloading lots of data fast. :) My use case is downloading small (10-50k) files from S3 to an ec2 instance. At rates above 200 qps I get lots of sockets in the TIME_WAIT state and I can't create new connections until then close out. This fix allowed me to max out at 400 qps (from 300) and the code CPU bound presumably the logic of processing the data.
The text was updated successfully, but these errors were encountered: