"OverflowError: string longer than 2147483647 bytes" when trying requests.put #2717
Rather than reading the entire file and sending it across in a single request, would it be possible for you to use chunked transfer encoding? http://docs.python-requests.org/en/latest/user/advanced/#chunk-encoded-requests
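A minimal sketch of that suggestion (URL, filename, and helper name are illustrative assumptions, not from the thread): passing a generator as `data` makes requests send the body with chunked transfer encoding, so it never has to exist in memory as one giant string.

```python
import requests


def read_in_chunks(fileobj, chunk_size=8192):
    """Yield the file in fixed-size pieces instead of one huge read()."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def chunked_put(url, path):
    # A generator body makes requests use Transfer-Encoding: chunked.
    with open(path, 'rb') as f:
        return requests.put(url, data=read_in_chunks(f))
```

Passing the open file object directly as `data` also streams it, but with a `Content-Length` header instead of chunked encoding.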
This limitation is in httplib. The triggering code is simply:

```python
datafile = 'someHugeFile'
with open(datafile, 'rb') as myfile:
    resp = requests.put(url, data=myfile, verify=False)
```
@Lukasa that's inaccurate. The traceback comes from an SSL wrapped socket. This has nothing to do with httplib from what I can see.
Unfortunately it looks like it cannot be avoided when you do a POST request with several headers: the file (or files) is always read completely. It would be great if requests could avoid this, since I often have to send files that are larger than the available main memory on the system.
I'm just gonna chime in. If you're trying to send files via the Web that are larger than your
That's true, but I don't get to decide on the protocol and endpoint. Doing the request with curl works fine, and as a workaround, I'm currently printing a curl command to STDOUT so that the user can launch it.
@eriktews can you share how you're doing the upload? There are ways to stream uploads (like Lukasa's comment shows). Is there a reason you cannot do that (if you are not already trying it)? Also, can you provide your actual traceback?
So the comment from Lukasa seems to work when you are uploading a single file; then you can do a streaming upload. But I have to do a normal POST request with several variables in the data part and the file as part of a multipart upload. There is API documentation at https://canvas.instructure.com/doc/api/file.file_uploads.html which shows a curl command in the "Step 2" section. Basically I want to replicate that call with the requests package in streaming mode. I don't have the traceback at the moment, but when I get it, I will post it here.
Have you tried using the MultipartEncoder from the requests toolbelt?
No, but that looks like just what I need. I will give it a try and see whether it works. I didn't know about the toolbelt package at all, so maybe you should reference it in the normal requests package documentation.
It is :) |
@Lukasa's method does not work - even with httplib off, the signing still happens for the transport itself. In my case I have a 2GB+ POST request (not a file, just POST data). This is for an Elasticsearch bulk update. The endpoint only has HTTPS, so I'm working through other solutions now.
Sorry, can you demonstrate your code please? |
This should throw the error:
If
@adamn That was not my proposed solution. My proposed solution was to not read the file in manually at all. You are bumping into the same error as before, which is that we are sending a single gigantic string to httplib. This is a behaviour we can fix: if we spot someone uploading a gigantic single string then we can resolve it. But at this point I strongly recommend you use an intermediary file object: either one on disk, or by doing the urlencoding yourself and wrapping the result in a BytesIO.
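That suggested workaround can be sketched as follows (function name, URL, and fields are illustrative assumptions): urlencode the form data yourself and wrap the resulting bytes in a file-like object, so requests streams the body in small chunks instead of handing the socket layer one gigantic string.

```python
import io
from urllib.parse import urlencode

import requests


def put_large_form(url, fields):
    # Encode the form data ourselves, then present it as a file-like object.
    body = io.BytesIO(urlencode(fields).encode('utf-8'))
    # requests reads file-like bodies chunk by chunk rather than all at once.
    return requests.put(url, data=body,
                        headers={'Content-Type': 'application/x-www-form-urlencoded'})
```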
I've already come up with a workaround so won't be able to dig deeper into this, unfortunately. I still suspect that the SSL payload needs to be signed/encrypted, so the same thing will happen regardless of whether there is a file object or not, since the exception is raised by
@adamn No, that's not necessary. TLS uses stream encryption; it does not need the entire payload at once. What you're missing is that when given a file object, requests will automatically stream it in smaller chunks (specifically, 8192-byte chunks). Those cause no problem.
Sorry to comment on an old issue, but this looks similar to an issue we've run into and I'm trying to decide whether it's worth opening a new issue for it. Again, this behaviour is worse than an exception being raised.
Has the remote peer been ACKing at the TCP level? Is it still reading from the receive buffer? Has it TCP FIN'd? |
Yes, the remote end is sending ACKs appropriately, no FIN or anything like that. In fact, if you have a large file
Hrm. Is it possible for you to put together a small repro scenario? Do you see the same effect with other hosts? |
As luck would have it, I have already done so. GitHub doesn't seem to want to let me attach files, so:
And a server to run it against:
Obviously this isn't the server we were running against when we first encountered this problem :) |
Well, as a first logical note I should point out that this is not necessarily the same problem as originally reported on this issue, as the original report affected TLS only, as discussed above. 😉 Regardless, let's dig into this a bit.
Ah, sorry. Is it worth me opening a new issue then, or should I just leave it here, since you're already looking at it? |
Let's leave it here for now. =) |
Huh. That behaves very oddly. On my machine, over the loopback, I don't see any data sent at all: it's like Requests just gave up on sending it. Further debugging seems to show this is happening at the level of

Naturally, the reason this happens is the same as the reason Python does lots of other stupid crap:

In fact, it definitely is, since the current Python master has a changed

That makes this ultimately a duplicate of CPython issue #18100. This has been open a long time in need of patch review, and given that Python 2.7 now only gets security fixes, I doubt the CPython developers will fix it at this point.

This is a difficult issue for Requests to sensibly police. We can tell when people will definitely hit it (e.g. because the input is a string with a length greater than 2GB), but there are many situations where people will hit it and we can't tell (e.g. because the string plus the headers is greater than 2GB in size, or because a different type is in use that CPython will treat as "stringish" and that is larger than 2GB).

So, given that this is an issue that can be solved by moving to a newer version of Python, that it can be worked around by not reading gigantic strings into memory (which is a best practice anyway), and that if we ever move off httplib we'll fix it automatically, I'm inclined to suggest that we don't have a huge pressure to resolve it. For my part, I think this is getting pretty close to "Dr., it hurts when I do this." "So don't do that then!" territory. However, I'm willing to be disagreed with here.
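As an aside, the exact number in the error message is no accident: 2147483647 is INT_MAX for a signed 32-bit C int, which is why single writes over that size overflow in the affected CPython code.

```python
# 2147483647 is the largest value a signed 32-bit C int can hold; buffers
# longer than this overflow the int length used by the affected write path.
INT_MAX = 2**31 - 1
print(INT_MAX)  # → 2147483647
```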
The workaround is really simple: just wrap it in StringIO/BytesIO. But when you run into it, it's difficult to diagnose, so any help that requests could give, even if it's just a warning in the documentation, would be appreciated.
I can get behind the idea of a PR that adds a note in the documentation. |
Running into the same problem. I don't think this thread has an accepted solution. Does anyone have a solution for this?
This also results in an OverflowError in
The file is 3GB in size. |
@SpoonMeiser Wrapping the file contents in a
I'm having the same basic issue as @gjedeer and see the same behavior as @cmbasnett (that wrapping in BytesIO is not a solution). I'm trying to use a file object to upload something larger than 2GB using a TLS encrypted post. Specifically I'm trying to use a presigned url to upload a file to S3. It appears that the underlying ssl library in python doesn't like files over 2GB. Is there an accepted workaround to this? Stack trace: Basic code:
Requests can't handle a put call for very large data objects. However, it can accept the data as a file-like object instead, and the size issue will not show up. Documented here: psf/requests#2717. Issue: IT-19717 Change-Id: I826d3fa2eebbd3ba0389a0d7042701b4389e406d Signed-off-by: Eric Ball <eball@linuxfoundation.org>
Regarding the newer (since 2018) questions asking for a solution here: the requests code I used to upload was the following:

...leading to these kwargs:

Like this, it worked perfectly for me.
Hi,
I'm trying to upload a file that weighs about 3GB and I'm getting the following error:
"OverflowError: string longer than 2147483647 bytes"
If I understand correctly, it seems like there's a 2GB limit? I didn't manage to find any reference to such a limitation or how to bypass it (if possible).
The code I'm using is:
For smaller files this code works fine for me.