Description
This might be an interesting one... I found that r.text and r.json() can return different results in some specific cases. I don't understand why the difference between the two cases below changes the result of r.text.
Maybe r.text should always default to utf-8 for decoding when application/json is set as the response's content type, following https://www.ietf.org/rfc/rfc4627.txt (ctrl+f for "JSON text SHALL be encoded in Unicode").
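To make the proposal concrete, here is a minimal sketch of the selection rule using only the standard library. The helper name choose_encoding and its guessed parameter are hypothetical and are not part of requests; the rule is simply "an explicit charset wins, otherwise JSON means utf-8, otherwise fall back to whatever was guessed".

```python
from email.message import Message


def choose_encoding(content_type, guessed=None):
    """Hypothetical helper (not a requests API): pick a text encoding.

    An explicit charset parameter wins. Without one, application/json
    is treated as utf-8. Anything else falls back to `guessed`.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        return charset
    if msg.get_content_type() == "application/json":
        return "utf-8"
    return guessed


raw = b'{"name":"rd\xce\xba","uuid":"1234"}'
# "ISO-8859-1" stands in for a wrong guess; it is ignored for JSON here.
encoding = choose_encoding("application/json", guessed="ISO-8859-1")
print(encoding)
print(raw.decode(encoding))
```

With a rule like this, the guessed encoding would only matter for content types that carry no charset and are not JSON.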
I'm using the latest version of requests.
Expected Result
I would expect r.text and r.json() to return roughly the same thing. More specifically, I would expect json.loads(r.text) and r.json() to return the same thing, but the issue seems to be with r.text's decoding specifically.
Actual Result
In the following code, I make a sample request and replace the response's content with custom content so we have full control over it. The custom content is utf-8 encoded. In the next version of this code, you'll see the name change when it shouldn't.
import requests
import json
r = requests.get("https://api.covidtracking.com/v1/us/current.json")
r._content = b'{"name":"rd\xce\xba"}' # This is utf-8
r.headers = {
    "Content-Type": "application/json",
}
print(r.json())
print(json.loads(r.text))
The above code prints:
{'name': 'rdκ'}
{'name': 'rdκ'}
which is fantastic.
Replacing the response's content with b'{"name":"rd\xce\xba","uuid":"1234"}', which simply adds a uuid field to the JSON, and running the code again prints:
{'name': 'rdκ', 'uuid': '1234'}
{'name': 'rdκ', 'uuid': '1234'}
The name is different even though it did not change at all! The existence of "uuid":"1234" in the response's contents somehow changes the decoding. I have no clue why.
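One possible explanation, offered as a guess rather than a diagnosis: when no encoding is declared, r.text falls back to a guessed encoding, and the guess may depend on the bytes it sees. The sketch below uses no requests code at all. It only shows how the same bytes read under utf-8 versus a single-byte codec; latin-1 is an assumption standing in for whatever the detector actually picked.

```python
import json

short = b'{"name":"rd\xce\xba"}'
longer = b'{"name":"rd\xce\xba","uuid":"1234"}'

for raw in (short, longer):
    # Correct decoding: the two bytes \xce\xba form one character.
    as_utf8 = json.loads(raw.decode("utf-8"))["name"]
    # Assumed wrong guess: each byte becomes its own character.
    as_latin1 = json.loads(raw.decode("latin-1"))["name"]
    print(repr(as_utf8), repr(as_latin1))
```

Both payloads decode to 'rdκ' under utf-8 and to 'rdÎº' under latin-1, so the name bytes themselves are fine; only the choice of codec changes what comes out.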
Reproduction Steps
Run this code:
import requests
import json
r = requests.get("https://api.covidtracking.com/v1/us/current.json")
r._content = b'{"name":"rd\xce\xba","uuid":"1234"}'
r.headers = {
    "Content-Type": "application/json",
}
print(r.json())
print(json.loads(r.text))
The issue should be fixed when the two print statements match... I think.
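The "two print statements match" condition can be written as a small offline check. This is only a sketch: text_matches_json is a hypothetical name, json.loads on raw bytes stands in for r.json(), and decoding with an explicit codec stands in for r.text.

```python
import json


def text_matches_json(raw, encoding):
    """Hypothetical check: does parsing the decoded text give the same
    result as parsing the raw bytes directly?"""
    # json.loads accepts bytes and detects utf-8/16/32 on its own.
    from_bytes = json.loads(raw)
    from_text = json.loads(raw.decode(encoding))
    return from_bytes == from_text


raw = b'{"name":"rd\xce\xba","uuid":"1234"}'
print(text_matches_json(raw, "utf-8"))    # decoding agrees with the bytes
print(text_matches_json(raw, "latin-1"))  # assumed wrong guess disagrees
```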
System Information
$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.7"
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.4"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "1010103f",
    "version": "19.0.0"
  },
  "requests": {
    "version": "2.25.0"
  },
  "system_ssl": {
    "version": "1010104f"
  },
  "urllib3": {
    "version": "1.24.2"
  },
  "using_pyopenssl": true
}