r.text and r.json() return different results *in some cases* #5667

@jjmaldonis

Description

This might be an interesting one... I found that r.text and r.json() can return different results in some specific cases. I don't understand why the difference between the two cases is changing the result of r.text.

Maybe r.text should always default to utf-8 for decoding when the response's content type is application/json, following https://www.ietf.org/rfc/rfc4627.txt (search for "JSON text SHALL be encoded in Unicode").
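To make the proposal concrete, here is a minimal sketch of the decoding rule being suggested, written with the standard library only. The helper name decode_body and its latin-1 fallback are hypothetical and are not part of requests; the fallback only stands in for whatever charset detection would otherwise run.

```python
from email.message import Message


def decode_body(content: bytes, content_type: str, fallback: str = "latin-1") -> str:
    """Decode a body, preferring UTF-8 for JSON that declares no charset.

    Hypothetical helper for illustration; not part of requests.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None when the header has no charset parameter
    if charset is None and msg.get_content_type() == "application/json":
        charset = "utf-8"  # the proposed rule: JSON without a charset is treated as UTF-8
    return content.decode(charset or fallback)


body = b'{"name":"rd\xce\xba","uuid":"1234"}'
print(decode_body(body, "application/json"))
```

With requests itself, setting r.encoding = "utf-8" before reading r.text overrides the guessed encoding for that response.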

I'm using the latest version of requests.

Expected Result

I would expect r.text and r.json() to return roughly the same thing. More specifically, I would expect json.loads(r.text) and r.json() to return the same thing, but the issue seems to be with r.text's decoding specifically.

Actual Result

In the following code, I am making a sample request and replacing the response's content with custom content so we have full control over it. The custom content is utf-8 encoded. In the next version of this code, you'll see the name change when it shouldn't.

import requests
import json

r = requests.get("https://api.covidtracking.com/v1/us/current.json")
r._content = b'{"name":"rd\xce\xba"}'  # This is utf-8
r.headers = {
    "Content-Type": "application/json",
}
print(r.json())
print(json.loads(r.text))

The above code prints:

{'name': 'rdκ'}
{'name': 'rdκ'}

which is fantastic.

Replacing the response's content with b'{"name":"rd\xce\xba","uuid":"1234"}', which simply adds a uuid field to the JSON, and running the code again prints:

{'name': 'rdκ', 'uuid': '1234'}
{'name': 'rdÎº', 'uuid': '1234'}

The name is different even though it did not change at all! The existence of "uuid":"1234" in the response's contents somehow changes the decoding. I have no clue why.
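One way to see how the same bytes can yield two different names is to decode them by hand under two encodings. This is only a sketch under an assumption: that the charset guess used for r.text landed on a single-byte encoding such as Latin-1 for the longer body. The issue itself does not confirm which encoding was guessed.

```python
import json

raw = b'{"name":"rd\xce\xba","uuid":"1234"}'

# UTF-8 reads the two bytes CE BA as one character, U+03BA (kappa).
as_utf8 = json.loads(raw.decode("utf-8"))

# Latin-1 (assumed here for illustration) reads them as two characters, U+00CE and U+00BA.
as_latin1 = json.loads(raw.decode("latin-1"))

print(as_utf8)    # {'name': 'rdκ', 'uuid': '1234'}
print(as_latin1)  # {'name': 'rdÎº', 'uuid': '1234'}
```

The uuid field is plain ASCII, so it decodes identically either way; only the multi-byte character in name is affected.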

Reproduction Steps

Run this code:

import requests
import json

r = requests.get("https://api.covidtracking.com/v1/us/current.json")
r._content = b'{"name":"rd\xce\xba","uuid":"1234"}'
r.headers = {
    "Content-Type": "application/json",
}
print(r.json())
print(json.loads(r.text))

The issue should be fixed when the two print statements match... I think.
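As a possible workaround until then, the raw bytes can be parsed directly instead of going through a decoded string. The sketch below uses only the standard library and relies on json.loads accepting bytes (UTF-8, UTF-16 or UTF-32) in Python 3.6 and later; in requests terms it would correspond to json.loads(r.content), which is an assumption about usage and not something taken from the issue.

```python
import json

raw = b'{"name":"rd\xce\xba","uuid":"1234"}'

# json.loads detects the Unicode encoding of bytes input on its own,
# so no charset guess is involved.
from_bytes = json.loads(raw)
from_utf8_text = json.loads(raw.decode("utf-8"))

print(from_bytes == from_utf8_text)  # True
```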

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.7"
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.4"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "1010103f",
    "version": "19.0.0"
  },
  "requests": {
    "version": "2.25.0"
  },
  "system_ssl": {
    "version": "1010104f"
  },
  "urllib3": {
    "version": "1.24.2"
  },
  "using_pyopenssl": true
}
