About
CrateDB’s HTTP interface supports gzip and deflate compressed requests, but the crate-python client currently does not utilize this capability. Adding request compression would reduce bandwidth usage, improve performance for large queries and bulk inserts, and align crate-python with best practices seen in other database clients.
As a user, I want the option to send compressed requests to CrateDB to improve performance on congested networks.
Requirements:
- Add a configuration option to enable request compression (gzip or deflate) when sending requests to CrateDB.
- Compression should be enabled by default.
- TBD: Introduce a size threshold to determine when compression is applied.
Context: Compressing every request (and sending a Content-Encoding header each time) adds unnecessary overhead for small payloads, so compression should only be applied when the request body exceeds a configurable threshold (e.g. 1 KB, 2 KB, or 4 KB, as other client libraries do); see the sketch below.
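A minimal sketch of how a threshold-based compression option could be layered on top of requests. The helper name post_sql, the parameter compress_threshold, and the 1 KB default are illustrative assumptions, not existing crate-python API:

import gzip
import json

import requests


def post_sql(url, payload, compress_threshold=1024):
    """Send an _sql request, gzip-compressing the body only when it is
    larger than compress_threshold bytes (illustrative default: 1 KB)."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json; charset=utf-8"}
    if len(body) >= compress_threshold:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return requests.post(url, data=body, headers=headers)


# Small statements go out uncompressed; large bulk inserts get gzipped.
# post_sql("http://localhost:4200/_sql", {"stmt": "SELECT 1"})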
Warning
This issue is primarily about request encoding/compression. HTTP response compression is vulnerable to BREACH and therefore requires additional safeguards.
@proddata said:
It seems like CrateDB's HTTP interface accepts gzip / deflate compressed data.
It might also be interesting to add this capability to crate-python.
@surister said:
import gzip
import json

import requests

# 200,000 bulk-insert rows as parameter tuples.
objects = [
    [1, "test"] for _ in range(200_000)
]
body = {
    "stmt": "INSERT INTO t VALUES (?, ?)",
    "bulk_args": objects
}

# Plain JSON request.
response = requests.post('http://192.168.88.251:4200/_sql', json=body)
print(response.request.headers.get('content-length'))

# Same payload with a gzip-compressed body.
response = requests.post('http://192.168.88.251:4200/_sql',
                         data=gzip.compress(json.dumps(body).encode('utf8')),
                         headers={'Content-Encoding': 'gzip',
                                  'Content-Type': 'application/gzip; charset=utf-8'})
print(response.request.headers.get('content-length'))
2600054
5149
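The first value is the content-length of the plain JSON request (~2.6 MB); the second is the gzip-compressed one (~5 KB), roughly a 500x reduction for this payload.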