Support HTTP compression #700

Open
@amotl

About

CrateDB’s HTTP interface supports gzip and deflate compressed requests, but the crate-python client currently does not utilize this capability. Adding request compression would reduce bandwidth usage, improve performance for large queries and bulk inserts, and align crate-python with best practices seen in other database clients.

As a user, I want the option to send compressed requests to CrateDB to improve performance on congested networks.

Requirements:

  • Add a configuration option to enable request compression (gzip or deflate) when sending requests to CrateDB.
  • Compression should be enabled by default.
  • TBD: Introduce a size threshold to determine when compression is applied.
    Context: Sending a Content-Encoding header for every request adds unnecessary overhead, so compression should only be used when the request size exceeds a configurable threshold (e.g., 1 KB, 2 KB, or 4 KB, similar to other libraries).
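
A minimal sketch of how the requirements above could fit together, not the actual crate-python implementation. The function name `encode_request`, its parameters, and the 2 KB default are hypothetical; the issue leaves the threshold open. It uses only the Python standard library and compresses the body only when it exceeds the threshold.

```python
import gzip
import json
import zlib

# Hypothetical default; the issue lists 1 KB, 2 KB, or 4 KB as candidates.
DEFAULT_THRESHOLD = 2048


def encode_request(payload, compression="gzip", threshold=DEFAULT_THRESHOLD):
    """Serialize `payload` to JSON and compress it if it exceeds `threshold`.

    Returns a (body, headers) tuple. `compression` may be "gzip",
    "deflate", or None to disable compression entirely.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}

    # Small bodies are sent as-is, without a Content-Encoding header.
    if compression is None or len(body) <= threshold:
        return body, headers

    if compression == "gzip":
        body = gzip.compress(body)
    elif compression == "deflate":
        # HTTP "deflate" is the zlib-wrapped format that zlib.compress emits.
        body = zlib.compress(body)
    else:
        raise ValueError(f"unsupported compression: {compression!r}")

    headers["Content-Encoding"] = compression
    return body, headers
```

The returned body and headers could then be passed to whatever HTTP layer the client uses. Checking the size after serialization keeps the threshold meaningful in bytes on the wire, at the cost of always serializing first.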

Warning

This is primarily about request encoding / compression. HTTP response compression is vulnerable to the BREACH attack and therefore requires additional safeguards.


@proddata said:

It seems like CrateDB's HTTP interface accepts gzip / deflate compressed data.
It might also be interesting to add this capability to crate-python.

@surister said:

import gzip
import json
import requests

objects = [
    [1, "test"] for _ in range(200_000)
]

body = {
    "stmt": "INSERT INTO t VALUES (?, ?)",
    "bulk_args": objects
}
response = requests.post('http://192.168.88.251:4200/_sql', json=body)


print(response.request.headers.get('content-length'))

response = requests.post('http://192.168.88.251:4200/_sql',
                         data=gzip.compress(json.dumps(body).encode('utf8')),
                         headers={'Content-Encoding': 'gzip',
                                  'Content-Type': 'application/json; charset=utf-8'})

print(response.request.headers.get('content-length'))

Output (Content-Length in bytes: uncompressed, then gzip-compressed):

2600054
5149
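
The size comparison above can also be approximated without a running CrateDB instance. The following is a sketch under assumptions: `payload_sizes` is a hypothetical helper, and it measures only the serialized body, not what a server would accept. It builds the same bulk-insert payload and reports its size raw, gzip-compressed, and deflate-compressed.

```python
import gzip
import json
import zlib


def payload_sizes(rows):
    """Return body sizes in bytes for a bulk insert of `rows` identical rows."""
    body = {
        "stmt": "INSERT INTO t VALUES (?, ?)",
        "bulk_args": [[1, "test"] for _ in range(rows)],
    }
    raw = json.dumps(body).encode("utf-8")
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw)),
        # zlib.compress emits the zlib-wrapped stream used by HTTP "deflate".
        "deflate": len(zlib.compress(raw)),
    }


if __name__ == "__main__":
    for name, size in payload_sizes(200_000).items():
        print(f"{name}: {size} bytes")
```

With default `json.dumps` separators, the raw size for 200,000 rows should match the uncompressed Content-Length reported above. The compressed sizes depend on the compression level and library version, and identical rows are a best case; real data will compress less.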
