Skip to content

Conversation

@slabko
Copy link
Contributor

@slabko slabko commented Nov 2, 2025

The PR enables compression for data sent from the server to the client. Additionally, it ensures that any mid-stream exceptions thrown by ClickHouse, as well as other possible connection issues, are properly handled.

Note: ClickHouse does not produce an exception marker in versions prior to 24.11, so this mid-stream exception handling only works for ClickHouse version 24.11 and above.

This is still a work in progress. remaining parts are:

  • Configure an extra DSN with compression enabled for the tests.
  • Clean up the CMake configuration for zstd; possibly use the ClickHouse configuration (if it also works well on Windows).
  • Add documentation describing the changes introduced to Poco (specifically to Poco::HTTPClientSession).
  • Try increasing Poco’s allocator chunk size to match zstd’s recommended output size.
  • Add a test that checks whether the Compression parameter is correctly passed to the server with the required compression parameters.
  • Nice to have: throw an appropriate error if the stream contain compressed data without corresponding Content-Encoding header. This would catch another error that would otherwise result in famous non-debuggable Incomplete input stream error.

Outside of this PR one thing is badly missing:

  • Write an article about how to handle "Incomplete input stream" errors.

slabko and others added 5 commits November 2, 2025 20:48
Add a test generator to the networking tests.
* support_compression: initial

* support_compression: slightly works

* support_compression: LZ4 seems to be working

* support_compression: code cleanup

* support_compression: RowBinaryWithNamesAndTypes, tiny cleanup

* support_compression: some doc, test

* support_compression: need_more_input + minor improvements

* support_compression: minor things per windsurf-bot' review

* support_compression: irrelevant session_id change reverted, minor

* support_compression: one more null pointer check

* support_compression: destruction order fix (sanitizer complain)

* support_compression: add lz4 library

* support_compression: include vector (for Win and mac)

* support_compression: tiny cleanup

* support_compression: stream owning model modified

* Revert "support_compression: stream owning model modified"

This reverts commit 8c82dc3.

* Revert "support_compression: tiny cleanup"

This reverts commit 638008d.

* Revert "support_compression: include vector (for Win and mac)"

This reverts commit 9e98da6.

* Revert "support_compression: add lz4 library"

This reverts commit 19a0829.

* Revert "support_compression: destruction order fix (sanitizer complain)"

This reverts commit b15b466.

* Revert "support_compression: one more null pointer check"

This reverts commit b31874f.

* Revert "support_compression: irrelevant session_id change reverted, minor"

This reverts commit db61e10.

* Revert "support_compression: minor things per windsurf-bot' review"

This reverts commit 937b78f.

* Revert "support_compression: need_more_input + minor improvements"

This reverts commit 347db71.

* Revert "support_compression: RowBinaryWithNamesAndTypes, tiny cleanup"

This reverts commit a089e5d.

* Revert "support_compression: code cleanup"

This reverts commit da17f18.

* Revert "support_compression: LZ4 seems to be working"

This reverts commit 7b92ea0.

* support_compression: reapplied everything non-LZ4 related
Rename `Compress` setting to `Compression`
@slabko slabko force-pushed the chunked-encoded-compression branch from 12ed3ab to 74b6328 Compare November 2, 2025 19:49
@slabko slabko changed the title Enable zstd-based compression Implement zstd-based compression Nov 2, 2025
@slabko slabko force-pushed the chunked-encoded-compression branch 2 times, most recently from 564706b to 865f162 Compare November 4, 2025 10:18
@slabko slabko marked this pull request as ready for review November 5, 2025 14:49
Copy link

@windsurf-bot windsurf-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other comments (15)
  • driver/test/networking_it.cpp (209-211) The check for ZSTD_compressStream2 return value is incorrect. According to zstd documentation, a return value > 0 indicates the number of bytes still to flush, not an error. Only negative values indicate errors.
            auto res = ZSTD_compressStream2(zstream.get(), &out, &in, end_op);
            if (ZSTD_isError(res))
                throw std::runtime_error(std::string("zstd compression error: ") + ZSTD_getErrorName(res));
    
    
  • driver/test/networking_it.cpp (203-203) The output buffer size is set to the same size as the input buffer, but zstd compressed data can sometimes be larger than the input data (especially for small inputs or incompressible data). Consider using ZSTD_compressBound() to determine the safe output buffer size.
  • driver/test/client_test_base.h (52-52) The buffer size for query_id_data has been reduced from 74 to 37 bytes. Can you confirm that 37 bytes is always sufficient for storing the query ID? The previous implementation seemed to account for different encoding scenarios that might require more space.
  • driver/test/client_test_base.h (60-60) The previous implementation had specific logic to handle different encodings based on the length of the query ID. The new implementation directly calls `toUTF8(query_id_data)` without these checks. Is this simplification intentional, and have you verified that it works correctly for all encoding scenarios?
  • contrib/poco/Net/src/HTTPClientSession.cpp (372-376) The Content-Encoding header is being retrieved twice unnecessarily. You can simplify this by using the already stored `content_encoding` variable in the comparison.
    	static const std::string default_content_encoding = "";
    	HTTPCompressionType compression = HTTPCompressionType::None;
    	auto content_encoding = response.get("Content-Encoding", default_content_encoding);
    	if (icompare(content_encoding, "zstd") == 0)
    		compression = HTTPCompressionType::ZSTD;
    
  • driver/api/impl/impl.cpp (509-509) This change creates an unnecessary copy of the query_id string. Since the original reference remains valid throughout this scope, it would be more efficient to keep using the reference version:
    auto & query_id = statement.getQueryId();
    
  • driver/connection.cpp (424-433) Consider also handling 'true'/'false' string values for the compression parameter, similar to how other boolean parameters are handled in this file. The current implementation only handles numeric values (0/1) and yes/no variants.
  • driver/test/client_test_base.h (58-58) The use of `std::ssize()` requires C++20 support. If the codebase needs to maintain compatibility with older C++ standards, consider using `static_cast(std::size(query_id_data))` instead.
  • contrib/poco/Net/CMakeLists.txt (40-40) This change adds a dependency on `ch_contrib::zstd`, but there's no check to verify if this target exists before linking against it. Consider adding a find_package or similar check to ensure the zstd library is available, or document the dependency requirement clearly.
  • test/docker-compose.yml (3-3) The version update from 24.3 to 25.3 aligns with the PR description which mentions that mid-stream exception handling only works for ClickHouse version 24.11 and above. This change ensures the tests will run against a compatible version. Consider adding a comment in the file explaining why this specific version is required for the compression features being implemented.
  • .github/workflows/Coverage.yml (139-148) There's an inconsistency in the server URLs between the ZSTD DSN configurations. The ANSI version uses `${CLICKHOUSE_SERVER_IP}` while the Unicode version uses `localhost:8123`. For consistency, both should use the same approach.
  • contrib/poco/Net/include/Poco/Net/HTTPBufferAllocator.h (39-39) This is a significant increase in buffer size (32x larger). While it makes sense for zstd compression, have you measured the memory impact when there are many concurrent connections? Consider adding a comment explaining why this specific size was chosen and any performance trade-offs.
  • contrib/poco/LocalChanges.md (157-158) Similar to the previous comment, this commit entry also shows a date in 2025, which appears to be incorrect.
  • contrib/poco/Net/include/Poco/Net/HTTPChunkedStream.h (156-158) Consider reformatting this comment as a multi-line comment for better readability:
    	bool _zstd_completed; // Marks the stream as correctly finished. We need to carry this state to the next
    	                      // fetch operation because the last operation that returns EOF would not give us
    	                      // any data to call zstd correctly.
    
  • contrib/poco/Net/include/Poco/Net/HTTPChunkedStream.h (154-154) There's a typo in the comment for `_zstd_context`:
    	std::unique_ptr<ZstdContext> _zstd_context; // Opaque wrapper around ZSTD_DStream to avoid `#include <zstd.h>` here
    

💡 To request another review, post a new comment with "/windsurf-review".

Comment on lines +145 to +146
Author: Andrew Slabko<br/>
Date: Sat Nov 1 19:00:21 2025 +0100<br/>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dates in the newly added commit entries (lines 145, 157) show years as 2025, which appears to be incorrect since we're currently in 2023. This could cause confusion for anyone reading the documentation.

if (head == out_buffer.size())
head = 0;

out.pos == 0 ? ++stalled_loop_steps : stalled_loop_steps = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential issue with the stalled loop detection logic. The ternary expression uses different operators (increment vs assignment):

Suggested change
out.pos == 0 ? ++stalled_loop_steps : stalled_loop_steps = 0;
stalled_loop_steps = out.pos == 0 ? stalled_loop_steps + 1 : 0;

void commitWrite(size_t size);

private:
class ZstdContext;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ZstdContext class is declared as a nested private class but its implementation isn't visible here. When using it with std::unique_ptr, you need to ensure the destructor is defined where the complete type is known. Make sure the implementation file includes the proper forward declaration or definition before the HTTPChunkedStreamBuf destructor is defined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants