For a given query execution, allow data to be returned in columnar format and deserialized to numpy objects #120

Merged
merged 8 commits into from
Apr 4, 2025

Conversation

paxcodes
Contributor

@paxcodes paxcodes commented Apr 2, 2025

This PR allows SQL query results to be returned in columnar format and deserialized to numpy objects, without affecting other queries that run in the system (e.g. database introspection queries, INSERT queries).
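As a rough illustration of what "columnar" means here, the sketch below transposes a row-oriented result into one sequence per column using only the standard library. The sample rows and the `to_columnar` helper are hypothetical and are not part of this PR or of clickhouse_driver; the conversion of each column to a numpy array is not shown.

```python
# Hypothetical row-oriented result: one tuple per row (id, name, score).
rows = [
    (1, "a", 0.5),
    (2, "b", 1.5),
    (3, "c", 2.5),
]


def to_columnar(rows):
    """Transpose row tuples into a list with one tuple per column.

    Illustrative helper only; it is not an API of this project.
    """
    return list(zip(*rows))


columns = to_columnar(rows)

# Each entry now holds all values of a single column.
ids, names, scores = columns
print(ids)
print(names)
print(scores)
```

With the data in this shape, each column can be handed to a numeric library as a single homogeneous sequence, which is why columnar results pair naturally with numpy deserialization.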

Relates:

@paxw-panevo paxw-panevo force-pushed the feat/allow-columnar-data-soln branch from 4434620 to bd1c670 Compare April 2, 2025 00:37
@paxcodes paxcodes marked this pull request as ready for review April 2, 2025 00:40
@paxcodes paxcodes closed this Apr 2, 2025
@paxcodes paxcodes reopened this Apr 2, 2025
@paxcodes paxcodes closed this Apr 2, 2025
@paxcodes paxcodes reopened this Apr 2, 2025
@paxcodes
Contributor Author

paxcodes commented Apr 2, 2025

The tests are failing due to an issue in the pipeline configuration:

[image]

@paxw-panevo

@jayvynl This is ready for review. Although the tests fail against the latest ClickHouse image, I've verified that they pass on ClickHouse 24.10 (the version currently deployed in ClickHouse Cloud) with Python 3.9.

[image]

My suspicion is that the latest ClickHouse image has a bug (it generated a password for the default user when it shouldn't have) or a backwards-incompatible change (it started requiring a password for the default user).

I already have some tests created. I'll continue to improve the test coverage after you approve the overall implementation.

@paxcodes
Contributor Author

paxcodes commented Apr 2, 2025

The tests created in this PR pass and are not skipped when I run them locally:

[image]

@jayvynl
Owner

jayvynl commented Apr 3, 2025

Although I think use_numpy and columnar should be implemented as features in clickhouse_driver, like Cursor.set_settings, I will accept your PR because I think it may help others.

Please update the tox.ini file to add numpy and pandas:

commands =
    pip install pandas
    # Use local clickhouse_backend package so that coverage works properly.
    pip install -e .
    coverage run tests/runtests.py --debug-sql {posargs}

Please update the documentation and changelog.

@paxcodes
Contributor Author

paxcodes commented Apr 3, 2025

@jayvynl The changes have been made:

  • tox.ini has been modified to install pandas
  • a changelog entry and documentation have been added

The checks pass now, except for the test runs against the latest ClickHouse image, which fail for the reason mentioned earlier.

I wonder whether it would be valuable to add the ClickHouse version deployed in ClickHouse Cloud to the CI matrix. I can create follow-up PRs to keep the CI/CD pipeline config updated whenever ClickHouse Cloud deploys a new ClickHouse version.

@jayvynl jayvynl merged commit 8f0bee7 into jayvynl:main Apr 4, 2025
32 of 42 checks passed
3 participants