Support Parquet unsigned integer types #149405

Open
swallez wants to merge 2 commits into elastic:main from swallez:parquet-unsigned-integers

@swallez swallez commented May 19, 2026

Summary

Parquet supports unsigned integer annotations (UINT_8, UINT_16, UINT_32, UINT_64) on top of its 32-bit and 64-bit physical integer types. Before this change, these annotations were ignored and the values were read as signed integers, causing data corruption for values that exceed the signed range (e.g., a UINT_32 value of 3,000,000,000 would be read as -1,294,967,296).
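
The misread described above can be reproduced with plain JDK arithmetic. This is a minimal sketch, not code from the PR: the class and method names are hypothetical, and the cast stands in for a reader that ignores the unsigned annotation.

```java
final class SignedMisreadDemo {
    // Keeps only the low 32 bits and interprets them as a signed int,
    // which mirrors reading a UINT_32 column as if it were a plain INT32.
    static int readAsSigned(long unsignedValue) {
        return (int) unsignedValue;
    }

    public static void main(String[] args) {
        long stored = 3_000_000_000L;      // valid UINT_32 value, above Integer.MAX_VALUE
        int misread = readAsSigned(stored);
        System.out.println(misread);       // prints -1294967296
    }
}
```

The result is off by exactly 2^32 (4,294,967,296), because the top bit of the 32-bit pattern is treated as a sign bit.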

Type mapping (ParquetFormatReader):

  • INT32 with unsigned annotation and 32-bit width → LONG, to hold the full [0, 2^32) unsigned range
  • INT32 with unsigned annotation and smaller width (8 or 16 bits) → INTEGER, since values fit within the signed int range
  • INT64 with unsigned annotation → UNSIGNED_LONG
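
The mapping rules above can be sketched as a small decision function. The enums and the `map` method are hypothetical stand-ins, not the actual `ParquetFormatReader` or Parquet schema classes, and the signed branches (INT32 → INTEGER, INT64 → LONG) are assumptions added only to make the function total.

```java
final class UnsignedTypeMapping {
    // Hypothetical stand-ins for the physical type and the resulting ES|QL type.
    enum Physical { INT32, INT64 }
    enum EsqlType { INTEGER, LONG, UNSIGNED_LONG }

    static EsqlType map(Physical physical, int bitWidth, boolean signed) {
        if (physical == Physical.INT64) {
            // Assumption: signed INT64 stays LONG; unsigned becomes UNSIGNED_LONG.
            return signed ? EsqlType.LONG : EsqlType.UNSIGNED_LONG;
        }
        if (!signed && bitWidth == 32) {
            // Full [0, 2^32) range does not fit in a signed int.
            return EsqlType.LONG;
        }
        // UINT_8 and UINT_16 fit in a signed int; signed INT32 is assumed to stay INTEGER.
        return EsqlType.INTEGER;
    }

    public static void main(String[] args) {
        System.out.println(map(Physical.INT32, 32, false)); // LONG
        System.out.println(map(Physical.INT32, 8, false));  // INTEGER
        System.out.println(map(Physical.INT64, 64, false)); // UNSIGNED_LONG
    }
}
```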

Value reading (PageColumnReader, ParquetFormatReader):

  • UNSIGNED_LONG is now handled alongside LONG in the INT32-backed dispatch paths
  • Values are widened with Integer.toUnsignedLong(), which zero-extends the 32-bit pattern instead of sign-extending it
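
To illustrate the difference between the two widenings, here is a hedged sketch over a raw `int[]` batch. The class and helper names are hypothetical and do not reflect the actual `PageColumnReader` code; only `Integer.toUnsignedLong` is taken from the description.

```java
final class UnsignedWidening {
    // Zero-extends each 32-bit value, as needed for UINT_32 data.
    static long[] widenUnsigned(int[] raw) {
        long[] out = new long[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = Integer.toUnsignedLong(raw[i]);
        }
        return out;
    }

    // Plain widening cast: sign-extends, which is wrong for UINT_32 data.
    static long[] widenSigned(int[] raw) {
        long[] out = new long[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] raw = { 1, 0xFFFFFFFF };
        System.out.println(widenUnsigned(raw)[1]); // 4294967295
        System.out.println(widenSigned(raw)[1]);   // -1
    }
}
```

Values below 2^31 come out identical from both helpers, so the bug only shows up for inputs with the top bit set.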

Tests (ParquetFormatReaderTests):

  • testLargeUint32: 0xFFFFFFFF in a UINT_32 column maps to type LONG and reads as 4294967295
  • testLargeUint8: value 200 in a UINT_8 column maps to INTEGER and reads correctly
  • testLargeUnsignedLong: 0xFFFFFFFFFFFFFFFFL in a UINT_64 column maps to UNSIGNED_LONG and round-trips correctly
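
The boundary values used by these tests can be checked with JDK helpers alone. This sketch is not the test code from `ParquetFormatReaderTests`; the class and method names are hypothetical, and it says nothing about how the reader represents UNSIGNED_LONG internally.

```java
final class UnsignedBoundaryChecks {
    // UINT_32 stored in an INT32: zero-extend to long.
    static long uint32(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    // UINT_8 value: shown via a byte to make the signed/unsigned gap visible.
    static int uint8(byte raw) {
        return Byte.toUnsignedInt(raw);
    }

    // UINT_64 stored in an INT64: render the 64-bit pattern as an unsigned decimal.
    static String uint64(long raw) {
        return Long.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        System.out.println(uint32(0xFFFFFFFF));          // 4294967295
        System.out.println(uint8((byte) 200));           // 200
        System.out.println(uint64(0xFFFFFFFFFFFFFFFFL)); // 18446744073709551615
    }
}
```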

The dispatch logic between PageColumnReader#readBatch and ParquetFormatReader#readColumnBlock remains duplicated; a KEEP IN SYNC warning comment was added to both sites.

Fixes https://github.com/elastic/esql-planning/issues/316

@elasticsearchmachine added the v9.5.0 and needs:triage (Requires assignment of a team area label) labels May 19, 2026
@swallez added the >bug, :Analytics/ES|QL (AKA ESQL), and ES|QL|DS (ES|QL datasources) labels May 19, 2026
@elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label May 19, 2026
@swallez requested a review from costin May 19, 2026 14:28
@elasticsearchmachine removed the needs:triage label May 19, 2026

@elasticsearchmachine
Collaborator

Hi @swallez, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@github-actions
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level
