Skip to content

Add Decimal type support to arrow-avro #7832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 1, 2025

Conversation

jecsand838
Copy link
Contributor

@jecsand838 jecsand838 commented Jun 30, 2025

Which issue does this PR close?

Rationale for this change

This PR addresses a feature gap by introducing support for the Avro decimal logical type, which is currently unimplemented as indicated by the test_decimal_logical_type_not_implemented test case. The decimal type is crucial for handling precise numerical data common in financial and scientific applications, making this a necessary addition for broader Avro compatibility.

What changes are included in this PR?

This PR introduces the necessary changes to both parse and decode the Avro decimal logical type into the corresponding Arrow Decimal128 or Decimal256 data types.

The main changes are:

  1. Schema Parsing (codec.rs):

    • Implemented the logic within make_data_type to correctly parse the decimal logical type from the Avro schema.
    • The Codec enum's Decimal variant now correctly stores the precision, scale, and optional fixed-size from the schema's attributes.
  2. Decoding Logic (record.rs):

    • Added Decoder::Decimal128 and Decoder::Decimal256 variants to handle decoding of decimal values from both bytes and fixed Avro types.
    • The implementation correctly handles sign extension for negative numbers to ensure accurate representation in Arrow's decimal arrays.

Are these changes tested?

This PR includes comprehensive tests to validate the new functionality:

  • The existing test_decimal_logical_type_not_implemented test has been replaced with concrete test cases.
  • Added unit tests in record.rs (test_decimal_decoding_fixed256, test_decimal_decoding_fixed128, test_decimal_decoding_bytes_with_nulls, etc.) to cover various scenarios, including:
    • Decoding from Avro fixed and bytes primitive types.
    • Handling different precisions to select between Decimal128 and Decimal256.
    • Correctly processing null values within decimal arrays.

Are there any user-facing changes?

N/A

- Introduced Decimal128 and Decimal256 mapping for Avro Decimal types.
- Added parsing logic for Decimal attributes including precision, scale, and size.
- Enhanced codec handling to convert Avro fixed or binary logical types into Arrow Decimal arrays.
- Implemented `Decoder` support for reading and decoding Decimal data.
- Added unit tests for Decimal128 and Decimal256 decoding with various configurations.
- Updated `criterion` to version `0.6.0`.
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 30, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an improvement to me -- thank you @jecsand838

@@ -732,4 +838,122 @@ mod tests {
assert_eq!(list_arr.len(), 1);
assert_eq!(list_arr.value_length(0), 0);
}

#[test]
fn test_decimal_decoding_fixed256() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have any way to verify that this is doing the right thing (like I don't know / have any way to verify if row1 and row2 byte arrays are valid avro)

One thing I suggest (in a different PR) is starting to test that the reader will work "end to end". Specifically, at least these two cases:

  1. Testing reading avro files written with other implementations
  2. Test "round tripping" arrow data from Arrow --> Avro --> Arrow and ensuring the same data has arrived

@jecsand838 have you considered this type of testing?

Copy link
Contributor Author

@jecsand838 jecsand838 Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb That's a solid call out. I'll introduce those types of tests in my next PR for enum types.

@alamb alamb merged commit 6123956 into apache:main Jul 1, 2025
25 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 1, 2025

Thanks again @jecsand838

@jecsand838 jecsand838 deleted the avro-codec-decimal branch July 1, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants