Conversation

@njaremko (Contributor) commented Jun 24, 2025

Note:

This PR includes a test that depends on a file that still needs to be upstreamed to apache/parquet-testing.

Which issue does this PR close?

Rationale for this change

There's no requirement in the Parquet logical type specification that map keys be primitive types. Spark used to have a similar assertion, which it has since removed.
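
For illustration, here is a minimal sketch of the kind of schema at issue: a MAP whose key is a group rather than a primitive. The schema and field names are hypothetical; it is parsed with the parquet crate's schema parser, not taken from the PR's test file.

use parquet::schema::parser::parse_message_type;

fn main() -> parquet::errors::Result<()> {
    // Hypothetical schema: a MAP whose key is a group (struct) rather than a
    // primitive. The assertion removed in this PR rejected keys like this.
    let schema = parse_message_type(
        "
        message example {
          REQUIRED group map_nested (MAP) {
            REPEATED group key_value {
              REQUIRED group key {
                REQUIRED INT32 a;
                OPTIONAL BYTE_ARRAY b (UTF8);
              }
              OPTIONAL INT32 value;
            }
          }
        }
        ",
    )?;
    println!("{:#?}", schema);
    Ok(())
}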

What changes are included in this PR?

Removes the assertion and adds a test exercising a map with a non-primitive key.

Are there any user-facing changes?

No


@njaremko njaremko changed the title from "Nathan 06 24 remove primitive map key assertion on record reader" to "Remove primitive map key assertion on record reader" on Jun 24, 2025
@njaremko njaremko force-pushed the nathan_06-24-remove_primitive_map_key_assertion_on_record_reader branch from 2d8b0b9 to ea6d5f6 on June 24, 2025 at 15:56
@github-actions github-actions bot added the parquet (Changes to the parquet crate) label on Jun 24, 2025
@njaremko njaremko force-pushed the nathan_06-24-remove_primitive_map_key_assertion_on_record_reader branch from ea6d5f6 to edd9b70 on June 24, 2025 at 15:57
@etseidl (Contributor) left a comment:

Thanks for correcting this oversight @njaremko. FWIW the arrow API handles the test file as expected. Just a few nits with the test.
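
For reference, a minimal sketch of reading such a file through the arrow reader API; the file name is illustrative and this is not the PR's test code.

use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative path; point this at a parquet file containing a map with a
    // non-primitive (group) key.
    let file = File::open("databricks.parquet")?;
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
    for batch in reader {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}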

}

#[test]
fn test_file_reader_rows_nullable1() {
@etseidl (Contributor):

A more descriptive name here would be nice...test_compound_map_key perhaps?

@njaremko (Contributor Author):

lol, whoops, yeah, definitely need to fix this


#[test]
fn test_file_reader_rows_nullable1() {
let rows = test_file_reader_rows("databricks.parquet", None).unwrap();
@etseidl (Contributor):

Likewise, something more descriptive than "databricks".

@njaremko (Contributor Author):

It was named that because it's a file generated by Databricks with every supported column type. I'll look into regenerating it.

]
),
(
"map_nested".to_string(),
@etseidl (Contributor):

This is the meat of the PR; would it be possible to only have this column in the test file? This is a lot to slog through and obscures the point of the test.

@etseidl etseidl added the bug label Jun 24, 2025
@njaremko (Contributor Author)

I'll update this PR with the requested changes once this gets merged: apache/parquet-testing#87

@alamb (Contributor) commented Jul 13, 2025

This PR includes a test that depends on a file that still needs to be upstreamed to apache/parquet-testing.

Thanks again @njaremko

Given that we don't really control the timeline of the upstream parquet-testing repo, if you want to unblock this PR you could copy the test file into the arrow-rs repo with a TODO comment saying to switch to the copy in parquet-testing once apache/parquet-testing#87 is merged.
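
A minimal sketch of what that could look like; the test name and file name below are illustrative, and test_file_reader_rows is the helper already used in this test module.

// TODO: switch to the copy in parquet-testing once apache/parquet-testing#87 is merged,
// then delete the locally vendored file.
#[test]
fn test_complex_map_key() {
    // "complex_map_key.parquet" is a placeholder name for the vendored copy of the
    // test file; resolve it however this module locates its local test data.
    let rows = test_file_reader_rows("complex_map_key.parquet", None).unwrap();
    assert!(!rows.is_empty());
}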

@alamb (Contributor) commented Sep 8, 2025

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft September 8, 2025 16:58
Labels: bug, parquet (Changes to the parquet crate)
Linked issue this PR may close: Unable to parse parquet files with maps with complex keys
3 participants