upgrade hightable to 0.13.1 #180

Merged
severo merged 9 commits into master from upgrade-hightable
Mar 27, 2025

@severo severo commented Mar 21, 2025

For reference, here is the CHANGELOG for hightable 0.13.0: https://github.com/hyparam/hightable/blob/master/CHANGELOG.md#0130---2025-03-21

  • adapt styles
  • handle the new orderBy argument in parquet dataframe, workers and view (descending order + multiple columns)

This PR was not trivial because it required handling multi-column sorting for parquet.

I implemented it with the following details:

  • I created a new function, getParquetColumn, to fetch the values of a given column without transposing to rows (using parquetRead and onChunk)
  • I sorted the values of a column and stored the sorted indexes, then computed ranks from them (two equal values get the same rank, which multi-column sorting requires), and finally cached the ranks per column
  • I computed and cached the indexes for a given orderBy criterion by getting the ranks of each column and sorting the rows, breaking each tie with the next column in orderBy.
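
For readers who want to see the rank-based approach concretely, here is a minimal sketch of the last two steps. It is not the code from this PR: the names `computeRanks`, `computeSortedIndexes` and the `OrderBy` shape are hypothetical, the columns are assumed to be already loaded as plain arrays of numbers, and caching is left out.

```typescript
// Hypothetical shape of an orderBy argument (an assumption, not hightable's type).
type OrderBy = { column: string; direction: 'ascending' | 'descending' }[]

/** Rank of each row for one column: rows with equal values share the same rank. */
function computeRanks(values: number[]): number[] {
  // Indexes of the rows, sorted by their value in this column.
  const sortedIndexes = values.map((_, i) => i).sort((a, b) => values[a] - values[b])
  const ranks = new Array<number>(values.length)
  let rank = 0
  sortedIndexes.forEach((rowIndex, position) => {
    // A new rank starts only when the value differs from the previous sorted value.
    if (position > 0 && values[rowIndex] !== values[sortedIndexes[position - 1]]) {
      rank = position
    }
    ranks[rowIndex] = rank
  })
  return ranks
}

/** Row indexes sorted by several columns, using the next column to break each tie. */
function computeSortedIndexes(
  ranksByColumn: Record<string, number[]>,
  orderBy: OrderBy,
  numRows: number
): number[] {
  const indexes: number[] = []
  for (let i = 0; i < numRows; i++) indexes.push(i)
  return indexes.sort((a, b) => {
    for (const { column, direction } of orderBy) {
      const ranks = ranksByColumn[column]
      if (!ranks) throw new Error('no ranks for column ' + column)
      const diff = ranks[a] - ranks[b]
      // Equal ranks mean a tie on this column: fall through to the next one.
      if (diff !== 0) return direction === 'ascending' ? diff : -diff
    }
    return a - b // all columns tie: keep the original row order
  })
}

// Example: sort by price ascending, then quantity descending.
const exampleRanks = {
  price: computeRanks([10, 5, 10, 7]), // [2, 0, 2, 1]
  quantity: computeRanks([1, 2, 3, 4]), // [0, 1, 2, 3]
}
const exampleOrder = computeSortedIndexes(
  exampleRanks,
  [
    { column: 'price', direction: 'ascending' },
    { column: 'quantity', direction: 'descending' },
  ],
  4
) // [1, 3, 2, 0]
```

Sharing a rank between equal values is what makes the tie detection cheap: with plain sorted positions, rows 0 and 2 would get different numbers for price and the quantity column would never be consulted.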

@severo severo force-pushed the upgrade-hightable branch from 52611e5 to a4d0998 Compare March 25, 2025 15:11
@severo severo changed the base branch from master to add-ts-rule March 25, 2025 15:11
@severo severo force-pushed the upgrade-hightable branch from 699b041 to 8ee869c Compare March 25, 2025 16:43
@severo severo requested a review from platypii March 26, 2025 08:58
@severo severo marked this pull request as ready for review March 26, 2025 08:59

severo commented Mar 26, 2025

Also, note that for parquet, since we only need the ranks of a sorted column, not its values, we could improve performance by using the row group metadata (min/max). But I guess that is something to do in hyparquet instead.

Base automatically changed from add-ts-rule to master March 26, 2025 21:40
@severo severo force-pushed the upgrade-hightable branch from 27c04a1 to 7ddf178 Compare March 26, 2025 21:45
@severo severo changed the title upgrade hightable to 0.13.0 upgrade hightable to 0.13.1 Mar 27, 2025
@severo severo merged commit 125d20a into master Mar 27, 2025
4 checks passed
@severo severo deleted the upgrade-hightable branch March 27, 2025 09:06

2 participants