Skip to content

Add set_max_statistics_size to WriterProperties#581

Merged
adamreeve merged 6 commits into
G-Research:masterfrom
haianhng31:max_statistics_size
Oct 22, 2025
Merged

Add set_max_statistics_size to WriterProperties#581
adamreeve merged 6 commits into
G-Research:masterfrom
haianhng31:max_statistics_size

Conversation

@haianhng31
Copy link
Copy Markdown
Contributor

No description provided.

@haianhng31 haianhng31 marked this pull request as ready for review October 20, 2025 00:59
Comment thread csharp/WriterPropertiesBuilder.cs Outdated
Comment on lines +553 to +556
/// Specify max statistics size to store min max value.
/// Default 4KB.
/// </summary>
/// <param name="maxStatisticsSize">The max statistics size to store min max value</param>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unclear to me what this means exactly. This should specify the unit and explain more clearly how it is used, eg. whether it's a total size in the whole file or per statistic.

From https://github.com/apache/arrow/blob/cbd36b817fc77812f8df1a15bf24314de3b27f29/cpp/src/parquet/statistics.h#L148, it looks like this is applied for each individual min and max value, so is useful for string or binary typed columns that could be very long.

Suggested change
/// Specify max statistics size to store min max value.
/// Default 4KB.
/// </summary>
/// <param name="maxStatisticsSize">The max statistics size to store min max value</param>
/// Specify the maximum size in bytes for minimum and maximum values in page and column chunk statistics.
/// Default 4 KiB.
/// </summary>
/// <param name="maxStatisticsSize">The max statistics size in bytes</param>

Comment thread csharp/WriterProperties.cs Outdated

[DllImport(ParquetDll.Name)]
private static extern IntPtr WriterProperties_Max_Statistics_Size(IntPtr writerProperties, IntPtr path, [MarshalAs(UnmanagedType.I1)] out ulong maxStatisticsSize);
private static extern IntPtr WriterProperties_Max_Statistics_Size(IntPtr writerProperties, IntPtr path, [MarshalAs(UnmanagedType.U8)] out ulong maxStatisticsSize);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MarshalAs attribute shouldn't be necessary as U8 should be the default for a ulong. It looks like having I1 previously was a copy-paste error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got u! I'll remove it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants