Adding Index Setting for Data compression #20355

bharath-techie merged 5 commits into opensearch-project:feature/datafusion
Conversation
❌ Gradle check result for f60cfd8: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
```java
// Enhanced native methods that handle validation and provide better error reporting
public static native void createWriter(String file, long schemaAddress) throws IOException;
public static native void createWriter(String file, long schemaAddress, boolean isCompressionEnabled) throws IOException;
```
This change is very specific to compression. Can we instead pass a map here, with attribute names that can be read on the native side? Also, there are certain settings we may want to apply only to specific columns (some may get handled via mappings as well). Can you look into making the argument passed to Rust generic enough to handle index- and field-level settings?
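One way to make the JNI surface generic, sketched below under the assumption that we control both sides of the boundary (the `NativeSettingsCodec` class and its methods are hypothetical, not part of this PR): flatten the settings into key/value pairs carried in a single string argument, so index- and field-level settings travel through one parameter instead of one boolean per setting.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: encode a settings map as "key=value;key=value" so the
// native (Rust) side can parse it, instead of adding one JNI parameter per setting.
class NativeSettingsCodec {

    // Encode settings as "key=value" pairs joined by ';'.
    static String encode(Map<String, String> settings) {
        StringJoiner joiner = new StringJoiner(";");
        settings.forEach((k, v) -> joiner.add(k + "=" + v));
        return joiner.toString();
    }

    // Decode back into a map, mirroring what the Rust side would do.
    static Map<String, String> decode(String encoded) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : encoded.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return out;
    }
}
```

With this shape, a field-level setting such as `field.col1.bloom_filter.enabled` rides alongside index-level keys in the same map; the native side only needs one parser.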
Iceberg configuration, for reference:
| Property | Default | Description |
|---|---|---|
| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |
| write.delete.format.default | data file format | Default delete file format for the table; parquet, avro, or orc |
| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size |
| write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
| write.parquet.page-row-limit | 20000 | Parquet page row limit |
| write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
| write.parquet.compression-codec | zstd | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed |
| write.parquet.compression-level | null | Parquet compression level |
| write.parquet.bloom-filter-enabled.column.col1 | (not set) | Hint to parquet to write a bloom filter for the column: 'col1' |
| write.parquet.bloom-filter-max-bytes | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset |
| write.parquet.bloom-filter-fpp.column.col1 | 0.01 | The false positive probability for a bloom filter applied to 'col1' (must be > 0.0 and < 1.0) |
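To illustrate how such a property table could back a generic settings map, here is a minimal sketch (the `WriterProps` helper is hypothetical; only the property names and defaults come from the Iceberg table above):

```java
import java.util.Map;

// Hypothetical sketch: resolve Iceberg-style writer properties against typed
// defaults, so unset keys fall back to the documented default values.
class WriterProps {

    private final Map<String, String> overrides;

    WriterProps(Map<String, String> overrides) {
        this.overrides = overrides;
    }

    String getString(String key, String defaultValue) {
        return overrides.getOrDefault(key, defaultValue);
    }

    long getLong(String key, long defaultValue) {
        String v = overrides.get(key);
        return v == null ? defaultValue : Long.parseLong(v);
    }
}
```

Usage would look like `new WriterProps(userSettings).getString("write.parquet.compression-codec", "zstd")`: an explicit override wins, otherwise the documented default (here zstd, or 134217728 bytes for the row-group size) applies.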
```java
public static final Setting<Boolean> INDEX_COMPRESSION_ENABLED_SETTING = Setting.boolSetting(
    "index.compression.enabled",
```
Should we use the existing codec setting here? We can validate which values to support in the Parquet plugin itself.
Also, some settings may come from Parquet directly. If a setting is specific to a data format, it should be declared in the plugin itself:
```java
public List<Setting<?>> getSettings() {
    return Collections.emptyList();
}
```
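The validation the reviewer suggests doing in the Parquet plugin could look like the following sketch (the `CodecValidator` class is hypothetical; the codec list is taken from the Iceberg reference table above, while the actual supported set would come from the native writer):

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: plugin-side validation of the compression codec value,
// rejecting anything the parquet writer does not support.
class CodecValidator {

    private static final Set<String> SUPPORTED =
        Set.of("zstd", "brotli", "lz4", "gzip", "snappy", "uncompressed");

    static String validate(String codec) {
        String normalized = codec.toLowerCase(Locale.ROOT);
        if (!SUPPORTED.contains(normalized)) {
            throw new IllegalArgumentException(
                "unsupported compression codec [" + codec + "], expected one of " + SUPPORTED);
        }
        return normalized;
    }
}
```

A validator like this could be wired into the setting declaration so bad values are rejected at index-creation time rather than deep inside the native writer.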
f60cfd8 to 04f4f7d (force-pushed)
❌ Gradle check result for 04f4f7d: FAILURE
fe1b049 to c01b518 (force-pushed)
❌ Gradle check result for c01b518: FAILURE
```java
"index.parquet.row_group_size_bytes",
new ByteSizeValue(128, ByteSizeUnit.MB),
Setting.Property.IndexScope,
Setting.Property.Dynamic
```
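The declaration above relies on byte-size parsing so users can write values like `128mb`. A minimal sketch of that behavior, assuming base-2 units (the `ByteSize` helper here is hypothetical and not OpenSearch's actual `ByteSizeValue` implementation):

```java
import java.util.Locale;

// Hypothetical sketch of byte-size parsing: accepts "128mb", "1kb", "42", etc.
// and returns the size in bytes, using binary (1024-based) multipliers.
class ByteSize {
    static long parseBytes(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1;
        if (v.endsWith("kb")) { multiplier = 1L << 10; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("mb")) { multiplier = 1L << 20; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("gb")) { multiplier = 1L << 30; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("b")) { v = v.substring(0, v.length() - 1); }
        return Long.parseLong(v.trim()) * multiplier;
    }
}
```

Under this scheme the 128 MB default in the snippet corresponds to 134217728 bytes, matching the `write.parquet.row-group-size-bytes` default in the Iceberg table.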
For now, it's OK for this not to be dynamic.
c01b518 to 969e025 (force-pushed)
❌ Gradle check result for 969e025: FAILURE
```rust
.set_compression(Compression::ZSTD(ZstdLevel::try_new(3).unwrap()))
.build();
```

```rust
let props = WriterPropertiesBuilder::build(&config);
```
```java
// Push current settings to Rust store once on construction, then keep in sync on updates
pushSettingsToRust(indexSettings);
indexSettings.getScopedSettings().addSettingsUpdateConsumer(
```
Since the settings are not dynamic for now, we can remove these.
```rust
// Read configuration from the global settings store
let config: NativeSettings = SETTINGS_STORE.lock().unwrap().clone();
```
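For comparison, a Java-side analogue of the Rust `SETTINGS_STORE` pattern could look like the sketch below (the `SettingsStore` class is hypothetical): a process-wide, thread-safe map that the JNI layer writes once at construction and the writer reads when building its properties.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a process-wide, thread-safe settings store, analogous
// to the Mutex-guarded SETTINGS_STORE on the Rust side.
class SettingsStore {

    private static final Map<String, String> STORE = new ConcurrentHashMap<>();

    static void push(Map<String, String> settings) {
        STORE.putAll(settings);
    }

    static String get(String key, String defaultValue) {
        return STORE.getOrDefault(key, defaultValue);
    }
}
```

One caveat with a single global store: it cannot distinguish between indices, so a real implementation would likely need to key entries per index (for example by index UUID) to honor index-scoped settings.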
❌ Gradle check result for 178887e: FAILURE
❌ Gradle check result for 58f7c11: FAILURE
58f7c11 to 0cd71c7 (force-pushed)
Persistent review updated to latest commit 0cd71c7
❌ Gradle check result for 0cd71c7: FAILURE
Signed-off-by: Sagar Darji <darjisagar7@gmail.com>
0cd71c7 to c1ebbbf (force-pushed)
Persistent review updated to latest commit c1ebbbf
❌ Gradle check result for c1ebbbf: FAILURE
c1ebbbf to ee3dde9 (force-pushed)
Persistent review updated to latest commit ee3dde9
Merged 19e823e into opensearch-project:feature/datafusion
❌ Gradle check result for ee3dde9: null
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.