Skip to content

Conversation

Bill-hbrhbr
Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr commented Nov 27, 2024

Description

  1. Port the working version of LZMA compression code into open-source
  2. Refactor the compressor-related files as a continuation of refactor(core): Fix clang-tidy warnings in the streaming compressor interface and its implementations; Modernize and refactor test-StreamingCompression for conciseness. #599
  3. Add scripts to install LZMA dependencies for all builds

Validation performed

Updated test-StreamingCompression.cpp to include LZMA compressors. Tested that compressing workflow finishes without errors.

Summary by CodeRabbit

  • New Features

    • Introduced LZMA compression support, expanding available compression options.
    • Added new constants and enumerators related to LZMA compression.
    • Enhanced testing suite to support LZMA compression functionality.
  • Bug Fixes

    • Improved error handling and validation in the LZMA compressor implementation.
  • Chores

    • Updated installation scripts to include the LZMA library and related dependencies across various platforms.
    • Updated installation scripts to install the latest version of the Boost library.
    • Added additional packages to installation scripts for various platforms.

Copy link
Contributor

coderabbitai bot commented Nov 27, 2024

Walkthrough

This pull request introduces comprehensive support for LZMA compression in the CLP project. The changes span multiple files across the project, including configuration files, CMake scripts, source code, and installation scripts. The implementation adds LZMA as a new compression option, with modifications to the build system, compression interfaces, constants, and test suites. The changes also update library installation scripts for various operating systems to ensure proper setup and dependency management.

Changes

File Change Summary
components/core/.clang-format Added lzma to library header regex pattern
components/core/CMakeLists.txt Added LZMA compression support, library finding, and build configuration
components/core/src/clp/streaming_compression/Constants.hpp Added LZMA enumerator to CompressorType
components/core/src/clp/streaming_compression/lzma/Compressor.cpp Implemented LZMA compression logic and error handling
components/core/src/clp/streaming_compression/lzma/Compressor.hpp Defined LZMA compressor class and interfaces
components/core/src/clp/streaming_compression/lzma/Constants.hpp Defined LZMA-specific compression constants
components/core/tests/test-StreamingCompression.cpp Added LZMA compression test section
components/core/tools/scripts/lib_install/liblzma.sh Added script for installing liblzma library
components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh Updated boost version and added liblzma installation command
components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh Added installation of jq and liblzma-dev packages
components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh Added commands for installing boost and liblzma
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh Replaced boost packages with jq and liblzma-dev
components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh Added diffutils and jq packages
components/core/tools/scripts/lib_install/macos/install-all.sh Added xz package to installation list

Possibly Related PRs

Suggested Reviewers

  • gibber9809
  • kirkrodrigues

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Bill-hbrhbr
Copy link
Contributor Author

I guess two comments:

  1. Let me know if you want me to remove the decompressor files. I did a direct copy-paste so that I can include unittest for LZMA in this PR.
  2. I added lib install scripts for Ubuntu machines. Do I need to also look into CentOS and MacOS?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Outside diff range and nitpick comments (18)
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1)

23-23: Consider pinning the package version for reproducibility.

To ensure consistent builds across different environments and times, consider pinning the package version.

-  liblzma-dev \
+  liblzma-dev=5.2.5-2ubuntu1 \
components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (2)

Line range hint 1-20: Consider adding version compatibility comments

The script structure is good, but it would be helpful to document any version interdependencies between the libraries.

Add a comment block at the top of the file:

 #!/usr/bin/env bash

+# Library versions and compatibility notes:
+# - liblzma 5.4.6: Compatible with libarchive 3.5.1
+
 # Exit on any error
 set -e

14-14: Consider error handling for LZMA installation

While the script has basic error handling with set -e, consider adding specific error handling for the LZMA installation.

-"$lib_install_scripts_dir"/liblzma.sh 5.4.6
+if ! "$lib_install_scripts_dir"/liblzma.sh 5.4.6; then
+    echo "Error: LZMA installation failed. Please check system requirements."
+    exit 1
+fi
components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1)

17-17: Consider installation order implications

The LZMA library is installed after libarchive. If libarchive is built with LZMA support, it might need to be rebuilt after LZMA installation.

Consider either:

  1. Moving the LZMA installation before libarchive
  2. Adding a rebuild step for libarchive after LZMA installation if necessary
components/core/tools/scripts/lib_install/liblzma.sh (1)

1-66: Consider adding logging, dry-run mode, and uninstall capability

To improve the script's maintainability and user experience, consider these architectural improvements:

  1. Add logging functionality with different verbosity levels
  2. Implement a dry-run mode for testing
  3. Add uninstall capability
  4. Consider adding a progress indicator for long-running operations

These improvements would make the script more robust and user-friendly for both development and production use.

Would you like me to provide an implementation for any of these features?

components/core/tests/test-StreamingCompression.cpp (2)

60-64: Consider enhancing test coverage for LZMA-specific scenarios.

While the basic compression/decompression test is good, consider adding LZMA-specific test cases to validate:

  • Different compression levels
  • Dictionary size variations
  • Error handling for corrupted LZMA streams

Would you like me to provide example test cases for these scenarios?


Line range hint 1-113: Consider refactoring to use parameterized tests.

The test structure is duplicated across compression types. Consider using Catch2's test case templates to reduce duplication and make it easier to add new compression methods.

Here's a suggested approach:

TEMPLATE_TEST_CASE("StreamingCompression", "[StreamingCompression]",
    std::pair<clp::streaming_compression::zstd::Compressor, clp::streaming_compression::zstd::Decompressor>,
    std::pair<clp::streaming_compression::passthrough::Compressor, clp::streaming_compression::passthrough::Decompressor>,
    std::pair<clp::streaming_compression::lzma::Compressor, clp::streaming_compression::lzma::Decompressor>
) {
    auto compressor = std::make_unique<typename TestType::first_type>();
    auto decompressor = std::make_unique<typename TestType::second_type>();
    // ... rest of the test ...
}
components/core/CMakeLists.txt (4)

21-23: Standardize the status message format

The status message format differs from other compressors. Consider using a consistent format:

-    message(STATUS "Using Lempel–Ziv–Markov chain Algorithm compression")
+    message(STATUS "Using LZMA compression")

241-241: Use STATUS for LZMA include directory message

The include directory message should use the STATUS level for consistency with other similar messages:

-message("LZMA Include Dir: ${LIBLZMA_INCLUDE_DIRS}")
+message(STATUS "LZMA Include Dir: ${LIBLZMA_INCLUDE_DIRS}")

231-232: Address the static/shared library support TODO

The TODO comment indicates missing functionality for handling static vs. shared libraries.

Would you like me to help create a CMake module to handle static/shared library selection for LZMA? This would align with the project's existing pattern for other libraries.


573-573: Consider grouping compression-related libraries together

For better organization, consider grouping LZMA with other compression-related libraries:

-        ${LIBLZMA_LIBRARIES}
-        ZStd::ZStd
+        ZStd::ZStd
+        ${LIBLZMA_LIBRARIES}
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)

120-120: Use type alias LzmaAction for consistency.

In the compress method, consider using the type alias LzmaAction instead of lzma_action to maintain consistency with the defined type aliases and improve readability.

Apply this diff for consistency:

-auto compress(lzma_action action) -> void;
+auto compress(LzmaAction action) -> void;
components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (3)

9-9: Remove unnecessary inclusion of <zlib.h>.

The <zlib.h> header included at line 9 may be unnecessary since this decompressor uses LZMA compression. Eliminating unused headers can reduce dependencies and improve compile times.

Apply this diff to remove the unnecessary include:

-#include <zlib.h>

61-62: Complete or remove the incomplete Doxygen comment.

There is an incomplete Doxygen comment starting at line 61 without any description. To maintain clear documentation, please complete the comment or remove it if it's not needed.

Apply this diff to complete the comment:

 /**
+ * Reads an exact number of bytes from the decompressor into the buffer.
+ * @param buf The buffer to read data into.
+ * @param num_bytes_to_read The exact number of bytes to read.
+ * @param num_bytes_read The actual number of bytes read.
 */
 void exact_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read);

89-90: Correct the reference to the appropriate method in the comment.

The comment at line 89 incorrectly references streaming_compression::zstd::Decompressor::try_seek_from_begin, which is not relevant to the LZMA decompressor. It should refer to the appropriate LZMA method or be updated to accurately reflect the functionality.

Apply this diff to correct the comment:

- * @return Same as streaming_compression::zstd::Decompressor::try_seek_from_begin
+ * @return Same as Decompressor::try_seek_from_begin
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

86-86: Use LZMA_STREAM_INIT for proper initialization of m_compression_stream

Instead of using memset to initialize m_compression_stream, it's recommended to use the LZMA_STREAM_INIT macro provided by the LZMA library. This ensures all fields are correctly initialized, including any internal pointers, and is the preferred method according to the LZMA documentation.

Apply this diff to initialize m_compression_stream correctly:

-    memset(m_compression_stream.get(), 0, sizeof(LzmaStream));
+    *m_compression_stream = LZMA_STREAM_INIT;
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp (2)

113-113: Correct error messages to reference LZMA functions

The error messages incorrectly mention inflate(), which is associated with zlib. Since we're using LZMA, please update the messages to reference lzma_code() to accurately reflect the context.

Apply this diff to fix the error messages:

-                    SPDLOG_ERROR("streaming_compression::lzma::Decompressor inflate() ran out of memory");
+                    SPDLOG_ERROR("streaming_compression::lzma::Decompressor lzma_code() ran out of memory");

...

-                    SPDLOG_ERROR("inflate() returned an unexpected value - {}.", int(return_value));
+                    SPDLOG_ERROR("lzma_code() returned an unexpected value - {}.", int(return_value));

Also applies to: 117-117


130-130: Update comment to reflect LZMA instead of ZStd

The comment references 'ZStd', but this is an LZMA decompressor. Please correct the comment to avoid confusion.

Apply this diff to fix the comment:

-            // ZStd has no way for us to seek back to the desired position, so just reset the stream
+            // LZMA has no way for us to seek back to the desired position, so just reset the stream
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7aea626 and 707c412.

📒 Files selected for processing (14)
  • components/core/.clang-format (1 hunks)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/src/clp/streaming_compression/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Decompressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (1 hunks)
  • components/core/tests/test-StreamingCompression.cpp (2 hunks)
  • components/core/tools/scripts/lib_install/liblzma.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (1 hunks)
  • components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • components/core/.clang-format
  • components/core/src/clp/streaming_compression/lzma/Constants.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp

[error] 41-41: Rethrowing current exception with 'throw;', it seems there is no current exception to rethrow. If there is no current exception this calls std

(rethrowNoCurrentException)


[error] 45-45: Rethrowing current exception with 'throw;', it seems there is no current exception to rethrow. If there is no current exception this calls std

(rethrowNoCurrentException)

🔇 Additional comments (11)
components/core/src/clp/streaming_compression/Constants.hpp (2)

10-10: Well-structured enum value addition!

The chosen value (0x20) follows good practices:

  • Maintains consistent power-of-2 spacing with existing values
  • Leaves room for future compression types between values
  • Follows the established naming convention

10-10: Verify enum value handling across the codebase

Let's ensure the new LZMA enum value is properly handled in all relevant code paths.

✅ Verification successful

Let me gather more information about the CompressorType enum usage and switch statements.


Based on the comprehensive analysis of the codebase, I can now provide a final response:

LZMA enum value is properly integrated and handled

The new LZMA enum value (0x20) is correctly integrated into the codebase with proper handling:

  • The CompressorType enum is consistently defined across namespaces (clp::streaming_compression)
  • LZMA implementation follows the same pattern as existing compressors (ZSTD, Passthrough)
  • No switch statements directly use CompressorType enum, as the type is handled through class inheritance
  • Proper initialization of LZMA compressor/decompressor classes using CompressorType::LZMA
  • Test cases include LZMA compression alongside other compression types
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find potential switch statements and enum handling code that might need updates

# Search for switch statements handling CompressorType
echo "Checking for switch statements handling CompressorType..."
rg -l "switch.*CompressorType" 

# Search for serialization/deserialization code
echo "Checking for serialization/deserialization code..."
rg -l "serialize.*CompressorType|deserialize.*CompressorType"

# Look for direct CompressorType comparisons
echo "Checking for CompressorType comparisons..."
rg -l "CompressorType::(ZSTD|Passthrough)"

Length of output: 989


Script:

#!/bin/bash
# Search for CompressorType enum usage with context
echo "Checking CompressorType enum usage with context..."
rg -A 5 "CompressorType" 

# Search for switch statements in compression-related files
echo "Checking switch statements in compression files..."
fd -e cpp -e hpp | grep -i "compress" | xargs rg "switch.*\{" -A 5

# Look for factory or creation methods that might use CompressorType
echo "Checking factory methods..."
rg "create.*Compressor|make.*Compressor" -A 5

Length of output: 26592

components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1)

23-23: LGTM! Verify package availability in Ubuntu Jammy.

The addition of liblzma-dev is correct and properly placed in alphabetical order.

✅ Verification successful

Package addition is properly integrated and verified

The liblzma-dev package addition is well-integrated into the existing LZMA compression workflow:

  • Package exists in Ubuntu Jammy repositories
  • Consistent with Ubuntu Focal configuration
  • Properly supports the LZMA compression option in CMakeLists.txt
  • Aligns with existing LZMA build infrastructure (liblzma.sh)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify liblzma-dev package availability and its dependencies in Ubuntu Jammy
# Expected: Package should be available and not have conflicting dependencies

# Check if the package exists in Ubuntu Jammy repositories
curl -s "https://packages.ubuntu.com/jammy/liblzma-dev" | grep -q "Page not found" && echo "Package not found" || echo "Package exists"

# Get package details including version and dependencies
curl -s "https://packages.ubuntu.com/jammy/liblzma-dev" | grep -A 5 "Package:"

Length of output: 300


Script:

#!/bin/bash
# Let's verify the package dependencies and version availability more thoroughly
apt-cache policy liblzma-dev 2>/dev/null || echo "Package info not available in sandbox"

# Check if the package is already used elsewhere in the codebase
rg -l "liblzma-dev" --type sh

# Look for any LZMA related build dependencies or configurations
rg -l "lzma" --type cmake --type make --type sh

Length of output: 787


Script:

#!/bin/bash
# Let's check the CMakeLists.txt to understand LZMA integration
cat components/core/CMakeLists.txt | grep -A 5 -B 5 "lzma"

# Check the content of liblzma.sh to understand existing LZMA setup
cat components/core/tools/scripts/lib_install/liblzma.sh

# Check if the package is also used in Focal
cat components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh | grep "liblzma-dev"

Length of output: 3455

components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (1)

14-14: Verify LZMA installation script and dependencies

The addition of LZMA library installation looks good, but let's verify a few important aspects:

✅ Verification successful

Let me gather more information about the liblzma.sh script content and its error handling:


LZMA installation script verified and properly configured

The installation script for LZMA 5.4.6 has been thoroughly verified and shows:

  • Proper error handling with set -e and set -u
  • Secure download from the official tukaani-project repository
  • Appropriate privilege management
  • CMake configuration with position-independent code
  • Parallel build support
  • Clean installation process

The script follows the project's established patterns and integrates well with the existing build system. The CMake configuration properly requires and validates LZMA presence.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify LZMA installation script and its dependencies

# Check if liblzma.sh exists and is executable
echo "Checking liblzma.sh existence and permissions..."
fd --type f "liblzma.sh" --exec ls -l {}

# Look for version constraints or dependencies in build files
echo -e "\nChecking for version constraints..."
rg -g "CMakeLists.txt" --multiline "find_package\(.*LZMA.*\)" -A 5

Length of output: 872


Script:

#!/bin/bash
# Check the content of liblzma.sh for error handling and version validation
echo "Checking liblzma.sh content..."
cat ./components/core/tools/scripts/lib_install/liblzma.sh

# Check if there are any version-specific dependencies or constraints in the codebase
echo -e "\nChecking for LZMA version references..."
rg "LZMA.*5\.4" --type cpp --type cmake

Length of output: 1777

components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1)

17-17: Verify LZMA version compatibility and security

Please ensure that LZMA version 5.4.6 meets the project's requirements and has no known vulnerabilities.

✅ Verification successful

Let me gather more information about LZMA usage and dependencies in the project.


LZMA 5.4.6 is compatible with the project requirements

Based on the codebase analysis:

  • LZMA is used as an optional general-purpose compressor in the project
  • The project uses CMake's find_package(LibLZMA REQUIRED) without version constraints
  • No known vulnerabilities were found for LZMA 5.4.6
  • The installation script location appears to be incorrect as liblzma.sh was not found at the expected path

The version 5.4.6 is suitable for use, but please ensure the installation script is properly placed at components/core/tools/scripts/lib_install/ubuntu-focal/liblzma.sh before proceeding.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security advisories and compatibility issues with LZMA 5.4.6

# Check for CVEs in the National Vulnerability Database
curl -s "https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch=liblzma%205.4.6" | jq '.vulnerabilities[].cve.descriptions[].value'

# Check if any other components in the project specify LZMA version requirements
rg -i "lzma.*version|version.*lzma" --type cmake --type=txt

Length of output: 302


Script:

#!/bin/bash
# Check for LZMA dependencies and usage
rg -i "lzma" --type cmake -A 3 -B 3

# Check the liblzma installation script
cat components/core/tools/scripts/lib_install/ubuntu-focal/liblzma.sh

# Check if there are any specific LZMA version requirements in package configs
fd -e pc -e cmake -x cat {} \; | rg -i "lzma"

Length of output: 4152

components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (2)

23-23: LGTM! The LZMA development package is correctly added.

The addition of liblzma-dev is properly positioned in alphabetical order and aligns with the script's existing package installation pattern.


23-23: Consider documenting Ubuntu version compatibility.

Since this script is specifically for Ubuntu Focal (20.04), we should verify and document LZMA package compatibility for other Ubuntu versions that might be supported by the project.

components/core/tests/test-StreamingCompression.cpp (1)

18-19: LGTM! Headers are properly organized.

The LZMA compression headers follow the established pattern and are correctly placed with other compression-related includes.

components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (2)

69-69: Verify the correctness of the method reference in the comment.

The comment at line 69 mentions ReaderInterface::try_read_exact_length, but it's unclear if this is the correct method reference. Please ensure that this comment accurately describes the return behaviour and references the appropriate method.


146-146: ⚠️ Potential issue

Ensure proper initialisation and cleanup of m_decompression_stream.

The member variable m_decompression_stream is a raw pointer initialised to nullptr. To prevent potential memory leaks, ensure that it is properly allocated during initialisation and that all resources are released in the destructor.

Would you like assistance in reviewing the resource management for m_decompression_stream?

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

20-75: Initialization of the LZMA encoder is correctly implemented

The init_lzma_encoder function properly initializes the LZMA encoder with the specified compression level and dictionary size. Error handling for initialization failures is thorough and follows best practices.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
components/core/CMakeLists.txt (3)

234-235: Fix inconsistent variable name in status message

The status message uses Lzma instead of the package name LibLZMA:

-    message(STATUS "Found Lzma ${LIBLZMA_VERSION_STRING}")
+    message(STATUS "Found LibLZMA ${LIBLZMA_VERSION_STRING}")

241-241: Remove redundant debug message

The include directory message is redundant since CMake already logs this information at a more appropriate debug level.

-message("Lzma Include Dir: ${LIBLZMA_INCLUDE_DIRS}")

231-232: Add support for static/shared LZMA library selection

The TODO comment indicates missing support for selecting between static and shared LZMA libraries. This should be implemented to maintain consistency with other library configurations in the project.

Would you like me to help create a FindLZMA.cmake module that supports static/shared library selection?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 89b5707 and c646cea.

📒 Files selected for processing (4)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
  • components/core/tools/scripts/lib_install/liblzma.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp
  • components/core/tools/scripts/lib_install/liblzma.sh
🔇 Additional comments (3)
components/core/CMakeLists.txt (1)

481-485: Missing test coverage for LZMA compression implementation

The LZMA source files have been added to the test suite in CMakeLists.txt, but there are no corresponding test cases in the test-StreamingCompression.cpp file. This needs to be addressed to ensure the reliability of the LZMA compression functionality.

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

173-183: ⚠️ Potential issue

Prevent potential infinite loops by handling LZMA_BUF_ERROR appropriately

In the compress function, when lzma_code returns LZMA_BUF_ERROR, the current implementation continues the loop without taking corrective action. This could lead to an infinite loop, as LZMA_BUF_ERROR indicates that no progress can be made due to insufficient input or output buffer space.

Consider handling LZMA_BUF_ERROR explicitly to prevent potential infinite loops.

Apply this diff to handle LZMA_BUF_ERROR:

             switch (rc) {
                 case LZMA_OK:
                     break;
-                case LZMA_BUF_ERROR:
-                    break;
+                case LZMA_BUF_ERROR:
+                    // No progress can be made; handle or report the error
+                    SPDLOG_ERROR("Compression stalled: LZMA_BUF_ERROR encountered.");
+                    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
                 case LZMA_STREAM_END:
                     hit_stream_end = true;
                     break;

207-213: 🛠️ Refactor suggestion

Handle potential exceptions from m_compressed_stream_file_writer->write

Currently, the code assumes that the write operation always succeeds. To enhance robustness, consider handling any exceptions or errors that may occur during the write operation to ensure that failures are properly reported and managed.

Apply this diff to handle possible write errors:

 auto Compressor::pipe_data() -> void {
+    try {
         m_compressed_stream_file_writer->write(
             size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
             cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }
     m_compression_stream.next_out = m_compressed_stream_block_buffer.data();
     m_compression_stream.avail_out = cCompressedStreamBlockBufferSize;
 }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (2)

59-61: Add documentation for intentionally omitted decompressor

While the LZMA test section follows the established pattern, it's worth documenting why the decompressor is intentionally not initialized here, especially since it differs from other test sections.

Consider adding a comment like:

 SECTION("LZMA compression") {
+    // Note: Decompressor intentionally not initialized as it's handled separately
     compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
 }

82-85: Consider adding test reporting for skipped decompression

While the null check correctly handles missing decompressors, it silently skips the decompression test. Consider using Catch2's INFO or WARN to report when decompression is skipped.

 if (nullptr == decompressor) {
+    WARN("Skipping decompression test - decompressor not initialized");
     boost::filesystem::remove(compressed_file_path);
     return;
 }
components/core/CMakeLists.txt (2)

241-241: Remove redundant debug message

The include directory message is redundant since CMake already logs this information when the package is found.

-message("Lzma Include Dir: ${LIBLZMA_INCLUDE_DIRS}")

231-232: Add support for static/shared LZMA library selection

The TODO comment indicates missing support for selecting between static and shared LZMA libraries. This should be implemented to maintain consistency with other compression libraries in the project.

Would you like me to help create a FindLibLZMA.cmake module that supports static/shared library selection?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c646cea and c91e5fb.

📒 Files selected for processing (2)
  • components/core/CMakeLists.txt (4 hunks)
  • components/core/tests/test-StreamingCompression.cpp (3 hunks)
🔇 Additional comments (4)
components/core/tests/test-StreamingCompression.cpp (2)

18-18: LGTM! Header inclusion follows project conventions

The LZMA compressor header is properly included alongside other compression-related headers.


59-61: Verify LZMA-specific test scenarios

While the test covers basic compression functionality, consider whether LZMA-specific scenarios need testing:

  • Different compression levels
  • Dictionary size variations
  • Error handling for invalid LZMA streams
✅ Verification successful

Let me gather more information about the LZMA implementation and test coverage.


Test coverage appears adequate for LZMA compression

The implementation already includes essential LZMA-specific configurations and error handling:

  • Compression level configuration via init_lzma_encoder
  • Dictionary size customization
  • Error handling for LZMA stream operations
  • CRC64 checksum verification

The basic test case is sufficient as it exercises the core compression path, which internally handles these configurations. The implementation's error handling is covered through the LZMA stream workflow.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any existing LZMA-specific test cases
rg -l "LZMA" --type cpp --glob "test-*.cpp"

# Check for LZMA configuration options in source
ast-grep --pattern 'class $_ {
  $$$
  // LZMA specific configuration
  $$$
}'

Length of output: 176


Script:

#!/bin/bash
# Check the LZMA compressor implementation and available configuration options
rg -A 10 "class.*[Ll]zma.*Compressor" --type cpp

# Look for LZMA-specific test cases in more detail
rg -B 2 -A 10 "SECTION.*LZMA" components/core/tests/test-StreamingCompression.cpp

# Search for any LZMA configuration or parameter settings
rg "lzma.*\(" --type cpp

Length of output: 2117

components/core/CMakeLists.txt (2)

235-235: Fix incorrect variable name in status message

The status message uses Lzma instead of the package name LibLZMA.


481-483: LGTM! Source files and library linking look good

The LZMA source files are properly organized in the streaming_compression directory, and the library is correctly linked in the target_link_libraries command.

Also applies to: 571-571

@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 November 27, 2024 20:48
@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 November 30, 2024 07:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)

31-35: Consider simplifying the comment block.

The comment about XZ Embedded could be more concise while maintaining its informative value.

-    // Initialize the encoder using a preset. Set the integrity to check
-    // to CRC64, which is the default in the xz command line tool. If
-    // the .xz file needs to be decompressed with XZ Embedded, use
-    // LZMA_CHECK_CRC32 instead.
+    // Initialize encoder with CRC64 integrity check (use LZMA_CHECK_CRC32 for XZ Embedded compatibility)

171-174: Document LZMA_BUF_ERROR handling strategy.

The code handles LZMA_BUF_ERROR but should document why it's treated the same as LZMA_OK.

             case LZMA_OK:
             case LZMA_BUF_ERROR:
+                // LZMA_BUF_ERROR indicates temporary buffer condition, safe to continue
                 break;

1-207: Consider adding performance optimizations for large datasets.

The current implementation could benefit from:

  1. Buffer size tuning based on input characteristics
  2. Parallel compression for large datasets
  3. Memory-mapped I/O for large files
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c91e5fb and 26b0663.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp
🔇 Additional comments (5)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (5)

1-17: LGTM! Header organization and includes are well-structured.

The includes are logically organized into standard libraries, external dependencies, and project-specific headers.


70-93: LGTM! Well-structured initialization with proper validation.

The method properly validates the initialization state and compression level before setting up the compression stream and buffers.


95-108: LGTM! Proper resource cleanup and finalization.

The method ensures proper cleanup of resources and follows RAII principles.


110-131: ⚠️ Potential issue

Add error handling for the compression operation.

The write operation should handle potential compression failures.

     run_lzma(LZMA_RUN);
+    if (!m_compression_stream_is_flushed) {
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }

199-206: ⚠️ Potential issue

Add exception handling for file write operations.

The write operation could fail and should be wrapped in a try-catch block.

 auto Compressor::pipe_data() -> void {
+    try {
         m_compressed_stream_file_writer->write(
                 clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
                 cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }
     m_compression_stream.next_out = m_compressed_stream_block_buffer.data();
     m_compression_stream.avail_out = cCompressedStreamBlockBufferSize;
 }

@haiqi96
Copy link
Contributor

haiqi96 commented Dec 2, 2024

  • I added lib install scripts for Ubuntu machines.

Sorry for missing this question. Yes I think you need to look into CentOS and MacOS. It looks like you have already added the dependency for centos.

Actually I would expect the build of MacOS to fail because LZMA library is not installed, and the updated Unittest should require the lzma dependency. Maybe it means the MacOS already comes with LZMA dependencies?

davidlion
davidlion previously approved these changes Dec 20, 2024
Copy link
Contributor

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the code's behavior looks weird to me:

  1. In write(), we always call detach_lzma_input before exiting, which will set m_lzma_stream.avail_in() to be 0.
  2. In close(), we call flush_lzma(LZMA_FULL_FLUSH) if (m_lzma_stream.avail_in() > 0). Because of (1), this should never happen.
  3. In flush(), we always call flush_lzma(LZMA_SYNC_FLUSH); Similarly, because of (1), m_lzma_stream.avail_in() = 0

It looks to me that there's some contradiction.
What I read from 2 is that flush_lzma(LZMA_SYNC_FLUSH) has something to do with avail_in. but in 3, we are not checking the avail_in at all.

I didn't read LZMA functions carefully, but my guess is that:
flush_lzma(LZMA_SYNC_FLUSH) has something to do with the data internally buffered by lzma_string. so we should call it in flush()
flush_lzma(LZMA_FINISH) may or may not handle the internally buffered data.

  • If it does not handle the internally buffered data, shouldn't close() always call flush_lzma(LZMA_SYNC_FLUSH) before calling flush_lzma(LZMA_FINISH)?
  • If it handle the internally buffered data, close() won't need to call flush_lzma(LZMA_SYNC_FLUSH).

If what my guess is wrong and there are some subtlety in the lzma flush API, then we should document it in flush_lzma()'s docstring,

@Bill-hbrhbr Bill-hbrhbr requested review from kirkrodrigues and removed request for kirkrodrigues December 20, 2024 22:11
@Bill-hbrhbr
Copy link
Contributor Author

Bill-hbrhbr commented Dec 20, 2024

Part of the code's behavior looks weird to me:

  1. In write(), we always call detach_lzma_input before exiting, which will set m_lzma_stream.avail_in() to be 0.
  2. In close(), we call flush_lzma(LZMA_FULL_FLUSH) if (m_lzma_stream.avail_in() > 0). Because of (1), this should never happen.
  3. In flush(), we always call flush_lzma(LZMA_SYNC_FLUSH); Similarly, because of (1), m_lzma_stream.avail_in() = 0

It looks to me that there's some contradiction. What I read from 2 is that flush_lzma(LZMA_SYNC_FLUSH) has something to do with avail_in. but in 3, we are not checking the avail_in at all.

I didn't read LZMA functions carefully, but my guess is that: flush_lzma(LZMA_SYNC_FLUSH) has something to do with the data internally buffered by lzma_string. so we should call it in flush() flush_lzma(LZMA_FINISH) may or may not handle the internally buffered data.

  • If it does not handle the internally buffered data, shouldn't close() always call flush_lzma(LZMA_SYNC_FLUSH) before calling flush_lzma(LZMA_FINISH)?
  • If it handle the internally buffered data, close() won't need to call flush_lzma(LZMA_SYNC_FLUSH).

If what my guess is wrong and there are some subtlety in the lzma flush API, then we should document it in flush_lzma()'s docstring,

LZMA_FINISH also flushes the internally buffered data, and potentially do some tear-down operations.
LZMA_SYNC_FLUSH only flushes but allows existing writes to continue/new writes coming in.
LZMA_FULL_FLUSH not only flushes but also processes input data until it exhausts. After that, need to have new input sources.

For (2) I'll make it a class operation error instead of spdlog error

@haiqi96
Copy link
Contributor

haiqi96 commented Dec 20, 2024

Part of the code's behavior looks weird to me:

  1. In write(), we always call detach_lzma_input before exiting, which will set m_lzma_stream.avail_in() to be 0.
  2. In close(), we call flush_lzma(LZMA_FULL_FLUSH) if (m_lzma_stream.avail_in() > 0). Because of (1), this should never happen.
  3. In flush(), we always call flush_lzma(LZMA_SYNC_FLUSH); Similarly, because of (1), m_lzma_stream.avail_in() = 0

It looks to me that there's some contradiction. What I read from 2 is that flush_lzma(LZMA_SYNC_FLUSH) has something to do with avail_in. but in 3, we are not checking the avail_in at all.
I didn't read LZMA functions carefully, but my guess is that: flush_lzma(LZMA_SYNC_FLUSH) has something to do with the data internally buffered by lzma_string. so we should call it in flush() flush_lzma(LZMA_FINISH) may or may not handle the internally buffered data.

  • If it does not handle the internally buffered data, shouldn't close() always call flush_lzma(LZMA_SYNC_FLUSH) before calling flush_lzma(LZMA_FINISH)?
  • If it handle the internally buffered data, close() won't need to call flush_lzma(LZMA_SYNC_FLUSH).

If what my guess is wrong and there are some subtlety in the lzma flush API, then we should document it in flush_lzma()'s docstring,

LZMA_FINISH also flushes the internally buffered data, and potentially do some tear-down operations. LZMA_SYNC_FLUSH only flushes but allows existing writes to continue/new writes coming in. LZMA_FULL_FLUSH not only flushes but also processes input data until it exhausts. After that, need to have new input sources.

For (2) I'll make it a class operation error instead of spdlog error

I see, yeah I think the change is fair

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)

17-33: Consider adding documentation for the open method.

The method handles initialization well, but adding documentation about its purpose and possible exceptions would improve maintainability.


35-47: Consider handling remaining data in close().

The method throws an error if there's unprocessed input data. Consider processing the remaining data before closing to ensure no data is lost.

 if (m_lzma_stream.avail_in() > 0) {
-    throw OperationFailed(ErrorCode_Corrupt, __FILENAME__, __LINE__);
+    SPDLOG_WARN("Processing remaining data before closing");
+    encode_lzma();
 }

160-202: Consider separating error handling into a dedicated method.

The error handling logic in the constructor is quite lengthy. Consider extracting it into a separate private method for better maintainability.

+    private:
+        static auto get_error_message(lzma_ret rc) -> const char* {
+            switch (rc) {
+                case LZMA_MEM_ERROR:
+                    return "Memory allocation failed";
+                case LZMA_OPTIONS_ERROR:
+                    return "Specified preset is not supported";
+                case LZMA_UNSUPPORTED_CHECK:
+                    return "Specified integrity check is not supported";
+                case LZMA_PROG_ERROR:
+                    return "Input arguments are not sane";
+                default:
+                    return "Unknown error";
+            }
+        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68c4c36 and dcb843e.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_compression/lzma/Compressor.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

1-16: LGTM! Well-organized includes and namespace declaration.

The includes are properly organized and all necessary dependencies are present.


142-158: 🛠️ Refactor suggestion

Handle potential write failures.

The write operation to m_compressed_stream_file_writer should be wrapped in a try-catch block to handle potential I/O errors.

+    try {
         m_compressed_stream_file_writer->write(
                 clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
                 cCompressedStreamBlockBufferSize - m_lzma_stream.avail_out()
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (1)

63-72: Consider enhancing error handling in the compress function

While the implementation is correct, consider adding error handling for the file operations and compression steps.

 auto compress(std::unique_ptr<Compressor> compressor, char const* src) -> void {
     FileWriter file_writer;
-    file_writer.open(string(cCompressedFilePath), FileWriter::OpenMode::CREATE_FOR_WRITING);
+    auto const result = file_writer.open(string(cCompressedFilePath), FileWriter::OpenMode::CREATE_FOR_WRITING);
+    REQUIRE(ErrorCode_Success == result);
     compressor->open(file_writer);
     for (auto const chunk_size : cCompressionChunkSizes) {
-        compressor->write(src, chunk_size);
+        REQUIRE(ErrorCode_Success == compressor->write(src, chunk_size));
     }
     compressor->close();
     file_writer.close();
 }
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)

35-47: Consider error handling refinement in close().

The error code ErrorCode_Corrupt on line 41 might not accurately represent the condition of having unprocessed input data. Consider using ErrorCode_Failure or a more descriptive error code.

-        throw OperationFailed(ErrorCode_Corrupt, __FILENAME__, __LINE__);
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);

79-100: Enhance LZMA_BUF_ERROR handling in encode_lzma.

The error message could be more specific about the nature of the error, especially since we know both buffers are available in this context.

             case LZMA_BUF_ERROR:
                 SPDLOG_ERROR("LZMA compressor input stream is corrupt. No encoding "
-                             "progress can be made.");
+                             "progress can be made with available input and output buffers.");
                 throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);

191-194: Consider more specific error handling for LZMA_PROG_ERROR.

The error message could provide more specific information about which input arguments are invalid.

         case LZMA_PROG_ERROR:
-            msg = "Input arguments are not sane";
+            msg = "Invalid input: compression level, dictionary size, or check parameter is out of range";
             break;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcb843e and b20162f.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/tests/test-StreamingCompression.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/tests/test-StreamingCompression.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (2)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (5)
components/core/tests/test-StreamingCompression.cpp (3)

7-8: LGTM! Good use of string_view for optimization

The addition of string_view and related headers is appropriate, and the organization of includes is clean and logical.

Also applies to: 12-12, 32-33


35-61: Well-structured test constants and helper functions

The interleaved no-op (0-sized) chunks in cCompressionChunkSizes effectively test the compressor state integrity between operations. The use of string_view for constants and clear function signatures enhance code efficiency.


140-143: Document LZMA section's compression-only scope

The LZMA section differs from other compression tests as it only tests compression without decompression. This should be documented for clarity.

-    SECTION("LZMA compression") {
+    SECTION("LZMA compression (compression-only test)") {
+        // Note: This section intentionally tests only compression functionality
+        // without decompression, as per the current implementation scope.
         compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
         compress(std::move(compressor), uncompressed_buffer.data());
     }
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

1-16: LGTM! Well-organized includes and namespace declaration.

The includes are logically grouped (standard library, external dependencies, and project headers) and all necessary dependencies are present.


142-158: LGTM! Robust buffer management implementation.

The method properly handles buffer checks, writes, and memory management with appropriate error handling.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/tests/test-StreamingCompression.cpp (1)

34-54: Consider adding documentation for the helper functions

The constants and function declarations are well-structured, but adding documentation would improve maintainability.

Add function documentation like this:

+/**
+ * Compresses the source data using the provided compressor with varying chunk sizes
+ * @param compressor The compression algorithm implementation
+ * @param src The source data to compress
+ */
auto compress(std::unique_ptr<Compressor> compressor, char const* src) -> void;

+/**
+ * Decompresses the data and verifies it matches the original content
+ * @param decompressor The decompression algorithm implementation
+ * @param uncompressed_buffer The original uncompressed data for comparison
+ * @param decompressed_buffer Buffer to store decompressed data
+ */
auto decompress_and_compare(
        std::unique_ptr<Decompressor> decompressor,
        Array<char> const& uncompressed_buffer,
        Array<char>& decompressed_buffer
) -> void;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b20162f and df41b22.

📒 Files selected for processing (1)
  • components/core/tests/test-StreamingCompression.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/tests/test-StreamingCompression.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
🔇 Additional comments (4)
components/core/tests/test-StreamingCompression.cpp (4)

7-8: LGTM! Header includes and using declarations are well-organized

The additions follow the existing pattern and improve code readability.

Also applies to: 20-20, 31-32


Line range hint 56-104: LGTM! Helper functions are well-implemented

The functions demonstrate good practices:

  • RAII for resource management
  • Proper error handling
  • Comprehensive validation of decompressed data

133-136: Improve documentation for LZMA compression test section

The LZMA section differs from other compression tests as it only tests compression. This should be clearly documented.

Apply this change:

-    SECTION("LZMA compression") {
+    SECTION("LZMA compression (compression-only test)") {
+        // Note: Decompression testing is intentionally omitted as this section
+        // focuses on verifying the compression functionality
         compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
         compress(std::move(compressor), uncompressed_buffer.data());
     }

138-138: LGTM! Proper cleanup of test artifacts

The cleanup code ensures no test files are left behind.

@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 December 22, 2024 22:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
components/core/tests/test-StreamingCompression.cpp (1)

LZMA decompressor implementation is missing

Based on the codebase analysis, while LZMA compression is implemented (Compressor.hpp, Compressor.cpp, Constants.hpp), there is no corresponding decompressor implementation. This is unlike the ZStd and Passthrough implementations which have both compressor and decompressor components. The test file should not include decompressor header as it doesn't exist yet.

  • components/core/src/clp/streaming_compression/lzma/ needs a decompressor implementation
  • Remove LZMA-related test cases from test-StreamingCompression.cpp until decompression support is added
🔗 Analysis chain

Line range hint 20-24: Consider adding LZMA decompressor header for consistency

While the LZMA compressor header is included, the corresponding decompressor header is missing, unlike the ZStd and Passthrough implementations which include both. This might indicate an incomplete implementation or missing functionality.

Let's verify if the LZMA decompressor exists:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for LZMA decompressor header
fd -e hpp -e cpp "Decompressor" | rg "lzma"

Length of output: 41


Script:

#!/bin/bash
# Let's first check if LZMA decompressor exists in the streaming_compression directory
rg -l "lzma.*Decompressor" --type cpp --type hpp

# Also check the directory structure to understand the implementation pattern
fd . "src/clp/streaming_compression"

Length of output: 232


Script:

#!/bin/bash
# Let's check the directory structure first to understand the correct paths
fd streaming_compression

# Then search for any LZMA-related files
rg -l "lzma" --type cpp --type c --type h

Length of output: 411

🧹 Nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (2)

58-58: Consider avoiding string conversion overhead

The string(cCompressedFilePath) conversion could be avoided by updating FileWriter::open to accept string_view.


133-136: Improve test section documentation

While the compression-only testing for LZMA is intentional (as discussed in PR comments), the test section could be more self-documenting.

Consider adding a comment explaining why decompression is not tested:

     SECTION("LZMA compression") {
+        // Note: This section only tests compression as LZMA decompression
+        // is handled separately. See PR #614 for details.
         compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
         compress(std::move(compressor), uncompressed_buffer.data());
     }
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

146-149: Add error handling for write operations.

The write operation to the file writer should be wrapped in a try-catch block to handle potential I/O errors.

+    try {
         m_compressed_stream_file_writer->write(
                 clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
                 cCompressedStreamBlockBufferSize - m_lzma_stream.avail_out()
         );
+    } catch (const std::exception& e) {
+        SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    }

200-201: Enhance error logging with additional context.

Include the compression level, dictionary size, and check type in the error message to aid in debugging.

-    SPDLOG_ERROR("Error initializing the encoder: {} (error code {})", msg, static_cast<int>(rc));
+    SPDLOG_ERROR("Error initializing the encoder: {} (error code {}). Compression level: {}, Dictionary size: {}, Check: {}",
+                 msg, static_cast<int>(rc), compression_level, dict_size, static_cast<int>(check));
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df41b22 and 524fe1d.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1 hunks)
  • components/core/tests/test-StreamingCompression.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/tests/test-StreamingCompression.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (2)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (3)
components/core/tests/test-StreamingCompression.cpp (1)

34-54: LGTM! Well-structured constants and function declarations

The code demonstrates good practices:

  • Constants properly scoped in anonymous namespace
  • Use of constexpr for compile-time evaluation
  • Modern C++ auto return type syntax
  • Clear function signatures with appropriate parameter types
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)

40-42: 🛠️ Refactor suggestion

Consider processing remaining data before throwing an error.

Instead of throwing an error when there's unprocessed input data during close, consider processing the remaining data to ensure no data is lost.

 if (m_lzma_stream.avail_in() > 0) {
-    throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    encode_lzma();  // Process remaining data
 }

Likely invalid or redundant comment.


126-130: 🛠️ Refactor suggestion

Enhance error handling for different flush actions.

The current error handling for LZMA_BUF_ERROR doesn't distinguish between different flush actions. For LZMA_FULL_FLUSH and LZMA_FULL_BARRIER, a buffer error might be expected when the output buffer is full.

             case LZMA_BUF_ERROR:
-                // NOTE: this can happen if we are using LZMA_FULL_FLUSH or LZMA_FULL_BARRIER. These
-                // two actions keeps encoding input data alongside flushing buffered encoded data.
-                SPDLOG_ERROR("LZMA compressor input stream is corrupt. No encoding "
-                             "progress can be made.");
-                throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+                if (flush_action == LZMA_FULL_FLUSH || flush_action == LZMA_FULL_BARRIER) {
+                    if (m_lzma_stream.avail_in() > 0 && m_lzma_stream.avail_out() > 0) {
+                        SPDLOG_ERROR("LZMA flush failed: Unable to make progress with available buffers");
+                        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+                    }
+                    break;
+                }
+                SPDLOG_ERROR("Unexpected LZMA_BUF_ERROR during flush operation");
+                throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);

Likely invalid or redundant comment.

Copy link
Contributor

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now. Have you guys agreed on a PR title?

@Bill-hbrhbr
Copy link
Contributor Author

Looks good to me now. Have you guys agreed on a PR title?

Yes. The PR title is changed per David and Kirk's suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.