Skip to content

Conversation

tuhaihe
Copy link
Member

@tuhaihe tuhaihe commented Oct 9, 2025

Move yyjson from top-level dependency directory to contrib/pax_storage/src/cpp/contrib/yyjson since it's only used by the PAX storage component. Update all references in CMake build files, .gitmodules, and documentation to reflect the new location. This change improves the project structure by keeping PAX-specific dependencies together and removing the unnecessary top-level dependency directory.

The move includes:

  • Moving yyjson directory to contrib/pax_storage/src/cpp/contrib/yyjson
  • Updating CMakeLists.txt and pax.cmake to reference the new location
  • Updating .gitmodules to reflect the new submodule path
  • Updating README.md documentation to show the new location

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


Move yyjson from top-level dependency directory to
`contrib/pax_storage/src/cpp/contrib/yyjson` since it's only used by the
PAX storage component. Update all references in CMake build files,
.gitmodules, and documentation to reflect the new location. This change
improves the project structure by keeping PAX-specific dependencies
together and removing the unnecessary top-level dependency directory.

The move includes:
- Moving yyjson directory to contrib/pax_storage/src/cpp/contrib/yyjson
- Updating CMakeLists.txt and pax.cmake to reference the new location
- Updating .gitmodules to reflect the new submodule path
- Updating README.md documentation to show the new location
@edespino
Copy link
Contributor

Hi @tuhaihe !

I reviewed your PR and wanted to share an alternative implementation that eliminates the need for git submodules entirely. This approach uses CMake's FetchContent module, which I've seen work well in other Apache projects like MADlib.

Alternative: CMake FetchContent

I've implemented this on branch fetchcontent-yyjson for your consideration.

Key change: Replace the git submodule with automatic dependency fetching during CMake configuration.

Implementation

contrib/pax_storage/CMakeLists.txt:

if(USE_MANIFEST_API AND NOT USE_PAX_CATALOG)
    include(FetchContent)

    # Try system package first
    find_package(yyjson QUIET)

    if(NOT yyjson_FOUND)
        message(STATUS "yyjson not found in system, fetching from GitHub...")
        FetchContent_Declare(
            yyjson
            GIT_REPOSITORY https://github.com/ibireme/yyjson.git
            GIT_TAG 0.12.0
            GIT_SHALLOW TRUE
        )

        set(SAVED_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
        set(BUILD_SHARED_LIBS ON)

        FetchContent_MakeAvailable(yyjson)

        set(BUILD_SHARED_LIBS ${SAVED_BUILD_SHARED_LIBS})
    else()
        message(STATUS "Using system yyjson package")
    endif()
endif()

contrib/pax_storage/src/cpp/cmake/pax.cmake:

# Update include path to use FetchContent variable
set(pax_target_include ${pax_target_include} ${yyjson_SOURCE_DIR}/src)

Remove from .gitmodules - No submodule entry needed.

Benefits

  1. Simpler workflow - No git submodule update --init --recursive required
  2. Automatic handling - CMake downloads yyjson when needed during configuration
  3. Version explicit - GIT_TAG 0.12.0 (latest release) is clearer than a commit hash
    • Submodules point to opaque commit SHAs - without checking the yyjson repo, you can't tell if it's a release, a random commit, or how old it is
    • FetchContent uses semantic versions - immediately clear you're using the latest stable release
  4. System package support - Respects system-installed yyjson if available
  5. Pattern consistency - Aligns with how protobuf and zstd are already handled via find_package()

Important Note

During my review, I noticed that yyjson is only used when both USE_MANIFEST_API=ON and USE_PAX_CATALOG=OFF, which is not the default configuration. This means:

  • Default builds (./configure --enable-pax) don't use yyjson at all
  • Your reorganization and this alternative have zero impact on normal builds
  • The dependency is only relevant for the JSON manifest implementation mode

Testing

To verify this works when yyjson IS needed:

cd contrib/pax_storage/build
cmake .. -DUSE_MANIFEST_API=ON -DUSE_PAX_CATALOG=OFF
# Should see: "yyjson not found in system, fetching from GitHub..."
make

Recommendation

Your PR's organizational improvement is valuable regardless. This FetchContent approach is simply an alternative that:

  • Reduces the dependency management burden (no submodule commands)
  • Follows patterns used in other Apache projects
  • Makes the build more self-contained

Both approaches work - this is just offered as food for thought. Happy to discuss!

Branch for reference: fetchcontent-yyjson

@tuhaihe
Copy link
Member Author

tuhaihe commented Oct 11, 2025

Hi @edespino, thanks for your great idea. There are other submodules needed for PAX building, such as tabulate. I prefer to keep the same behavior across yyjson and the other submodules so that users can download all required submodules with git submodule update --init --recursive once, rather than downloading yyjson separately during the build.

@tuhaihe
Copy link
Member Author

tuhaihe commented Oct 11, 2025

More:

I searched and found that yyjson is available via the EPEL repository on Rocky Linux 9, but it is not available in Ubuntu 22.04 (though it can be found in Ubuntu 25.04). Given that it is not a commonly provided package across popular Linux distros, we can continue using the submodule approach for now.

@tuhaihe
Copy link
Member Author

tuhaihe commented Oct 11, 2025

Would like to have more voices from the core PAX developers on this. cc @gfphoenix78 @jiaqizho @gongxun0928

At least, it's not good to place the yyjson separately at the top of the dir.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants