Refactor with pool removal #787

JohannesLichtenberger · 2025-10-18T19:59:39Z

No description provided.

- Updated all JSON node types (OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, etc.) to use uniform MemorySegment-based deserialization pattern
- Implemented lazy loading for all value types (strings, numbers, booleans, nulls)
- Nodes now deserialize using layout-based slicing for better performance
- Removed ~100 lines of unused helper methods from NodeKind
- Fixed AbstractStringNode hash computation to use toByteArray() instead of getDestination()
- All JSON nodes now follow the same pattern as OBJECT and ARRAY for consistency
- Build verified successful with no compilation errors

…ialization
- Add size prefix (4 bytes) after NodeKind byte to avoid reading variable-sized data
- Use 8-byte aligned headers (NodeKind + size + 3-byte padding) for proper alignment
- Add end padding to ensure each node's total size is multiple of 8
- Switch all JSON nodes to UNALIGNED VarHandles for compatibility with factory-created nodes
- Fix ObjectKeyNode to include 4-byte internal padding before hash field
- Fix JsonNodeFactoryImpl to write internal padding when creating ObjectKeyNode
- Fix setBooleanValue to handle both BooleanNode and ObjectBooleanNode types
- Remove complex size calculation methods (calculateStopBitDataSize, calculateNumberDataSize)
Benefits:
- No double-reading of variable-sized content (strings, numbers)
- Faster deserialization with direct MemorySegment slicing
- Simpler, more maintainable code
- Tests: PathSummaryTest and JsonNodeTrxGetPreviousRevisionNumberTest passing
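The framing described above can be sketched roughly as follows. This is only an illustration under assumptions: the header is taken to be the kind byte, then the 4-byte size, then 3 padding bytes, and the size prefix is taken to count the payload plus its end padding. The class and method names (`AlignedNodeFraming`, `frame`, `endPadding`, `nextNodeOffset`) are hypothetical and are not taken from the Sirix sources.

```java
import java.nio.ByteBuffer;

final class AlignedNodeFraming {
    // 1 kind byte + 4 size bytes + 3 padding bytes (field order is an assumption)
    static final int HEADER_SIZE = 8;

    // Number of zero bytes needed to round `size` up to the next multiple of 8.
    static int endPadding(int size) {
        return (8 - (size % 8)) % 8;
    }

    // Frames `payload` as [kind][size][3 x 0][payload][end padding].
    static byte[] frame(byte kind, byte[] payload) {
        int padding = endPadding(payload.length);
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + payload.length + padding);
        buf.put(kind);
        buf.putInt(payload.length + padding); // assumption: size covers payload + end padding
        buf.put(new byte[3]);                 // header padding
        buf.put(payload);
        buf.put(new byte[padding]);           // end padding
        return buf.array();
    }

    // Finds the start of the following node from the header alone,
    // without parsing any variable-sized content.
    static int nextNodeOffset(ByteBuffer buf, int nodeOffset) {
        int size = buf.getInt(nodeOffset + 1);
        return nodeOffset + HEADER_SIZE + size;
    }
}
```

With this framing a 5-byte payload occupies 16 bytes in total (8 header + 5 payload + 3 end padding), so the next node again starts on an 8-byte boundary.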

…ules
The net.openhft.hashing library needs access to sun.nio.ch.DirectBuffer when hashing DirectByteBuffer instances created from MemorySegments. Without these --add-opens flags, tests fail with IllegalAccessError. This fix allows:
- Access to sun.nio.ch for DirectBuffer operations
- Access to java.nio for ByteBuffer operations
Tests now pass successfully.

…dding format
- Add NodeKind byte before size prefix
- Use 3 bytes padding (total 8 bytes with NodeKind)
- Skip NodeKind byte before deserialize
- Tests now pass with proper 8-byte alignment

…adding format
- Fixed StringNodeTest, NumberNodeTest, BooleanNodeTest, NullNodeTest
- Fixed ObjectNumberNodeTest, ObjectStringNodeTest, ObjectBooleanNodeTest, ObjectNullNodeTest, ObjectKeyNodeTest
- Corrected serialization order for value nodes (siblings before/after value depending on node type)
- All JSON node tests now pass with proper 8-byte alignment

- Created JsonNodeTestHelper with writeHeader(), writeEndPadding(), updateSizePrefix(), and finalizeSerialization() methods
- Updated all 11 JSON node tests to use the helper methods
- Reduced ~20 lines of duplicated code per test to 1-2 lines
- Tests remain fully passing

…izer class
- Created JsonNodeSerializer in main source with writeSizePrefix(), readSizePrefix(), writeEndPadding(), updateSizePrefix(), and calculateEndPadding()
- Removed duplicate private methods from NodeKind.java
- Updated NodeKind.java to use JsonNodeSerializer methods
- Updated JsonNodeTestHelper to delegate to JsonNodeSerializer
- Eliminated code duplication between production and test code
- All tests still pass

- Added NodeKind byte before serialization in all 4 round-trip tests
- Added bytesIn.readByte() to skip NodeKind byte before deserialization
- Ensures proper 8-byte alignment for MemorySegment access
- All 17 tests now pass

- Added serializeNumber() and deserializeNumber() static methods to NodeKind
- Added helper methods serializeBigInteger() and deserializeBigInteger()
- Updated NUMBER_VALUE and OBJECT_NUMBER_VALUE serialization to use shared methods
- Removed duplicate serialization/deserialization code from NumberNode
- Removed duplicate serialization/deserialization code from ObjectNumberNode
- Both node types now use centralized logic from NodeKind for consistency

…obal()
- Updated both constructors to use Arena.ofAuto() for automatic memory management
- Arena.ofAuto() automatically releases memory when no longer reachable
- Improves memory management by allowing automatic cleanup instead of global lifetime

…rializeNumber()
- Changed NumberNode.serializeNumber() to NodeKind.serializeNumber()
- Changed ObjectNumberNode.serializeNumber() to NodeKind.serializeNumber()
- Fixes compilation errors after refactoring number serialization to NodeKind

…y offset
- Changed serializeDelegateWithoutIDs to use putVarLong instead of writeLong
- Changed deserializeNodeDelegateWithoutIDs to use getVarLong instead of readLong
- This fixes JsonRedBlackTreeIntegrationTest failures
- RB nodes (CASRB, PATHRB, NAMERB, RB_NODE_VALUE) need variable-length encoding for efficient storage since parent key offsets are typically small values
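To illustrate why variable-length encoding pays off for small offsets, here is a generic LEB128-style sketch. It is not the actual `putVarLong`/`getVarLong` implementation, whose wire format is not shown in this PR; `VarLongSketch`, `encode`, and `decode` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

final class VarLongSketch {
    // Writes 7 bits per byte, low-order bits first.
    // A set high bit means "more bytes follow".
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reads bytes until one without the continuation bit is found.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }
}
```

In this scheme a value below 128 takes a single byte, whereas a fixed-width long always takes 8 bytes.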

- Revert GrowingMemorySegment to use Arena.ofAuto() by default
  * Nodes store MemorySegment references that outlive BytesOut instances
  * Arena.ofAuto() allows GC to manage cleanup when segments become unreachable
  * Prevents premature deallocation bugs
- Add Arena parameter constructors for explicit arena control
  * GrowingMemorySegment(Arena, int) for custom arena
  * MemorySegmentBytesOut(Arena, int) for custom arena
  * Enables using confined arenas for temporary buffers with clear lifecycles
- Optimize KeyValueLeafPage.processEntries() with Arena.ofConfined()
  * Use confined arena for temporary serialization buffers
  * Normal records: data copied to slotMemory, temp buffer freed immediately
  * Overflow records: explicitly copied to Arena.global() for persistence
  * Provides immediate memory cleanup for ~99% of serialization operations
This hybrid approach balances manual control (where beneficial) with automatic management (where lifecycles are complex). All tests pass.
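The confined-versus-automatic split can be sketched with the standard `java.lang.foreign` API (final since Java 22). This is an illustration only: `ArenaLifetimeSketch` and `serializeAndKeep` are hypothetical, and the long-lived copy here uses `Arena.ofAuto()`, whereas the commit copies overflow records to `Arena.global()`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class ArenaLifetimeSketch {
    // Serializes into a short-lived scratch buffer, then copies the bytes into
    // a GC-managed segment that may safely outlive this call.
    static MemorySegment serializeAndKeep(long value) {
        try (Arena temp = Arena.ofConfined()) {
            MemorySegment scratch = temp.allocate(Long.BYTES, Long.BYTES);
            scratch.set(ValueLayout.JAVA_LONG, 0, value);

            // The automatic arena stays alive as long as `kept` is reachable.
            MemorySegment kept = Arena.ofAuto().allocate(Long.BYTES, Long.BYTES);
            MemorySegment.copy(scratch, 0, kept, 0, Long.BYTES);
            return kept;
        } // the confined arena closes here and frees `scratch` immediately
    }
}
```

Returning `scratch` itself would be a bug, because any access after the try block fails once its arena is closed. That is the kind of premature deallocation the default `Arena.ofAuto()` is meant to avoid.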

…lization' into refactor-json-nodes-lazy-deserialization
# Conflicts:
#   JsonShredderTest_testChicagoDescendantAxisParallel_2024_08_12_221741.jfr.zip
#   analysis-5-trxs.jfr

…alization
- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 68 bytes (16B NodeDelegate + 32B StructNode + 20B NameNode)
- Remove typeKey from serialized data (computed on-the-fly as 'xs:untyped' hash)
- Add size prefix and padding for proper 8-byte alignment
- Update ELEMENT serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl to create MemorySegment-based ElementNode instances
- Fix GrowingMemorySegment.grow() buffer overflow bug (copy only valid bytes)
- Update ElementNodeTest and NodePageTest for new implementation
All XML tests passing (271 tests, 47 skipped)

…rialization
- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 36 bytes (16B NodeDelegate + 20B NameNode)
- Value bytes stored separately (variable length) with compressed flag
- Remove typeKey from serialized data (computed on-the-fly as 'xs:untyped' hash)
- Add size prefix and padding for proper 8-byte alignment
- Update ATTRIBUTE serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl to create MemorySegment-based AttributeNode instances
- Update AttributeNodeTest for new implementation
All XML tests passing (271 tests, 47 skipped)

…rialization
- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 36 bytes (16B NodeDelegate + 20B NameNode)
- Add size prefix and padding for proper 8-byte alignment
- Update NAMESPACE serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl to create MemorySegment-based NamespaceNode instances
- Update NamespaceNodeTest for new implementation
All XML tests passing

…alization
- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 32 bytes (16B NodeDelegate + 16B sibling keys)
- Value data stored separately (not in MemorySegment)
- Update COMMENT serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl.createCommentNode for MemorySegment creation
- Update CommentNodeTest for new implementation
All XML tests passing (271 tests, 47 skipped)

…tion
- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 68 bytes (16B NodeDelegate + 32B StructNode + 20B NameNode)
- Optional fields: childCount (8B), hash (8B), descendantCount (8B)
- Value data stored separately (not in MemorySegment)
- Update PROCESSING_INSTRUCTION serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl.createPINode for MemorySegment creation
- Update PINodeTest for new implementation
All XML tests passing (271 tests, 47 skipped)

…zation
- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 32 bytes (16B NodeDelegate + 16B sibling keys)
- Text nodes cannot have children, so no child-related fields
- Value data stored separately (not in MemorySegment)
- Update TEXT serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl.createTextNode for MemorySegment creation
- Update TextNodeTest for new implementation
All XML tests passing (271 tests, 47 skipped)

- Store compressed bytes directly when value is compressed
- Decompress only when getRawValue() is called
- Clear compressed data after decompression to save memory
- Maintain backward compatibility with uncompressed values
Benefits:
- Reduces memory pressure for compressed text values
- Defers decompression cost until value is actually accessed
- Improves deserialization performance for nodes that are never read
All XML tests passing (271 tests, 47 skipped)
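The lazy-decompression idea can be sketched as follows. This is a stand-in, not the TextNode code: it uses `java.util.zip` rather than Sirix's own compression helper, and `LazyCompressedValue`, `compress`, and `isDecompressed` are hypothetical names. Only `getRawValue()` is borrowed from the commit message.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class LazyCompressedValue {
    private byte[] compressed; // held until the first read
    private byte[] value;      // filled in on the first read

    LazyCompressedValue(byte[] compressed) {
        this.compressed = compressed;
    }

    // Helper for producing test input; not part of the lazy path.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses on first access, then drops the compressed copy.
    byte[] getRawValue() {
        if (value == null) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(chunk);
                    if (n == 0 && !inflater.finished()
                            && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalStateException("truncated compressed value");
                    }
                    out.write(chunk, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt compressed value", e);
            } finally {
                inflater.end();
            }
            value = out.toByteArray();
            compressed = null; // free the compressed bytes
        }
        return value;
    }

    boolean isDecompressed() {
        return value != null;
    }
}
```

A node that is deserialized but never read pays no decompression cost in this scheme, which matches the third benefit listed above.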

… package
- Renamed JsonNodeTestHelper to NodeTestHelper for broader applicability
- Moved from io.sirix.node.json to io.sirix.node package
- Now can be used by both JSON and XML node tests
- Updated all references in JSON test files (11 files)
- Updated all references in XML test files (ElementNodeTest, NamespaceNodeTest)
- Added missing import statements to all affected test files
Benefits:
- Better code organization - test helper is now in the parent package
- Can be reused across JSON and XML node tests
- Reduces code duplication
All tests passing

- PINode: Removed unused QNm import (method returns null anyway, but QNm is needed for return type)
- CommentNode: Compression import is actually used for decompression, kept it
- TextNode: All imports are used
- ElementNode: All imports are used
Note: During refactoring to MemorySegment-backed storage, some imports became unnecessary but most are still required for the new implementation.
All tests passing

…torage
- Remove childCount and descendantCount serialization/deserialization from all value nodes (StringNode, NumberNode, BooleanNode, NullNode, ObjectStringNode, ObjectNumberNode, ObjectBooleanNode, ObjectNullNode) as these are always leaf nodes
- Fix ObjectKeyNode to properly store and retrieve descendantCount from MemorySegment instead of returning hardcoded value 1, enabling support for complex subtrees as values
- Update memory layouts to place fixed-length fields before variable-length content (StringNode, ObjectStringNode, NumberNode, ObjectNumberNode)
- Fix VarHandle offset calculation for descendantCount in ObjectKeyNode
- Make increment/decrement methods for childCount and descendantCount no-ops in value nodes
- Update JsonNodeFactoryImpl and NodeKind serialization to match new layouts
- Update test data creation to match new serialization format
- All 852 tests passing
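Placing fixed-length fields before variable-length content means every fixed field sits at a constant offset, so it can be read without first decoding the value. A minimal sketch follows, using a byte-array view `VarHandle` in place of a MemorySegment. The field names, offsets, and byte order are assumptions made for illustration and do not reflect the actual StringNode layout.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class FixedThenVariableLayout {
    // Plain get/set through a byte-array view handle works at any byte offset.
    private static final VarHandle LONG =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    static final int PARENT_KEY_OFFSET = 0;  // hypothetical fixed field
    static final int SIBLING_KEY_OFFSET = 8; // hypothetical fixed field
    static final int VALUE_OFFSET = 16;      // variable-length bytes start here

    static byte[] write(long parentKey, long siblingKey, String value) {
        byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
        byte[] node = new byte[VALUE_OFFSET + valueBytes.length];
        LONG.set(node, PARENT_KEY_OFFSET, parentKey);
        LONG.set(node, SIBLING_KEY_OFFSET, siblingKey);
        System.arraycopy(valueBytes, 0, node, VALUE_OFFSET, valueBytes.length);
        return node;
    }

    // Constant-offset read: the value bytes are never touched.
    static long siblingKey(byte[] node) {
        return (long) LONG.get(node, SIBLING_KEY_OFFSET);
    }

    // The value is decoded only when it is actually requested.
    static String value(byte[] node) {
        return new String(node, VALUE_OFFSET, node.length - VALUE_OFFSET, StandardCharsets.UTF_8);
    }
}
```

Had the value come first, the offset of every later field would depend on the value's length, and each read would have to compute it.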

- memory-leak-diagnostic.log
- *.jfr (Java Flight Recorder files)

- Gradle: 8.10 → 9.1.0
- Java target: 22 → 25
- Kotlin: 2.2.10 → 2.2.20
- Mockito: 5.13.0 → 5.19.0
- ByteBuddy: 1.15.1 → 1.17.5
- Shadow plugin: 7.0.0 → 8.3.3 (com.gradleup.shadow)
- Fix Gradle 9 compatibility: mainClassName → mainClass
- Apply Shadow plugin only to sirix-rest-api
- Update jvmToolchain to 25 in sirix-kotlin-api

- REMOVED: KeyValueLeafPagePool class and all references
- REMOVED: KeyValueLeafPagePoolTest
- CHANGED: KeyValueLeafPage now allocates segments directly from allocator
- CHANGED: Added close() method to KeyValueLeafPage for segment cleanup
- CHANGED: Cache removal listeners now call page.close() directly
- CHANGED: RecordPageCache and PageCache use removalListener instead of evictionListener
- CHANGED: Direct allocation in NodePageTrx, PageKind, PageUtils
- FIXED: Allocator initialization in Databases.initAllocator() and freeAllocatedMemory()
- UPDATED: Test files to use allocator directly instead of pool
Benefits:
- ~600 lines of code removed (pool + test + references)
- Simpler architecture: Allocate → Use → Cache eviction → close() → GC
- Pages own their segments: No complex pooling layer
- Memory management via Caffeine cache + allocator, similar to UmbraDB approach
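The "Allocate → Use → Cache eviction → close()" flow can be sketched without external dependencies. The sketch below is a stand-in under assumptions: a `LinkedHashMap`-based LRU replaces the Caffeine cache, a plain byte array replaces the allocator-backed segment, and `PageCacheSketch`, `Page`, and `Cache` are hypothetical names. It covers size-based eviction only, not explicit removals or replacements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PageCacheSketch {
    // A page that owns its buffer and releases it in close().
    static final class Page implements AutoCloseable {
        private byte[] buffer;

        Page(int size) {
            buffer = new byte[size];
        }

        boolean isClosed() {
            return buffer == null;
        }

        @Override
        public void close() {
            buffer = null; // stand-in for returning the segment to the allocator
        }
    }

    // LRU cache that closes a page whenever it is evicted.
    static final class Cache extends LinkedHashMap<Long, Page> {
        private final int capacity;

        Cache(int capacity) {
            super(16, 0.75f, true); // access order gives LRU behaviour
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, Page> eldest) {
            if (size() > capacity) {
                eldest.getValue().close(); // the page cleans up its own memory
                return true;
            }
            return false;
        }
    }
}
```

Since each page releases its own buffer, nothing outside the cache has to track which buffers are free, and that tracking is the layer the pool used to provide.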

- Align all cache and allocator files with remote branch
- Include diagnostic logging infrastructure
- Add FFILz4Compressor implementation
- Remove BufferManagerImpl.java.bak backup file
- All core Java files now match remote branch exactly

- Set Java toolchain to version 25 across all subprojects
- Configure Kotlin modules to use Java 25 toolchain (falls back to Java 24 target)
- Add jvmTargetValidationMode.set(WARNING) in subprojects to allow Java 25/Kotlin 24 mismatch
- Kotlin 2.2.20 doesn't yet fully support Java 25, so it compiles to Java 24 bytecode
- Java modules compile with --enable-preview for Java 25 preview features
- sirix-kotlin-api, sirix-rest-api, sirix-kotlin-cli now use jvmToolchain(25)

The Chicago dataset test is disabled for now to speed up test runs during development.

String Templates are no longer available in Java 25, so the STR."..." syntax was replaced with standard string concatenation using the + operator.
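The conversion looks roughly like this. The message text and the names `GreetingFormat` and `describe` are made up for illustration and are not taken from LoadIntegrationTest.

```java
final class GreetingFormat {
    // Former preview syntax (no longer compiles):
    //   STR."Loaded \{count} nodes from \{name}"
    static String describe(String name, int count) {
        return "Loaded " + count + " nodes from " + name;
    }
}
```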

The JUnit Platform Launcher is required for running JUnit 5 tests. Without it, test execution fails with 'Failed to load JUnit Platform' error.

Added JUnit Jupiter API, Engine, and Platform Launcher dependencies. These are required for running JUnit 5 tests with useJUnitPlatform().

JohannesLichtenberger and others added 30 commits August 29, 2024 14:21

Attempt to change to single MemorySegment for slots

cc85387

Remove some output

53976ab

Update slotted page stuff...

e0773dc

Disable tests (shouldn't have been committed)

9db8291

Update adding reference counting to the cached pages

b8ee7fb

Fix closing/clearing of pages

18a2af0

Fix closing/clearing of pages

1aaafd1

Minor updates regarding less memory usage, also fixing a resource leak

e34784d

Minor simplifications

0ae7926

Minor simplifications

bfa8ee3

Fix memory leak

d98608e

Remove leftover stuff from reusing a byte-array for decompression

cac1b97

Add custom allocator and page pool

6feeeed

Several fixes for custom allocator and page pool

788b805

Fix ArrayNodeTest and ObjectNodeTest to use proper NodeKind byte + pa…

2d5f576

Effectively remove the bin folder from version control

3fca3b8

Effectively remove the bin folder from version control

6c28331

Adapt .gitignore to ignore the sirix-core bin-directory

227816c

Johannes Lichtenberger added 24 commits October 6, 2025 02:27

Effectively remove the bin folder from version control

317f0eb

Effectively remove the bin folder from version control

8c98d95

Adapt .gitignore to ignore the sirix-core bin-directory

5a8eac8

Merge remote-tracking branch 'origin/refactor-json-nodes-lazy-deseria…

8c711a5

Add diagnostic and profiling files to .gitignore

de1cf6d

Fix test file: replace pagePool references with allocator

9feb824

Sync with origin/remove-keyvalueleafpage-pool

9db0ae2

Johannes Lichtenberger added 5 commits October 18, 2025 22:44

Disable testShredderAndTraverseChicago test

eb033be

Remove String Template syntax from LoadIntegrationTest

6bebcec

Add missing JUnit Platform Launcher dependency to sirix-kotlin-cli

100d2c8

Add missing JUnit 5 dependencies to sirix-rest-api

f378eaa
