Skip to content

Conversation

@JohannesLichtenberger
Copy link
Member

No description provided.

JohannesLichtenberger and others added 30 commits August 29, 2024 14:21
- Updated all JSON node types (OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, etc.) to use uniform MemorySegment-based deserialization pattern
- Implemented lazy loading for all value types (strings, numbers, booleans, nulls)
- Nodes now deserialize using layout-based slicing for better performance
- Removed ~100 lines of unused helper methods from NodeKind
- Fixed AbstractStringNode hash computation to use toByteArray() instead of getDestination()
- All JSON nodes now follow the same pattern as OBJECT and ARRAY for consistency
- Build verified successful with no compilation errors
…ialization

- Add size prefix (4 bytes) after NodeKind byte to avoid reading variable-sized data
- Use 8-byte aligned headers (NodeKind + size + 3-byte padding) for proper alignment
- Add end padding to ensure each node's total size is multiple of 8
- Switch all JSON nodes to UNALIGNED VarHandles for compatibility with factory-created nodes
- Fix ObjectKeyNode to include 4-byte internal padding before hash field
- Fix JsonNodeFactoryImpl to write internal padding when creating ObjectKeyNode
- Fix setBooleanValue to handle both BooleanNode and ObjectBooleanNode types
- Remove complex size calculation methods (calculateStopBitDataSize, calculateNumberDataSize)

Benefits:
- No double-reading of variable-sized content (strings, numbers)
- Faster deserialization with direct MemorySegment slicing
- Simpler, more maintainable code
- Tests: PathSummaryTest and JsonNodeTrxGetPreviousRevisionNumberTest passing
…ules

The net.openhft.hashing library needs access to sun.nio.ch.DirectBuffer
when hashing DirectByteBuffer instances created from MemorySegments.
Without these --add-opens flags, tests fail with IllegalAccessError.

This fix allows:
- Access to sun.nio.ch for DirectBuffer operations
- Access to java.nio for ByteBuffer operations

Tests now pass successfully.
…dding format

- Add NodeKind byte before size prefix
- Use 3 bytes padding (total 8 bytes with NodeKind)
- Skip NodeKind byte before deserialize
- Tests now pass with proper 8-byte alignment
…adding format

- Fixed StringNodeTest, NumberNodeTest, BooleanNodeTest, NullNodeTest
- Fixed ObjectNumberNodeTest, ObjectStringNodeTest, ObjectBooleanNodeTest, ObjectNullNodeTest, ObjectKeyNodeTest
- Corrected serialization order for value nodes (siblings before/after value depending on node type)
- All JSON node tests now pass with proper 8-byte alignment
- Created JsonNodeTestHelper with writeHeader(), writeEndPadding(), updateSizePrefix(), and finalizeSerialization() methods
- Updated all 11 JSON node tests to use the helper methods
- Reduced ~20 lines of duplicated code per test to 1-2 lines
- Tests remain fully passing
…izer class

- Created JsonNodeSerializer in main source with writeSizePrefix(), readSizePrefix(), writeEndPadding(), updateSizePrefix(), and calculateEndPadding()
- Removed duplicate private methods from NodeKind.java
- Updated NodeKind.java to use JsonNodeSerializer methods
- Updated JsonNodeTestHelper to delegate to JsonNodeSerializer
- Eliminated code duplication between production and test code
- All tests still pass
- Added NodeKind byte before serialization in all 4 round-trip tests
- Added bytesIn.readByte() to skip NodeKind byte before deserialization
- Ensures proper 8-byte alignment for MemorySegment access
- All 17 tests now pass
- Added serializeNumber() and deserializeNumber() static methods to NodeKind
- Added helper methods serializeBigInteger() and deserializeBigInteger()
- Updated NUMBER_VALUE and OBJECT_NUMBER_VALUE serialization to use shared methods
- Removed duplicate serialization/deserialization code from NumberNode
- Removed duplicate serialization/deserialization code from ObjectNumberNode
- Both node types now use centralized logic from NodeKind for consistency
…obal()

- Updated both constructors to use Arena.ofAuto() for automatic memory management
- Arena.ofAuto() automatically releases memory when no longer reachable
- Improves memory management by allowing automatic cleanup instead of global lifetime
…rializeNumber()

- Changed NumberNode.serializeNumber() to NodeKind.serializeNumber()
- Changed ObjectNumberNode.serializeNumber() to NodeKind.serializeNumber()
- Fixes compilation errors after refactoring number serialization to NodeKind
…y offset

- Changed serializeDelegateWithoutIDs to use putVarLong instead of writeLong
- Changed deserializeNodeDelegateWithoutIDs to use getVarLong instead of readLong
- This fixes JsonRedBlackTreeIntegrationTest failures
- RB nodes (CASRB, PATHRB, NAMERB, RB_NODE_VALUE) need variable-length encoding
  for efficient storage since parent key offsets are typically small values
- Revert GrowingMemorySegment to use Arena.ofAuto() by default
  * Nodes store MemorySegment references that outlive BytesOut instances
  * Arena.ofAuto() allows GC to manage cleanup when segments become unreachable
  * Prevents premature deallocation bugs

- Add Arena parameter constructors for explicit arena control
  * GrowingMemorySegment(Arena, int) for custom arena
  * MemorySegmentBytesOut(Arena, int) for custom arena
  * Enables using confined arenas for temporary buffers with clear lifecycles

- Optimize KeyValueLeafPage.processEntries() with Arena.ofConfined()
  * Use confined arena for temporary serialization buffers
  * Normal records: data copied to slotMemory, temp buffer freed immediately
  * Overflow records: explicitly copied to Arena.global() for persistence
  * Provides immediate memory cleanup for ~99% of serialization operations

This hybrid approach balances manual control (where beneficial) with
automatic management (where lifecycles are complex). All tests pass.
Johannes Lichtenberger added 24 commits October 6, 2025 02:27
- Added serializeNumber() and deserializeNumber() static methods to NodeKind
- Added helper methods serializeBigInteger() and deserializeBigInteger()
- Updated NUMBER_VALUE and OBJECT_NUMBER_VALUE serialization to use shared methods
- Removed duplicate serialization/deserialization code from NumberNode
- Removed duplicate serialization/deserialization code from ObjectNumberNode
- Both node types now use centralized logic from NodeKind for consistency
…obal()

- Updated both constructors to use Arena.ofAuto() for automatic memory management
- Arena.ofAuto() automatically releases memory when no longer reachable
- Improves memory management by allowing automatic cleanup instead of global lifetime
…rializeNumber()

- Changed NumberNode.serializeNumber() to NodeKind.serializeNumber()
- Changed ObjectNumberNode.serializeNumber() to NodeKind.serializeNumber()
- Fixes compilation errors after refactoring number serialization to NodeKind
…y offset

- Changed serializeDelegateWithoutIDs to use putVarLong instead of writeLong
- Changed deserializeNodeDelegateWithoutIDs to use getVarLong instead of readLong
- This fixes JsonRedBlackTreeIntegrationTest failures
- RB nodes (CASRB, PATHRB, NAMERB, RB_NODE_VALUE) need variable-length encoding
  for efficient storage since parent key offsets are typically small values
- Revert GrowingMemorySegment to use Arena.ofAuto() by default
  * Nodes store MemorySegment references that outlive BytesOut instances
  * Arena.ofAuto() allows GC to manage cleanup when segments become unreachable
  * Prevents premature deallocation bugs

- Add Arena parameter constructors for explicit arena control
  * GrowingMemorySegment(Arena, int) for custom arena
  * MemorySegmentBytesOut(Arena, int) for custom arena
  * Enables using confined arenas for temporary buffers with clear lifecycles

- Optimize KeyValueLeafPage.processEntries() with Arena.ofConfined()
  * Use confined arena for temporary serialization buffers
  * Normal records: data copied to slotMemory, temp buffer freed immediately
  * Overflow records: explicitly copied to Arena.global() for persistence
  * Provides immediate memory cleanup for ~99% of serialization operations

This hybrid approach balances manual control (where beneficial) with
automatic management (where lifecycles are complex). All tests pass.
…lization' into refactor-json-nodes-lazy-deserialization

# Conflicts:
#	JsonShredderTest_testChicagoDescendantAxisParallel_2024_08_12_221741.jfr.zip
#	analysis-5-trxs.jfr
…alization

- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 68 bytes (16B NodeDelegate + 32B StructNode + 20B NameNode)
- Remove typeKey from serialized data (computed on-the-fly as 'xs:untyped' hash)
- Add size prefix and padding for proper 8-byte alignment
- Update ELEMENT serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl to create MemorySegment-based ElementNode instances
- Fix GrowingMemorySegment.grow() buffer overflow bug (copy only valid bytes)
- Update ElementNodeTest and NodePageTest for new implementation

All XML tests passing (271 tests, 47 skipped)
…rialization

- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 36 bytes (16B NodeDelegate + 20B NameNode)
- Value bytes stored separately (variable length) with compressed flag
- Remove typeKey from serialized data (computed on-the-fly as 'xs:untyped' hash)
- Add size prefix and padding for proper 8-byte alignment
- Update ATTRIBUTE serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl to create MemorySegment-based AttributeNode instances
- Update AttributeNodeTest for new implementation

All XML tests passing (271 tests, 47 skipped)
…rialization

- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 36 bytes (16B NodeDelegate + 20B NameNode)
- Add size prefix and padding for proper 8-byte alignment
- Update NAMESPACE serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl to create MemorySegment-based NamespaceNode instances
- Update NamespaceNodeTest for new implementation

All XML tests passing
…alization

- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 32 bytes (16B NodeDelegate + 16B sibling keys)
- Value data stored separately (not in MemorySegment)
- Update COMMENT serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl.createCommentNode for MemorySegment creation
- Update CommentNodeTest for new implementation

All XML tests passing (271 tests, 47 skipped)
…tion

- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 68 bytes (16B NodeDelegate + 32B StructNode + 20B NameNode)
- Optional fields: childCount (8B), hash (8B), descendantCount (8B)
- Value data stored separately (not in MemorySegment)
- Update PROCESSING_INSTRUCTION serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl.createPINode for MemorySegment creation
- Update PINodeTest for new implementation

All XML tests passing (271 tests, 47 skipped)
…zation

- Replace delegate pattern with MemorySegment-backed storage using VarHandles
- Core layout: 32 bytes (16B NodeDelegate + 16B sibling keys)
- Text nodes cannot have children, so no child-related fields
- Value data stored separately (not in MemorySegment)
- Update TEXT serialization/deserialization in NodeKind
- Update XmlNodeFactoryImpl.createTextNode for MemorySegment creation
- Update TextNodeTest for new implementation

All XML tests passing (271 tests, 47 skipped)
- Store compressed bytes directly when value is compressed
- Decompress only when getRawValue() is called
- Clear compressed data after decompression to save memory
- Maintain backward compatibility with uncompressed values

Benefits:
- Reduces memory pressure for compressed text values
- Defers decompression cost until value is actually accessed
- Improves deserialization performance for nodes that are never read

All XML tests passing (271 tests, 47 skipped)
… package

- Renamed JsonNodeTestHelper to NodeTestHelper for broader applicability
- Moved from io.sirix.node.json to io.sirix.node package
- Now can be used by both JSON and XML node tests
- Updated all references in JSON test files (11 files)
- Updated all references in XML test files (ElementNodeTest, NamespaceNodeTest)
- Added missing import statements to all affected test files

Benefits:
- Better code organization - test helper is now in the parent package
- Can be reused across JSON and XML node tests
- Reduces code duplication

All tests passing
- PINode: Removed unused QNm import (method returns null anyway, but QNm is needed for return type)
- CommentNode: Compression import is actually used for decompression, kept it
- TextNode: All imports are used
- ElementNode: All imports are used

Note: During refactoring to MemorySegment-backed storage, some imports became
unnecessary but most are still required for the new implementation.

All tests passing
…torage

- Remove childCount and descendantCount serialization/deserialization from all value nodes (StringNode, NumberNode, BooleanNode, NullNode, ObjectStringNode, ObjectNumberNode, ObjectBooleanNode, ObjectNullNode) as these are always leaf nodes
- Fix ObjectKeyNode to properly store and retrieve descendantCount from MemorySegment instead of returning hardcoded value 1, enabling support for complex subtrees as values
- Update memory layouts to place fixed-length fields before variable-length content (StringNode, ObjectStringNode, NumberNode, ObjectNumberNode)
- Fix VarHandle offset calculation for descendantCount in ObjectKeyNode
- Make increment/decrement methods for childCount and descendantCount no-ops in value nodes
- Update JsonNodeFactoryImpl and NodeKind serialization to match new layouts
- Update test data creation to match new serialization format
- All 852 tests passing
- memory-leak-diagnostic.log
- *.jfr (Java Flight Recorder files)
- Gradle: 8.10 → 9.1.0
- Java target: 22 → 25
- Kotlin: 2.2.10 → 2.2.20
- Mockito: 5.13.0 → 5.19.0
- ByteBuddy: 1.15.1 → 1.17.5
- Shadow plugin: 7.0.0 → 8.3.3 (com.gradleup.shadow)
- Fix Gradle 9 compatibility: mainClassName → mainClass
- Apply Shadow plugin only to sirix-rest-api
- Update jvmToolchain to 25 in sirix-kotlin-api
- REMOVED: KeyValueLeafPagePool class and all references
- REMOVED: KeyValueLeafPagePoolTest
- CHANGED: KeyValueLeafPage now allocates segments directly from allocator
- CHANGED: Added close() method to KeyValueLeafPage for segment cleanup
- CHANGED: Cache removal listeners now call page.close() directly
- CHANGED: RecordPageCache and PageCache use removalListener instead of evictionListener
- CHANGED: Direct allocation in NodePageTrx, PageKind, PageUtils
- FIXED: Allocator initialization in Databases.initAllocator() and freeAllocatedMemory()
- UPDATED: Test files to use allocator directly instead of pool

Benefits:
- ~600 lines of code removed (pool + test + references)
- Simpler architecture: Allocate → Use → Cache eviction → close() → GC
- Pages own their segments: No complex pooling layer
- Memory management via Caffeine cache + allocator, similar to UmbraDB approach
- Align all cache and allocator files with remote branch
- Include diagnostic logging infrastructure
- Add FFILz4Compressor implementation
- Remove BufferManagerImpl.java.bak backup file
- All core Java files now match remote branch exactly
cursor[bot]

This comment was marked as outdated.

Johannes Lichtenberger added 5 commits October 18, 2025 22:44
- Set Java toolchain to version 25 across all subprojects
- Configure Kotlin modules to use Java 25 toolchain (falls back to Java 24 target)
- Add jvmTargetValidationMode.set(WARNING) in subprojects to allow Java 25/Kotlin 24 mismatch
- Kotlin 2.2.20 doesn't yet fully support Java 25, so it compiles to Java 24 bytecode
- Java modules compile with --enable-preview for Java 25 preview features
- sirix-kotlin-api, sirix-rest-api, sirix-kotlin-cli now use jvmToolchain(25)
The Chicago dataset test is disabled for now to speed up test runs during development.
String Templates were removed from Java 25, converted to regular string concatenation.
Changed STR."..." syntax to standard string concatenation with + operator.
The JUnit Platform Launcher is required for running JUnit 5 tests.
Without it, test execution fails with 'Failed to load JUnit Platform' error.
Added JUnit Jupiter API, Engine, and Platform Launcher dependencies.
These are required for running JUnit 5 tests with useJUnitPlatform().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants