michaeladler
Contributor

Summary

Replace the O(n) per-triple equality check with an O(1) map-based set for each graph, improving parsing performance for large datasets (>1000 entries).
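To illustrate the difference, here is a minimal sketch that contrasts a linear duplicate scan with a map-based set. The `Quad` struct and the `dedupLinear` / `dedupSet` helpers are hypothetical, simplified stand-ins written for this example and are not the library's actual types or functions.

```go
package main

import "fmt"

// Quad is a hypothetical, simplified stand-in for a quad type.
// Every field is comparable, so the struct can be used as a map key.
type Quad struct {
	Subject, Predicate, Object, Graph string
}

// dedupLinear keeps the first occurrence of each quad by scanning
// everything collected so far: O(n) per quad, O(n^2) overall.
func dedupLinear(in []Quad) []Quad {
	var out []Quad
	for _, q := range in {
		seen := false
		for _, e := range out {
			if e == q {
				seen = true
				break
			}
		}
		if !seen {
			out = append(out, q)
		}
	}
	return out
}

// dedupSet keeps the first occurrence of each quad by using a map as a
// set: an average O(1) lookup per quad, O(n) overall.
func dedupSet(in []Quad) []Quad {
	seen := make(map[Quad]struct{}, len(in))
	out := make([]Quad, 0, len(in))
	for _, q := range in {
		if _, ok := seen[q]; ok {
			continue
		}
		seen[q] = struct{}{}
		out = append(out, q)
	}
	return out
}

func main() {
	in := []Quad{
		{"s1", "p", "o", "g"},
		{"s2", "p", "o", "g"},
		{"s1", "p", "o", "g"}, // duplicate of the first quad
	}
	fmt.Println(len(dedupLinear(in)), len(dedupSet(in)))
}
```

Both helpers return the same quads in the same order; only the cost of the membership check differs.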

To make Quad hashable, all structs implementing the Node interface are now value-based (e.g., IRI implements the Node interface instead of *IRI).
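The following sketch shows why value-based nodes matter when a node is stored behind an interface and used as a map key. The `Node` interface and `IRI` struct here are reduced, hypothetical versions written for illustration, and `distinctValueKeys` / `distinctPointerKeys` are made-up helper names.

```go
package main

import "fmt"

// Node is a hypothetical, reduced node interface.
type Node interface {
	GetValue() string
}

// IRI is a hypothetical, simplified node type. The value receiver means
// both IRI and *IRI satisfy Node.
type IRI struct{ Value string }

func (i IRI) GetValue() string { return i.Value }

// distinctValueKeys stores IRI values behind the Node interface.
// Interface keys compare by dynamic type and value, so equal IRIs
// collapse into a single map entry.
func distinctValueKeys(iris ...string) int {
	set := make(map[Node]struct{})
	for _, s := range iris {
		set[IRI{Value: s}] = struct{}{}
	}
	return len(set)
}

// distinctPointerKeys stores *IRI behind the Node interface.
// Pointers compare by address, so two separately allocated nodes with
// the same content are treated as different keys.
func distinctPointerKeys(iris ...string) int {
	set := make(map[Node]struct{})
	for _, s := range iris {
		set[&IRI{Value: s}] = struct{}{}
	}
	return len(set)
}

func main() {
	a := "http://example.com/a"
	fmt.Println(distinctValueKeys(a, a))   // prints 1
	fmt.Println(distinctPointerKeys(a, a)) // prints 2
}
```

One caveat with this approach in Go generally: an interface-typed map key panics at run time if its dynamic type is not comparable (for example, a struct containing a slice), so every type behind the interface has to consist of comparable fields.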

Note: The Equal method from the Node interface is now unused and could be removed. However, I've kept it to maintain compatibility with existing downstream code.

Basic Example

No change in existing behavior.

Motivation

The motivation is (yet again) performance. Benchmark results from my machine show:

  • For 1000 objects: the new implementation is 1.8s faster
  • For 2000 objects: the new implementation is 6.3s faster

Checks

  • Passes make test

Signed-off-by: Michael Adler <michael.adler@siemens.com>
Member

@kazarena kazarena left a comment

Looks good, thank you.

@kazarena kazarena merged commit 3fbc0ad into piprate:master Jul 14, 2025
6 checks passed
@michaeladler michaeladler deleted the perf/improve-n-quads-2 branch July 15, 2025 06:34