I'm interested in this. I've been playing around with the idea of implementing RDF terms with object interning to save memory and avoid copying. This issue is a continuation from #2866.
In every embarrassingly parallel, distributed ETL where I've used RDFLib, I've seen memory usage grow over time. Implementing object interning may fix this and potentially stop the memory growth once objects are no longer referenced. I think this is also related to the issue described in #740.
The key is to implement RDF terms as immutable data structures. This way, we can safely reuse references to the same object whenever the Unicode code point sequence in the term's value is the same.
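As a rough illustration of why immutability matters here, the sketch below interns a hypothetical `Term` class (not an RDFLib class) by its string value. It uses a plain `dict` for brevity, so nothing is ever freed; that part is only an assumption for the example.

```python
from __future__ import annotations


class Term:
    """Hypothetical immutable term, interned by its string value."""

    __slots__ = ("value",)
    # Strong-reference cache, used here only to keep the sketch short.
    _cache: dict[str, Term] = {}

    def __new__(cls, value: str) -> Term:
        cached = cls._cache.get(value)
        if cached is not None:
            return cached
        instance = super().__new__(cls)
        # Bypass our own __setattr__ guard for the one-time initialisation.
        object.__setattr__(instance, "value", value)
        cls._cache[value] = instance
        return instance

    def __setattr__(self, name: str, value: object) -> None:
        # Shared instances must never change after construction.
        raise AttributeError("Term is immutable")


a = Term("http://example.org/a")
b = Term("http://example.org/" + "a")
assert a is b  # equal values share one object
```

Because `a` and `b` are the same object, allowing mutation through one name would silently change the other, which is why the shared instance rejects attribute assignment.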
Below is an example Blank Node implementation that uses object interning and is thread-safe when accessing the weakrefs. Memory should be freed once the objects are no longer in use, even though the cache still holds a weakref pointing to them.
import threading
from dataclasses import dataclass, field
from typing import Any, Self, final
from uuid import uuid4
from weakref import WeakValueDictionary


class InternedBlankNode:
    _intern_cache: WeakValueDictionary[str, "Self"] = WeakValueDictionary()
    _lock = threading.Lock()
    __slots__ = ("__weakref__",)

    def __new__(cls, value: str | None = None) -> Self:
        if value is None:
            value = str(uuid4()).replace("-", "0")

        with cls._lock:
            if value in cls._intern_cache:
                return cls._intern_cache[value]

            instance = super().__new__(cls)
            object.__setattr__(instance, "value", value)
            cls._intern_cache[value] = instance
            return instance


@final
@dataclass(frozen=True, slots=True)
class BlankNode(InternedBlankNode):
    """
    An RDF blank node representing an anonymous resource.

    Specification: https://www.w3.org/TR/rdf12-concepts/#section-blank-nodes

    This implementation uses object interning to ensure that blank nodes
    with the same identifier reference the same object instance,
    optimizing memory usage.

    The class is marked final to ensure the :py:meth:`IRI.__new__`
    implementation cannot be overridden.

    :param value: A blank node identifier. If :py:obj:`None` is provided,
        an identifier will be generated.
    """

    value: str = field(default_factory=lambda: str(uuid4()).replace("-", "0"))

    def __str__(self) -> str:
        return f"_:{self.value}"

    def __reduce__(self) -> str | tuple[Any, ...]:
        return self.__class__, (self.value,)


__all__ = ["BlankNode"]
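To check the weakref behaviour in isolation, here is a minimal sketch with a hypothetical `Node` class (a stand-in for illustration, not the `BlankNode` class and not RDFLib's API). It assumes CPython, where dropping the last strong reference frees the object right away; the `gc.collect()` call is only there as a precaution. It also uses `.get()` instead of an `in` check followed by indexing, on the assumption that an entry could be collected between those two steps.

```python
import gc
import threading
from weakref import WeakValueDictionary


class Node:
    """Hypothetical interned node backed by a weak-value cache."""

    __slots__ = ("value", "__weakref__")
    _cache: "WeakValueDictionary[str, Node]" = WeakValueDictionary()
    _lock = threading.Lock()

    def __new__(cls, value: str) -> "Node":
        with cls._lock:
            # .get() returns a strong reference or None in a single step.
            existing = cls._cache.get(value)
            if existing is not None:
                return existing
            instance = super().__new__(cls)
            instance.value = value
            cls._cache[value] = instance
            return instance


a = Node("b1")
b = Node("b1")
assert a is b                   # same identifier, same object
assert "b1" in Node._cache      # cache holds a weak reference

del a, b                        # drop the only strong references
gc.collect()
assert "b1" not in Node._cache  # entry disappears with the object
```

The cache never keeps an object alive on its own, so its size tracks the number of live terms rather than the number of terms ever created.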
And tests: