-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
N3 Formatting Performance #2923
Comments
@mgberg So you are saying the return logic should be just use I think the logic should be that if there is no prefix, a QName is computed for the IRI and stored back in the Store so when next the serializer calls this method, it will have a prefix for the IRI and won't need to recompute. Perhaps then I understand your point here as the This is likely 10+ year old code (long before my time on RDFLib) but if you an create a PR to improve this that passes all existing tests, please do! |
This is the same issue I was discussing in our meeting a few weeks ago, I came to the same conclusion that there is a needless I suspect this function was written for a different purpose; it uses N3 notation for URIs, unlike It also should be noted, for the majority of its life since it started in 2001, RDFLib was used by and maintained by a group of universities and was used mainly for research purposes, so there was never much consideration given to performance. I think you'll find there are many such examples of this in the codebase. |
@ashleysommer a semi-related question: is there a place where performance-related concerns are being tracked? It would be interesting to look at what's there, submit some other issues I've run into if they're not already there, and/or contributing some fixes for known issues. |
It seems the |
I noticed that formatting various terms using the
n3
method seemed to be quite slow when calling it a large number of times.The
n3
method usesNamespaceManager.normalizeUri
, shown here:rdflib/rdflib/namespace/__init__.py
Lines 542 to 565 in 0b69f4f
Note that the
name
variable is created but never used in the current implementation.I'm confused as to why
compute_qname
is called at all. It seems that you should be able to simply do the following:I'm seeing about a 2x improvement in performance with this this change (assuming unique URIs), which is not surprising considering
compute_qname
essentially repeats what has already been done innormalizeUri
thus far.Is there a reason why
compute_qname
is called that I'm missing? Is there some value to having it there? If not, I'd be glad to submit a PR to make this change.The text was updated successfully, but these errors were encountered: