Description
With the new URI validation of URIRefs (in rdflib/term.py), RDFLib fails to parse RDF from sys.stdin. Example:
echo '<> a <http://schema.org/Thing> .' | python -c '
import sys
from rdflib import *
Graph().parse(sys.stdin, format="turtle").serialize(sys.stdout)'
The result looks like this:
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  ...
  File ".../rdflib/term.py", line 208, in __new__
    raise Exception('%s does not look like a valid URI, perhaps you want to urlencode it?'%value)
Exception: file:///private/tmp/<stdin> does not look like a valid URI, perhaps you want to urlencode it?
One problematic case is the rdfpipe command (now included in RDFLib again), which is very useful for on-the-fly RDF conversion.
To work around the issue, you can pass the publicID kwarg to Graph.parse whenever you use sys.stdin. But it seems better to handle this automatically.
This can be fixed in create_input_source in rdflib/parser.py. Checking whether the given file is sys.stdin seems proper, then using some symbolic URI for it. I propose:
if f is sys.stdin:
    input_source.setSystemId("file:///dev/stdin")
elif hasattr(f, "name"):
    input_source.setSystemId(f.name)
I can add this unless anyone objects.
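For anyone who wants to try the branching logic outside RDFLib, here is a minimal standalone sketch. The helper name choose_system_id is hypothetical and not part of rdflib; the actual change would sit inside create_input_source and call input_source.setSystemId with the chosen value.

```python
import io
import sys


def choose_system_id(f):
    """Pick a system ID for the file-like object f, or return None.

    Hypothetical helper that mirrors the proposed branch; not rdflib API.
    """
    if f is sys.stdin:
        # sys.stdin.name is "<stdin>", which does not form a valid URI,
        # so substitute a symbolic URI instead.
        return "file:///dev/stdin"
    if hasattr(f, "name"):
        # Regular file objects carry the path they were opened with.
        return f.name
    # In-memory streams such as io.StringIO have no name at all.
    return None


class NamedStream(io.StringIO):
    """Stand-in for a real file object that exposes a name."""
    name = "data.ttl"
```

With this sketch, sys.stdin maps to the symbolic URI, a named stream keeps its name, and an anonymous in-memory stream gets no system ID.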
(A more proper URL-escaping of the file name/path, using pathname2url, seems like the right thing to do in general, but I have no idea how much existing RDFLib usage relies on the former, lax handling. And sys.stdin is still special.)
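To illustrate what the pathname2url approach would do to the path from the traceback, here is a rough sketch using only the standard library. The helper path_to_file_uri is a hypothetical name, not something RDFLib provides, and the handling of a leading "//" is an assumption made to cope with Python versions whose pathname2url already emits an authority part for absolute paths.

```python
from urllib.request import pathname2url


def path_to_file_uri(path):
    """Build a file: URI from a local path (hypothetical helper)."""
    url_path = pathname2url(path)  # percent-encodes characters such as < and >
    if url_path.startswith("//"):
        # Assumption: this Python version already added the "//" authority part.
        return "file:" + url_path
    return "file://" + url_path


uri = path_to_file_uri("/private/tmp/<stdin>")
print(uri)
```

The angle brackets come out percent-encoded, so the resulting string would no longer contain the characters that the URIRef check rejects. Whether that is the desired behaviour for existing callers is the open question raised above.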