
Parsing from sys.stdin fails #285

Closed
@niklasl

Description


With the new URI validation of URIRefs (in rdflib/term.py), RDFLib fails to parse RDF from sys.stdin. Example:

```shell
echo '<> a <http://schema.org/Thing> .' | python -c '
import sys
from rdflib import *
Graph().parse(sys.stdin, format="turtle").serialize(sys.stdout)'
```

The result is something like:

```
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  ...
  File ".../rdflib/term.py", line 208, in __new__
    raise Exception('%s does not look like a valid URI, perhaps you want to urlencode it?'%value)
Exception: file:///private/tmp/<stdin> does not look like a valid URI, perhaps you want to urlencode it?
```
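To see where that URI comes from: CPython gives `sys.stdin` the pseudo file name `<stdin>`, and resolving that name against the working directory yields a string containing `<` and `>`, which may not appear unescaped in a URI. The sketch below reproduces that resolution with `urljoin` and a hypothetical `looks_like_valid_uri` check. The forbidden-character set is an assumption modelled on RFC 3986, not necessarily the exact set `rdflib/term.py` uses, and the base directory is taken from the traceback above.

```python
from urllib.parse import urljoin

# Characters that RFC 3986 never allows unescaped in a URI. This set is an
# illustrative assumption, not necessarily the one rdflib checks.
INVALID_URI_CHARS = set('<>" {}|\\^`')


def looks_like_valid_uri(value: str) -> bool:
    """Hypothetical stand-in for the URIRef validation: reject forbidden chars."""
    return not any(ch in INVALID_URI_CHARS for ch in value)


# Working directory from the traceback, as a file: URI (note the trailing slash).
base = "file:///private/tmp/"

# CPython names the standard input stream "<stdin>"; resolving that name
# relative to the base reproduces the URI from the error message.
resolved = urljoin(base, "<stdin>")
print(resolved)                                    # file:///private/tmp/<stdin>
print(looks_like_valid_uri(resolved))              # False
print(looks_like_valid_uri("file:///dev/stdin"))   # True
```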

This is a problem for the rdfpipe command (now included in RDFLib again), which is very useful for on-the-fly RDF conversion.

To work around the issue, you can use the publicID kwarg to Graph.parse whenever you use sys.stdin. But it seems better to handle this automatically.

This can be fixed in rdflib/parser.py, in create_input_source: check whether the given file is sys.stdin and, if so, use a symbolic URI for it. I propose:

```python
if f is sys.stdin:
    input_source.setSystemId("file:///dev/stdin")
elif hasattr(f, "name"):
    input_source.setSystemId(f.name)
```

I can add this unless anyone objects.
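The proposed branch can be exercised outside the parser by pulling it into a small function. `system_id_for` and its `None` fallback are hypothetical; in `create_input_source` the result would instead be passed to `input_source.setSystemId`.

```python
import io
import sys
from types import SimpleNamespace

# Symbolic system ID proposed in the issue for data read from standard input.
STDIN_SYSTEM_ID = "file:///dev/stdin"


def system_id_for(f):
    """Hypothetical helper mirroring the proposed check in create_input_source.

    Returns the system ID to use for file-like object ``f``, or None when the
    object carries no usable name.
    """
    if f is sys.stdin:
        # sys.stdin.name is "<stdin>", which does not resolve to a valid URI.
        return STDIN_SYSTEM_ID
    if hasattr(f, "name"):
        # Regular files keep the previous, lax behaviour: use the name as-is.
        return f.name
    return None


print(system_id_for(sys.stdin))                          # file:///dev/stdin
print(system_id_for(SimpleNamespace(name="data.ttl")))   # data.ttl
print(system_id_for(io.StringIO("<> a <x> .")))          # None
```

One design note: an identity test only matches the `sys.stdin` object itself, so a wrapper such as `sys.stdin.buffer` would not match it and would fall through to the `name` branch.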

(Although a more proper URL-escaping of the file name/path seems like the right thing to do in general (using pathname2url), I have no idea how many usages of RDFLib already rely on the former, lax handling of this. And sys.stdin is still special.)
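For comparison, the more general escaping mentioned above could look like the following sketch, which builds a `file:` URI with `urllib.request.pathname2url`. `file_uri` is a hypothetical helper, not part of RDFLib. It only escapes the name: standard input would still become a made-up path under the working directory, which is why `sys.stdin` still needs special handling.

```python
import os
from urllib.parse import urljoin
from urllib.request import pathname2url


def file_uri(path):
    """Hypothetical helper: convert a local path into a percent-escaped file: URI."""
    # pathname2url percent-escapes characters such as "<" and ">";
    # joining with "file:" adds the scheme and the empty authority.
    return urljoin("file:", pathname2url(os.path.abspath(path)))


print(file_uri("/private/tmp/<stdin>"))  # file:///private/tmp/%3Cstdin%3E on POSIX
```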

Labels: bug (Something isn't working), parsing (Related to parsing)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions