Bad UTF-8 handling for NTriple parser #400

@Arthur-VaisseLesteven

The current NTriples parser does not handle UTF-8 strings properly.

The following triple triggers an error:
<http://linkedgeodata.org/triplify/user22701> <http://www.w3.org/2000/01/rdf-schema#label> "CiaránMooney" .

when executing this code:

import codecs

line = codecs.open("file", "r", "utf-8").readline()
parser = NTriples.NTriplesParser(sink=mySink)
parser.parsestring(line)  # line is the triple shown above

The parser fails to parse the "á" character.
Modifying the parse function of the NTriplesParser class solves this bug.

Current:

    def parse(self, f):
        """Parse f as an N-Triples file."""
        if not hasattr(f, 'read'):
            raise ParseError("Item to parse must be a file-like object.")

        f = ascii(f)
        ...

Fix:

    def parse(self, f):
        """Parse f as an N-Triples file."""
        if not hasattr(f, 'read'):
            raise ParseError("Item to parse must be a file-like object.")

        f = codecs.getreader("utf-8")(f)
        ...
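
To illustrate why swapping the reader matters, here is a minimal standalone sketch that uses only the Python 3 standard library and no rdflib. It assumes the input arrives as a UTF-8 encoded byte stream; `read_first_line` is a hypothetical helper written for this example, not part of the parser.

```python
import codecs
import io

# The triple from this report, encoded as UTF-8 bytes as it would sit on disk.
TRIPLE = (
    '<http://linkedgeodata.org/triplify/user22701> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "CiaránMooney" .\n'
)
RAW = TRIPLE.encode("utf-8")


def read_first_line(raw_bytes, encoding):
    """Hypothetical helper: wrap a byte stream in a decoding reader
    and return the first decoded line."""
    reader = codecs.getreader(encoding)(io.BytesIO(raw_bytes))
    return reader.readline()


# An ASCII reader rejects the two bytes that encode "á".
try:
    read_first_line(RAW, "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# A UTF-8 reader decodes the same bytes back to the original text.
utf8_line = read_first_line(RAW, "utf-8")
```

In this sketch the ASCII reader raises `UnicodeDecodeError`, while the UTF-8 reader returns the line unchanged, which is the behaviour the proposed change relies on.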

Hope this helps.

AVL

Labels: bug, enhancement, parsing
