
[BUG] import_util json() procedure fails when file contains unicode characters #708

@QazCetelic

Describe the bug
Calling import_util.json() on a JSON file that contains non-ASCII (Unicode) characters fails with a UnicodeDecodeError.

To Reproduce

  1. Add an example JSON file, example.json, containing the character ë:
[
    {
        "id": 7902,
        "labels": [
            "Foo"
        ],
        "type": "node",
        "properties": {
            "name": "Categorieën"
        }
    }
]
  2. Import it:
CALL import_util.json("/data/example.json")
  3. See the error:
import_util.json: Traceback (most recent call last):
  File "/usr/lib/memgraph/query_modules/import_util.py", line 335, in json
    graph_objects = js.load(file)
                    ^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "/usr/lib/python3.12/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 163: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/memgraph/query_modules/import_util.py", line 337, in json
    raise OSError("Could not open/read file.")
OSError: Could not open/read file.

Expected behavior
It should be able to import files containing Unicode characters.

Additional context
The issue is caused by this code in import_util.py:

try:
    with open(path, "r") as file:  # <- no encoding given, so the locale default (ASCII here) is used, which fails on non-ASCII bytes
        graph_objects = js.load(file)
except Exception:
    raise OSError("Could not open/read file.")

This can probably be fixed by opening the file with an explicit encoding: with open(path, "r", encoding="utf-8") as file:. UTF-8 covers all Unicode characters and is backward compatible with ASCII (all valid ASCII is valid UTF-8).
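A minimal sketch of what the suggested fix might look like, outside of Memgraph. The helper name load_graph_objects is hypothetical and not part of import_util.py; it only illustrates opening the file with an explicit UTF-8 encoding so the result no longer depends on the process locale. It also chains the original exception, which is an extra suggestion so that the real cause stays visible in the traceback.

```python
import json
import os
import tempfile


def load_graph_objects(path):
    """Hypothetical helper: parse a JSON file as UTF-8, independent of the locale."""
    try:
        # An explicit encoding avoids falling back to the locale default (ASCII in the report).
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    except (OSError, ValueError) as error:
        # UnicodeDecodeError and json.JSONDecodeError are both ValueError subclasses.
        # "from error" keeps the underlying cause attached to the raised OSError.
        raise OSError(f"Could not open/read file: {path}") from error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "example.json")
        # Write a reduced version of the example file from the report as UTF-8 bytes.
        with open(path, "w", encoding="utf-8") as file:
            file.write('[{"id": 7902, "properties": {"name": "Categorieën"}}]')

        graph_objects = load_graph_objects(path)
        assert graph_objects[0]["properties"]["name"] == "Categorieën"
```

The byte 0xc3 in the traceback is the first byte of ë in UTF-8 (0xc3 0xab), which is why the ASCII codec rejects the file while the UTF-8 codec reads it without problems.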
