I'm working on an HDT library in Rust and use HDT files produced by hdt-cpp as test input.
Until now that worked fine, but I get an error with the subject http://dbpedia.org/resource/Özgür_Özata, which is the last subject in the attached Turtle file.
When looking at that file with a hex editor, "Özgür_Özata" is represented as 0xc3,0x96,0x7a,0x67,0xc3,0xbc,0x72,0x5f,0xc3,0x96,0x7a,0x61,0x74,0x61.
These bytes are valid UTF-8; for example, the Rust function std::str::from_utf8() accepts them.
However, rdf2hdt changes the first byte from 0xc3 to 0x9d, which causes std::str::from_utf8() to return an error, so the unwrap() on its result panics.
I'm not a UTF-8 expert and can't say whether that is an alternative valid representation of an "Ö" character, but given that the function panics I assume it is not, so I want to submit this as a possible bug.
I'm using a Docker image built from the newest commit of the develop branch.
use std::str;

fn main() {
    let hex = [0xc3,0x96,0x7a,0x67,0xc3,0xbc,0x72,0x5f,0xc3,0x96,0x7a,0x61,0x74,0x61];
    let s = str::from_utf8(&hex[0..]).unwrap();
    println!("{s}");
    let hex = [0x9d,0x96,0x7a,0x67,0xc3,0xbc,0x72,0x5f,0xc3,0x96,0x7a,0x61,0x74,0x61];
    let s = str::from_utf8(&hex[0..]).unwrap();
    println!("{s}");
}
Compiling hex v0.1.0 (/home/konrad/tmp/hex)
Finished dev [unoptimized + debuginfo] target(s) in 0.10s
Running `target/debug/hex`
Özgür_Özata
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', src/main.rs:8:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
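For anyone who wants to reproduce this without the panic, here is a minimal sketch that reports the decoding result instead of unwrapping it. The helper `utf8_byte_role` is a hypothetical name introduced only for this illustration; it classifies a single byte using the standard UTF-8 byte ranges. Under those ranges, 0x9d falls in 0x80..=0xbf, which is reserved for continuation bytes, so it cannot begin a character. That would be consistent with `valid_up_to: 0` in the error above, although it does not say anything about where in rdf2hdt the byte gets changed.

```rust
use std::str;

/// Hypothetical helper: describe the role a byte can play in UTF-8.
fn utf8_byte_role(b: u8) -> &'static str {
    match b {
        0x00..=0x7f => "ASCII (single-byte character)",
        0x80..=0xbf => "continuation byte (cannot start a character)",
        0xc2..=0xdf => "lead byte of a 2-byte sequence",
        0xe0..=0xef => "lead byte of a 3-byte sequence",
        0xf0..=0xf4 => "lead byte of a 4-byte sequence",
        // 0xc0, 0xc1 and 0xf5..=0xff never appear in valid UTF-8
        _ => "never valid in UTF-8",
    }
}

fn main() {
    // Bytes as they appear in the Turtle file.
    let original: [u8; 14] = [
        0xc3, 0x96, 0x7a, 0x67, 0xc3, 0xbc, 0x72, 0x5f, 0xc3, 0x96, 0x7a, 0x61, 0x74, 0x61,
    ];
    // Same bytes with the first one replaced by 0x9d, as observed in the HDT file.
    let from_hdt: [u8; 14] = [
        0x9d, 0x96, 0x7a, 0x67, 0xc3, 0xbc, 0x72, 0x5f, 0xc3, 0x96, 0x7a, 0x61, 0x74, 0x61,
    ];

    for (label, bytes) in [("turtle", original), ("hdt", from_hdt)] {
        println!(
            "{label}: first byte {:#04x} is: {}",
            bytes[0],
            utf8_byte_role(bytes[0])
        );
        // Handle the Result explicitly instead of calling unwrap().
        match str::from_utf8(&bytes) {
            Ok(s) => println!("  valid UTF-8: {s}"),
            Err(e) => println!(
                "  invalid UTF-8: valid up to byte {}, error_len {:?}",
                e.valid_up_to(),
                e.error_len()
            ),
        }
    }
}
```

Matching on the `Result` keeps the program running for both inputs, which makes it easier to compare the two byte sequences side by side.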
persondata_en_10k.ttl.zip