Hello Kanzaki,
thank you for the report!
Can you provide a link to the RDF document in question? Are you talking about the RDF dumps (which are generated by a third party), or the (incomplete) RDF we return from URLs like https://www.wikidata.org/entity/Q42.nt?
We use a (cropped version of) EasyRdf. Do you perhaps know if this problem still exists in the latest version of EasyRdf?
-- daniel
On 05.12.2014 12:18, KANZAKI Masahide wrote:
Hello, thank you for providing Wikidata RDF. The effort is very much appreciated.
I found that Wikidata RDF in N-Triples format has an encoding issue with non-ASCII characters.
Because the RDF 1.1 N-Triples format no longer has the 'ASCII only' restriction, it is desirable to write characters directly, as in Turtle, rather than using Unicode escape sequences (\uHHHH).
If escaping is necessary for some reason, it should be '\u' followed by the Unicode code point, not by the UTF-8 octets. For example, the Kanji character '位' should be '\u4F4D', not '\u00E4\u00BD\u008D'. The current Wikidata N-Triples output uses the latter form, which turns non-ASCII characters into garbage.
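To illustrate the difference, here is a minimal Python sketch that contrasts the two escaping strategies. The function names are hypothetical and this is not the EasyRdf code. It assumes the faulty output comes from escaping each UTF-8 octet as if it were a code point, and it ignores the other N-Triples escapes (quotes, backslashes, control characters).

```python
def escape_by_code_point(text):
    """Escape each non-ASCII character by its Unicode code point."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                   # ASCII is written as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)       # \uHHHH for the BMP
        else:
            out.append("\\U%08X" % cp)       # \UHHHHHHHH beyond the BMP
    return "".join(out)


def escape_by_utf8_octet(text):
    """Assumed faulty behaviour: escape every UTF-8 octet separately."""
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append("\\u%04X" % byte)     # treats an octet as a code point
    return "".join(out)


if __name__ == "__main__":
    print(escape_by_code_point("位"))   # \u4F4D
    print(escape_by_utf8_octet("位"))   # \u00E4\u00BD\u008D
```

A parser reading the second form sees three separate Latin-1 range characters (U+00E4, U+00BD, U+008D) instead of the single character U+4F4D.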
I hope this issue is not very difficult to fix.
cheers,