Hello Kanzaki,
thank you for the report!
Can you provide a link to the RDF document in question? Are you talking about
the RDF dumps (which are generated by a third party), or the (incomplete) RDF we
return from URLs like <https://www.wikidata.org/entity/Q42.nt>?
We use a (cropped version of) EasyRdf. Do you perhaps know if this problem still
exists in the latest version of EasyRdf?
-- daniel
On 05.12.2014 12:18, KANZAKI Masahide wrote:
Hello, thank you for providing Wikidata RDF. The effort is very much
appreciated.
I found that Wikidata RDF in N-Triples format has an encoding issue
with non-ASCII characters.
Because the RDF 1.1 N-Triples format no longer has an 'ASCII only'
restriction, it is desirable to use characters directly, as in the Turtle
format, rather than Unicode escape sequences (\uHHHH).
If, for some reason, escaping is necessary, it should be '\u' followed
by the Unicode code point, not the UTF-8 octets. For example, the Kanji
character '位' should be '\u4F4D', not '\u00E4\u00BD\u008D'. The current
Wikidata RDF NT output uses the latter form, which turns non-ASCII
characters into garbage.
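To illustrate the difference, here is a minimal Python sketch of the two
escaping strategies. It is not taken from EasyRdf or the Wikidata code; the
function names are hypothetical, and the sketch deliberately ignores the other
escapes N-Triples needs (quotes, backslashes, control characters). It only
shows how escaping per code point differs from escaping per UTF-8 octet.

```python
def escape_by_code_point(text: str) -> str:
    """Escape each non-ASCII character using its Unicode code point.

    Hypothetical helper for illustration: \\uHHHH for code points up to
    U+FFFF, \\UHHHHHHHH for anything above.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")      # four hex digits
        else:
            out.append(f"\\U{cp:08X}")      # eight hex digits
    return "".join(out)


def escape_by_utf8_octets(text: str) -> str:
    """Reproduce the faulty behaviour: escape each UTF-8 byte separately.

    Hypothetical helper for illustration; every byte >= 0x80 is written
    as if it were a code point of its own.
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(f"\\u{byte:04X}")
    return "".join(out)


if __name__ == "__main__":
    print(escape_by_code_point("位"))    # \u4F4D
    print(escape_by_utf8_octets("位"))   # \u00E4\u00BD\u008D
```

A parser that reads the second form decodes three separate characters
(U+00E4, U+00BD, U+008D) instead of the single character U+4F4D, which is why
the text comes out garbled.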
I hope this issue is not very difficult to fix.
cheers,
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.