Hello,
I have a SPARQL query that returns French labels of people with the family name (P734) Labrousse (Q25273100), sorting them by label: http://tinyurl.com/hq44ea8
The problem is that French rules for sorting are not applied: Élisabeth Labrousse and Émile Labrousse should be between Audran Labrousse and Ernest Labrousse, and not at the end of the results.
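The ordering problem above can be reproduced outside WDQS. Below is a minimal Java sketch that compares plain code-point ordering with the JDK's built-in French `java.text.Collator`. It does not use ICU and is not what WDQS or Blazegraph actually run. The class and method names are hypothetical, and the input is only the four names mentioned above.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: contrasts code-point ordering with French collation.
public class FrenchLabelSort {

    // String.compareTo compares UTF-16 code units, so U+00C9 (E with acute)
    // sorts after every unaccented ASCII letter, including 'E'.
    static List<String> codePointOrder(List<String> labels) {
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // Locale-aware ordering using the JDK's French collator (not ICU).
    static List<String> frenchOrder(List<String> labels) {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        // Decompose precomposed letters into base letter + combining accent,
        // so the accent counts as a secondary difference, not a primary one.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> labels = List.of(
                "Ernest Labrousse",
                "\u00C9mile Labrousse",      // U+00C9: capital E with acute
                "Audran Labrousse",
                "\u00C9lisabeth Labrousse");
        System.out.println(codePointOrder(labels));
        System.out.println(frenchOrder(labels));
    }
}
```

The first line printed puts Élisabeth and Émile after Ernest, which matches the order described above. The second line places them between Audran and Ernest.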
This seems to conform to the SPARQL specification, under which ordering is undefined for literals with language tags: https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#modOrderBy
Some SPARQL engines like Dydra use language tags to sort strings: http://blog.dydra.com/2015/05/06/collation
It seems that Blazegraph should be able to do the same thing (using the ICU library), but the documentation is old (yep, 2013 is old! :p) and I don't know how WDQS is configured: https://wiki.blazegraph.com/wiki/index.php/Unicode
Is there a way to use French (or other language-specific) sorting in WDQS?
Thanks, Envel
Hi!
> Some SPARQL engines like Dydra use language tags to sort strings: http://blog.dydra.com/2015/05/06/collation
> It seems that Blazegraph should be able to do the same thing (using ICU library), but the documentation is old (yep, 2013 is old ! :p) and I don't know how WDQS is configured: https://wiki.blazegraph.com/wiki/index.php/Unicode
I'll look into it, but cross-language collation is very tricky. ICU can help within one collation (even then, a single language can have more than one collation algorithm, and some do), but nothing guarantees that the query results contain strings in only one language. With several languages there isn't really a well-defined order, and local rules may contradict each other.
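One way to work around the mixed-language problem is to group labels by language tag and collate only within each group. The sketch below does this with the JDK collator for each tag. It assumes every label carries a BCP 47 language tag such as "fr" or "en". The `Label` class, the tag-first grouping, and the sample input are hypothetical choices for illustration. They do not describe how Blazegraph, ICU, or Dydra behave.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: order language-tagged labels by tag, then by the
// collation for that tag. Ordering the groups by tag is an arbitrary choice.
public class TaggedLabelSort {

    // A label literal: lexical form plus a BCP 47 language tag such as "fr".
    static final class Label {
        final String text;
        final String lang;

        Label(String text, String lang) {
            this.text = text;
            this.lang = lang;
        }

        @Override
        public String toString() {
            return "\"" + text + "\"@" + lang;
        }
    }

    static List<Label> sort(List<Label> labels) {
        // One collator per language tag, created on first use and reused.
        Map<String, Collator> collators = new HashMap<>();
        Comparator<Label> byTag = Comparator.comparing(l -> l.lang);
        Comparator<Label> withinTag = (a, b) -> {
            // Only consulted when byTag returns 0, i.e. both tags are equal.
            Collator c = collators.computeIfAbsent(a.lang, tag -> {
                Collator col = Collator.getInstance(Locale.forLanguageTag(tag));
                col.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
                return col;
            });
            return c.compare(a.text, b.text);
        };
        List<Label> sorted = new ArrayList<>(labels);
        sorted.sort(byTag.thenComparing(withinTag));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical input mixing French and English labels.
        List<Label> labels = List.of(
                new Label("Ernest Labrousse", "fr"),
                new Label("\u00C9lisabeth Labrousse", "fr"),
                new Label("Ernest Labrousse", "en"),
                new Label("Audran Labrousse", "fr"),
                new Label("Audran Labrousse", "en"));
        System.out.println(sort(labels));
    }
}
```

Any other way of merging the groups would be just as defensible. For example, you could apply a single root-locale collator to every label. This is why a result set that mixes languages has no single well-defined order.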
In any case, the Query Service uses the default setting there (namely, ICU). But I'm not sure that setting alone would produce the result you seek. I'll look further into it.
Is there any possibility of integrating this new relational-database work out of MIT, "Democratizing databases: With a new tool, any competent spreadsheet user can construct custom database interfaces" (https://news.mit.edu/2016/spreadsheet-databases-0708), with Wikidata's SPARQL (and OWL) querying, for example in a way that anticipates all of Wikidata's and Wikipedia's languages?
Cheers, Scott
On Jul 3, 2016 12:58 AM, "Envel Le Hir" <envel.le.hir@gmail.com> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata