Hi Aidan, Markus, Daniel and Wikidatans,
Arising from this conversation on Wikidata query performance, and regarding CC World University and School (WUaS)/Wikidata, here is a theoretical challenge: how would you suggest coding WUaS/Wikidata initially so that it can answer this question - "What are most impt stats issues in earth/space sci that journalists should understand?" - https://twitter.com/ReginaNuzzo/status/761179359101259776 - in many Wikipedia languages, including American Sign Language (and other sign languages), and eventually in voice? (Regina Nuzzo is an associate professor at Gallaudet University, a university for deaf and hard-of-hearing students, and has a Ph.D. in statistics from Stanford; Regina was herself born with hearing loss.)
I'm excited for the day when we can ask WUaS (or Wikipedia) this question (or so many others) in voice, combining, for example, the CC WUaS Statistics, Earth, Space & Journalism wiki subject pages (with all their CC MIT OCW and Yale OYC courses) - http://worlduniversity.wikia.com/wiki/Subjects - in all of Wikipedia's 358 languages, again eventually in voice and in ASL/other sign languages (https://twitter.com/WorldUnivAndSch/status/761593842202050560 - see, too - https://www.wikidata.org/wiki/Wikidata:Project_chat#Schools).
Thanks for your paper, Aidan, as well. Would designing for deafness inform how you would approach "Querying Wikidata: Comparing SPARQL, Relational and Graph Databases" in any new ways?
Best, Scott
On Sat, Aug 6, 2016 at 12:29 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
Hi Aidan,
Thanks, very interesting, though I have not read the details yet.
I wonder if you have compared the actual query results you got from the different stores. As far as I know, Neo4J actually uses a very idiosyncratic query semantics that is neither compatible with SPARQL (not even on the BGP level) nor with SQL (even for SELECT-PROJECT-JOIN queries). So it is difficult to compare it to engines that use SQL or SPARQL (or any other standard query language, for that matter). In this sense, it may not be meaningful to benchmark it against such systems.
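To illustrate how two engines can return different answers for "the same" pattern: the message above does not say which difference is meant, so the following is only a sketch under an assumption. One well-known difference is that SPARQL evaluates basic graph patterns under homomorphism (several triple patterns may match the same data triple), whereas Cypher, Neo4J's query language, does not let two relationships in a single MATCH pattern bind to the same edge. The toy graph, the pattern, and the `match` helper below are all hypothetical and use no real database API.

```python
from itertools import product

def match(graph, pattern, distinct_edges=False):
    """Enumerate variable bindings of a basic graph pattern over a list of triples.

    distinct_edges=False: homomorphism (SPARQL-style BGP matching).
    distinct_edges=True:  each pattern triple must use a different data triple
                          (an approximation of Cypher-style matching).
    """
    results = []
    # Try every way of assigning one data triple to each pattern triple.
    for assignment in product(graph, repeat=len(pattern)):
        if distinct_edges and len(set(assignment)) < len(assignment):
            continue  # the same data triple was used twice
        binding = {}
        ok = True
        for pat, triple in zip(pattern, assignment):
            for term, value in zip(pat, triple):
                if term.startswith("?"):
                    # A variable must keep the value it was first bound to.
                    if binding.setdefault(term, value) != value:
                        ok = False
                elif term != value:
                    ok = False  # a constant in the pattern does not match
        if ok and binding not in results:
            results.append(binding)
    return results

# Hypothetical toy data: two people who know the same third person.
graph = [("alice", "knows", "carol"), ("bob", "knows", "carol")]
# Pattern: ?x and ?y both know ?z.
pattern = [("?x", "knows", "?z"), ("?y", "knows", "?z")]

homomorphic = match(graph, pattern)
edge_distinct = match(graph, pattern, distinct_edges=True)

print(len(homomorphic), "bindings under homomorphism")
print(len(edge_distinct), "bindings when edges must be distinct")
```

Under homomorphism the pattern also matches with ?x and ?y bound to the same person, because both triple patterns may reuse one data triple; the edge-distinct variant excludes those bindings. Result counts for equivalent-looking queries can therefore differ between engines even on identical data.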
Regarding Virtuoso, the reason for not picking it for Wikidata was the lack of load-balancing support in the open source version, not the performance of a single instance.
Best regards,
Markus
On 06.08.2016 18:19, Aidan Hogan wrote:
Hey all,
Recently we wrote a paper discussing the query performance for Wikidata, comparing different possible representations of the knowledge-base in Postgres (a relational database), Neo4J (a graph database), Virtuoso (a SPARQL database) and BlazeGraph (the SPARQL database currently in use) for a set of equivalent benchmark queries.
The paper was recently accepted for presentation at the International Semantic Web Conference (ISWC) 2016. A pre-print is available here:
http://aidanhogan.com/docs/wikidata-sparql-relational-graph.pdf
Of course there are some caveats with these results in the sense that perhaps other engines would perform better on different hardware, or different styles of queries: for this reason we tried to use the most general types of queries possible and tried to test different representations in different engines (we did not vary the hardware). Also in the discussion of results, we tried to give a more general explanation of the trends, highlighting some strengths/weaknesses for each engine independently of the particular queries/data.
I think it's worth a glance for anyone who is interested in the technology/techniques needed to query Wikidata.
Cheers, Aidan
P.S., the paper above is a follow-up to a previous work with Markus Krötzsch that focussed purely on RDF/SPARQL:
http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf
(I'm not sure if it was previously mentioned on the list.)
P.P.S., as someone who's somewhat of an outsider but who's been watching on for a few years now, I'd like to congratulate the community for making Wikidata what it is today. It's awesome work. Keep going. :)
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata