I'm wondering if we still need to support PHP 5.3.
I'd rather bump the minimum version up to PHP 5.5, and I know people have
been talking about doing the same for MediaWiki itself. My question is whether
this can already be done without making Wikibase undeployable on WMF
servers. Is everything running HHVM yet, or is there some stuff relevant to
Wikibase that still runs an unsupported version of PHP?
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
Is there an alternative to the MediaWiki API wbformatvalue
function in Wikidata Toolkit?
Currently I search for items with the help of the MediaWiki API
<http://www.wikidata.org/w/api.php> and parse them as a list of
JacksonItemDocuments. At this point I actually already have the
required information, but I then have to issue another query to format the
value by calling wbformatvalue. What I could do (if Wikidata Toolkit
supports it) is extract and format the value from the JacksonItemDocument directly.
Thanks in advance
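For what it's worth, simple cases can be handled without a second API round trip by reading the datavalue straight out of the item's JSON, which is the same structure a JacksonItemDocument is parsed from. Here is a minimal sketch assuming the standard Wikibase claim layout; the item dict is a hand-made fragment, not a live API response, and this only covers plain extraction, not language-aware formatting:

```python
# Read a statement's mainsnak datavalue directly from Wikibase item JSON,
# instead of round-tripping through wbformatvalue.
# The item below is a hand-made fragment for illustration.

def main_values(item, property_id):
    """Yield the mainsnak datavalues for one property of an item."""
    for statement in item.get("claims", {}).get(property_id, []):
        snak = statement["mainsnak"]
        if snak["snaktype"] == "value":  # skip novalue/somevalue snaks
            yield snak["datavalue"]["value"]

item = {
    "id": "Q64",
    "claims": {
        "P1082": [  # P1082 = population
            {"mainsnak": {
                "snaktype": "value",
                "datavalue": {"type": "quantity",
                              "value": {"amount": "+3469849", "unit": "1"}},
            }}
        ]
    },
}

for value in main_values(item, "P1082"):
    print(value["amount"])  # prints "+3469849"
```

For quantities, strings, and the like this is often enough; for locale-sensitive rendering, wbformatvalue would still be the safer choice.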
I followed the most recent discussion about the implementation of the
Wikidata Query Service. Although you've chosen Blazegraph as the database for
implementing it, I would like to show what we've done in the backend of
our open-source data management platform D:SWARM.
The Wikidata data model and the current state of our data model are
pretty similar, i.e., we mainly rely on statements that belong to
resources, and we would like to keep qualified attributes on the
statements. We decided to build our graph data model on top of the
property graph model and to utilise the RDF concepts as well, i.e. we
are RDF compatible. The main difference right now between the two data
models is that our graph data model currently only makes use of a fixed
set of qualified attributes for the statements (whereas the claims of
the Wikidata data model can use anything as qualified attributes;
however, this can be changed/opened up in our data model rather easily).
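To make the comparison concrete, here is a rough sketch (hand-rolled Python, not D:SWARM's actual schema) of the idea of keeping a qualified statement as a first-class node in a property graph, so qualifiers attach to the statement itself rather than requiring RDF-style reification:

```python
# A toy property-graph representation of one qualified statement.
# Node/edge names are illustrative only; D:SWARM's real schema differs.

nodes = {
    "berlin":  {"label": "Resource", "uri": "http://example.org/Berlin"},
    "germany": {"label": "Resource", "uri": "http://example.org/Germany"},
    # The statement is itself a node, so qualified attributes
    # (confidence, provenance, order, ...) live directly on it.
    "stmt1": {
        "label": "Statement",
        "predicate": "capitalOf",
        "confidence": 0.98,        # example of a fixed qualifier slot
        "provenance": "import-42", # example of a fixed qualifier slot
    },
}

edges = [
    ("stmt1", "SUBJECT", "berlin"),
    ("stmt1", "OBJECT", "germany"),
]

# Reading the statement back out:
subject = next(t for s, r, t in edges if s == "stmt1" and r == "SUBJECT")
obj = next(t for s, r, t in edges if s == "stmt1" and r == "OBJECT")
print(nodes[subject]["uri"], nodes["stmt1"]["predicate"], nodes[obj]["uri"])
```

In plain RDF the same qualified statement would need four extra reification triples before the qualifiers could be attached anywhere.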
At the current state of our implementation we make use of Neo4j.
Therefore, we provide an unmanaged extension that offers specific HTTP
APIs on the Neo4j server for writing and reading the data (etc.).
Furthermore, batch insert is implemented to speed up the import of huge
amounts of data. During the data import we create various indices
that can later be utilised to boost various queries. Finally, we also
have experimental support for versioning of our data.
Currently, we are working on improving the performance of the import.
Most recently, we were able to load 115M RDF statements in ~107 minutes
on a commodity machine (incl. indexing; 16GB RAM, SSD, 8 cores;
single-threaded for now (!)). I know that this is no landmark (I'm
not really a performance guy at all ;) ), since many triple stores are
much faster. On the other hand, the data is now in a property graph.
Hence, we can make use of the advantages of this approach (rather than
dealing with the "disadvantages/misconceptions" of the current RDF data
model (namely reification*) ;) ).
Maybe we can join forces on certain challenges. From what I've seen so
far the Wikidata dataset is about 223M statements, right? So it should
still be possible to load it into Neo4j on a single commodity machine.
The only "disadvantage" right now regarding your current implementation
decisions is that you would need to write Cypher queries instead of
SPARQL queries (or you would need to write a preprocessor to transform
SPARQL queries into Cypher queries).
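As a small illustration of what such a translation would have to bridge, here is the same one-hop lookup once as SPARQL and once as Cypher, held in Python strings; the vocabulary is made up and not actual Wikidata or D:SWARM identifiers:

```python
# The same one-hop graph pattern, once as SPARQL and once as Cypher.
# Vocabulary is invented for illustration; a SPARQL-to-Cypher
# preprocessor would have to map patterns like these mechanically.

sparql = """
SELECT ?city WHERE {
  ?city <http://example.org/prop/capitalOf> <http://example.org/entity/Germany> .
}
"""

cypher = """
MATCH (city)-[:capitalOf]->(country {uri: 'http://example.org/entity/Germany'})
RETURN city
"""

print(sparql.strip())
print(cypher.strip())
```

The triple pattern maps to a relationship match fairly directly here; the harder cases for a preprocessor would be OPTIONAL, property paths, and the reification-style patterns mentioned above.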
Feel free to ask further questions about the details of our implementation.
I'm looking forward to your response.
*) RDR is also "only" an experiment right now ;)