FWIW, I have implemented a queryable stand-alone web server that keeps all of the wikidata property-item links in memory. It is built from the wikidata dumps, which appear to be published rather frequently. I'll try to deploy a test version on wikilabs (once I figure out how all that works); it seems to be more favourable to such services than the toolserver.
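For anyone wondering what such a service might look like, here is a minimal sketch, not the actual implementation. It assumes the links have already been extracted from a dump as (item, property, target) triples; the names `LinkIndex`, `handle_query`, `make_handler` and `serve`, the query parameters `item`, `prop` and `target`, and the port are all made up for illustration.

```python
import json
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class LinkIndex:
    """Hypothetical in-memory index of (item, property, target) links."""

    def __init__(self):
        # Two lookup directions, so both kinds of query are a dict access.
        self._by_item_prop = defaultdict(set)
        self._by_prop_target = defaultdict(set)

    def add(self, item, prop, target):
        self._by_item_prop[(item, prop)].add(target)
        self._by_prop_target[(prop, target)].add(item)

    def targets(self, item, prop):
        """All targets that `item` links to via `prop`."""
        return sorted(self._by_item_prop.get((item, prop), ()))

    def items_with(self, prop, target):
        """All items that link to `target` via `prop`."""
        return sorted(self._by_prop_target.get((prop, target), ()))


def handle_query(index, query_string):
    """Answer a query string such as 'item=Q42&prop=P31' with a JSON list."""
    params = parse_qs(query_string)
    item = params.get("item", [None])[0]
    prop = params.get("prop", [None])[0]
    target = params.get("target", [None])[0]
    if prop and item:
        result = index.targets(item, prop)
    elif prop and target:
        result = index.items_with(prop, target)
    else:
        result = []
    return json.dumps(result)


def make_handler(index):
    """Build a request handler class bound to one index instance."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = urlparse(self.path).query
            body = handle_query(index, query).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(index, port=8080):
    """Serve queries until interrupted (port chosen arbitrarily)."""
    HTTPServer(("localhost", port), make_handler(index)).serve_forever()
```

Loading a few triples with `index.add("Q42", "P31", "Q5")` and calling `serve(index)` would then answer requests like `/?item=Q42&prop=P31`. Keeping two dictionaries doubles the memory use but makes both lookup directions cheap.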
On Fri, Apr 19, 2013 at 9:29 AM, Platonides platonides@gmail.com wrote:
On 19/04/13 01:19, DaB. wrote:
as you may know there is a rev_text_id field in the revision table. This field points to the text table, where the actual text is, or should be: the WMF doesn’t store the text there, only a pointer ("DB://cluster25/11458305", for example). If you query different wikis you will see that most of them point to the same cluster or to one with a nearby number. That tells me (and I was also told so before) that the text of all WMF projects is stored together.
The task would now be to separate wikidata from the rest, but the storage area has no clue which project a text comes from, which makes the separation very hard. And there is another problem: deleted texts are also in this area, so even more filtering would be needed.
I very much doubt that this situation will change at the TS, and I also doubt that it will be different for WikiLabs. So I guess your best bet here is the API.
Sincerely, DaB.
I think the only hope would be if wikidata was stored under its own cluster (for easier differentiation) and at least one server of that group (the master?) only had that (so toolserver could get its binlogs).
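To illustrate why a dedicated cluster would help: with pointers of the form quoted above ("DB://cluster25/11458305"), the cluster name is the only thing that could tell texts apart. The following is a rough sketch under that assumption. The function names and the cluster name `cluster_wikidata` are hypothetical, and only the pointer form shown in this thread is handled; anything after the first slash is kept as an opaque string.

```python
from collections import defaultdict

POINTER_PREFIX = "DB://"


def parse_pointer(pointer):
    """Split a pointer such as 'DB://cluster25/11458305' into (cluster, blob_id)."""
    if not pointer.startswith(POINTER_PREFIX):
        raise ValueError("not a DB:// pointer: %r" % pointer)
    cluster, _, blob_id = pointer[len(POINTER_PREFIX):].partition("/")
    if not cluster or not blob_id:
        raise ValueError("malformed pointer: %r" % pointer)
    return cluster, blob_id


def group_by_cluster(pointers):
    """Map each cluster name to the list of blob ids stored on it."""
    groups = defaultdict(list)
    for pointer in pointers:
        cluster, blob_id = parse_pointer(pointer)
        groups[cluster].append(blob_id)
    return dict(groups)


def only_cluster(pointers, cluster_name):
    """Keep only the pointers that live on the given cluster."""
    return [p for p in pointers if parse_pointer(p)[0] == cluster_name]
```

With everything on shared clusters, `group_by_cluster` says nothing about which project a text belongs to. If wikidata had its own cluster, `only_cluster(pointers, "cluster_wikidata")` would be all the filtering needed on the pointer side (deleted texts aside).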