Hello, At Friday 19 April 2013 01:03:25 DaB. wrote:
would be no way to separate Wikidata from the rest
I don't understand why separating plaintext storage between different projects would be an issue. Is it all lumped into one storage "namespace"? I'm sure nobody at Wikimedia would be the least bit motivated to make this data available to the toolserver, but maybe it will be usable in labs. Otherwise it would be quite a waste of a great opportunity.
as you may know, there is a rev_text_id field in the revision table. This field points to the text table, where the actual text is, or should be, stored. The WMF doesn't store the text there, though, only a pointer ("DB://cluster25/11458305", for example). If you query different wikis, you will see that most of them point to the same cluster or to one with a nearby number. That tells me (and I was also told so before) that the text of all WMF projects is stored together. The task would now be to separate Wikidata from the rest, but the storage area has no record of which project a text comes from, which makes the separation very hard. And there is another problem: deleted texts are also in this area, so even more filtering would be needed. I very much doubt that this situation will change at the TS, and I also doubt that it will be different for WikiLabs. So I guess your best bet here is the API.
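To illustrate both points, here is a rough Python sketch (not anything the WMF or the TS actually runs). It assumes the pointer always has the form "DB://<cluster>/<id>" as in the example above; the names parse_pointer, BlobPointer and build_content_url are made up for this mail. The second function only builds a URL for the standard MediaWiki API (action=query, prop=revisions, rvprop=content); the endpoint and the page title "Q42" are just examples.

```python
import re
from typing import NamedTuple
from urllib.parse import urlencode


class BlobPointer(NamedTuple):
    """The two pieces of information a pointer carries (hypothetical type)."""
    cluster: str
    blob_id: int


# Assumed pointer shape, taken from the single example "DB://cluster25/11458305".
_POINTER_RE = re.compile(r"^DB://([^/]+)/(\d+)$")


def parse_pointer(address: str) -> BlobPointer:
    """Split a pointer into cluster name and numeric id."""
    match = _POINTER_RE.match(address)
    if match is None:
        raise ValueError(f"not a DB:// pointer: {address!r}")
    return BlobPointer(cluster=match.group(1), blob_id=int(match.group(2)))


def build_content_url(endpoint: str, title: str) -> str:
    """Build an API URL that asks for the current text of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


pointer = parse_pointer("DB://cluster25/11458305")
# Only a cluster and an id come out; nothing here names the wiki.
print(pointer.cluster, pointer.blob_id)

# Example endpoint and title; the API already knows which project it serves.
print(build_content_url("https://www.wikidata.org/w/api.php", "Q42"))
```

The pointer gives you a cluster and a number and nothing else, which is why filtering the storage by project is so hard, while the API is asked per wiki and so does the separation for you.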
Sincerely, DaB.