[Toolserver-l] Experimental: Live template value search

Magnus Manske magnusmanske at googlemail.com
Tue Apr 21 09:07:10 UTC 2009


On Tue, Apr 21, 2009 at 9:25 AM, Daniel Kinzler <daniel at brightbyte.de> wrote:
> Magnus Manske wrote:
>> All in all, it would be much better directly integrated into MediaWiki
>> (no need for text retrieval/parsing, no bulk updates). But I've been
>> saying that for years, at least this is a first attempt.
>
> Actually, this is part of my grand plan for world domination. I'm pushing for it
> behind the scenes... I have a few ideas on how it may be done nicely.

Excellent! I'll hold further development on the tool for now.

> I think the main problem is that Semantic MediaWiki looks like the obvious
> answer. But I doubt it is. I only want a small subset of that functionality on
> Wikipedia. Maybe SMW can be chopped up to fit that, but I'm personally more
> inclined to extend the RDF extension to store triples in the DB.

I agree about Semantic MediaWiki, which is a different beast (and
might one day be used on Wikipedia).

The question seems to be scalability. Extrapolating from my sample data
set, just the key/value pairs of templates directly included in
articles would come to over 200 million rows for en.wikipedia at the
moment. A MediaWiki-internal solution would want to store templates
included in templates as well, which can be a lot for complicated
meta-templates. I think a billion rows for the current English
Wikipedia is not too far-fetched in that model. The table would be
both constantly updated (potentially hundreds of writes for a single
article update) and heavily searched (with LIKE "%stuff%", no less).
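
To make the search concern concrete, here is a minimal sketch in Python
with SQLite. It is not the tool's actual setup: the table name
template_values, its columns, the index idx_value and the sample rows
are all made up for illustration. It only compares the query plan for an
exact-match lookup with the plan for a LIKE "%stuff%" style search on an
indexed value column.

```python
import sqlite3


def build_db():
    """Create an in-memory table of template key/value pairs.

    The schema is hypothetical; it is not taken from the tool or MediaWiki.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template_values ("
        " page TEXT, template TEXT, key TEXT, value TEXT)"
    )
    # An ordinary B-tree index on the value column.
    conn.execute("CREATE INDEX idx_value ON template_values (value)")
    rows = [
        ("Berlin", "Infobox city", "country", "Germany"),
        ("Berlin", "Infobox city", "population", "3431700"),
        ("Paris", "Infobox city", "country", "France"),
    ]
    conn.executemany("INSERT INTO template_values VALUES (?, ?, ?, ?)", rows)
    return conn


def plan_for(conn, sql, params):
    """Return SQLite's query plan for a statement as a single string."""
    cur = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in cur.fetchall())


def find_substring(conn, needle):
    """Substring search over values, as in LIKE "%stuff%"."""
    cur = conn.execute(
        "SELECT page, key, value FROM template_values WHERE value LIKE ?",
        ("%" + needle + "%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_db()
    query = "SELECT page, key, value FROM template_values WHERE value {} ?"
    print(plan_for(conn, query.format("="), ("Germany",)))
    print(plan_for(conn, query.format("LIKE"), ("%erman%",)))
    print(find_substring(conn, "erman"))
```

In SQLite the exact-match query should be planned as an index SEARCH,
while the substring query should fall back to a SCAN of every row,
because a leading wildcard gives a B-tree index nothing to seek on.
Other database engines have their own planners, but the same limitation
applies to ordinary B-tree indexes in general, which is what makes a
billion-row table searched this way worrying.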

Would the RDF extension be up to that?

Cheers,
Magnus


