On Tue, Apr 21, 2009 at 9:25 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Magnus Manske wrote:
All in all, it would be much better directly integrated into MediaWiki (no need for text retrieval/parsing, no bulk updates). But I've been saying that for years, at least this is a first attempt.
Actually, this is part of my grand plan for world domination. I'm pushing for it behind the scenes... I have a few ideas on how it may be done nicely.
Excellent! I'll hold further development on the tool for now.
I think the main problem is that Semantic MediaWiki looks like the obvious answer. But I doubt it is. I only want a small subset of that functionality on Wikipedia. Maybe SMW can be chopped up to fit that, but I'm personally more inclined to extend the RDF extension to store triples in the DB.
I agree about Semantic MediaWiki, which is a different beast (and might one day be used on Wikipedia).
The question seems to be scalability. Extrapolating from my sample data set, just the key/value pairs of templates directly included in articles would come to over 200 million rows for en.wikipedia at the moment. A MediaWiki-internal solution would want to store templates included in templates as well, which can be a lot for complicated meta-templates. I think a billion rows for the current English Wikipedia is not too far-fetched in that model. The table would be both constantly updated (potentially hundreds of writes for a single article update) and heavily searched (with LIKE "%stuff%", no less).
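To make the storage model above concrete, here is a minimal sketch in Python with SQLite. The table name `template_kv`, its columns, and the helpers `store_page` and `search` are all hypothetical, made up for illustration; none of them come from MediaWiki, the tool, or the RDF extension. It only shows the two access patterns mentioned: rewriting every row of a page on each edit, and a substring search with a leading wildcard.

```python
import sqlite3

# Hypothetical schema: one row per (page, template, parameter key, value).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE template_kv (
        page_title TEXT NOT NULL,
        template   TEXT NOT NULL,
        param_key  TEXT NOT NULL,
        param_val  TEXT NOT NULL
    )
    """
)
# An index helps narrow by template and key, but not the substring match itself.
conn.execute("CREATE INDEX idx_kv ON template_kv (template, param_key)")


def store_page(conn, page_title, templates):
    """Replace all rows for one page; a single edit can mean many writes.

    `templates` is a list of (template_name, {param_key: param_val}) pairs.
    Returns the number of rows written.
    """
    conn.execute("DELETE FROM template_kv WHERE page_title = ?", (page_title,))
    rows = [
        (page_title, name, key, val)
        for name, params in templates
        for key, val in params.items()
    ]
    conn.executemany("INSERT INTO template_kv VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def search(conn, template, key, needle):
    """Substring search on the value, i.e. LIKE '%needle%'.

    The needle is not escaped here, so % and _ in it act as wildcards.
    """
    cur = conn.execute(
        "SELECT page_title FROM template_kv "
        "WHERE template = ? AND param_key = ? AND param_val LIKE ? "
        "ORDER BY page_title",
        (template, key, f"%{needle}%"),
    )
    return [row[0] for row in cur]
```

A call such as `search(conn, "Infobox scientist", "fields", "physics")` would return the titles of pages whose template value contains that substring. Because the pattern starts with a wildcard, an ordinary index on `param_val` would not help, which is part of why the search load is a concern at this row count.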
Would the RDF extension be up to that?
Cheers, Magnus