Magnus Manske wrote:
On Tue, Apr 21, 2009 at 9:25 AM, Daniel Kinzler
<daniel(a)brightbyte.de> wrote:
Magnus Manske wrote:
All in all, it would be much better directly integrated into MediaWiki
(no need for text retrieval/parsing, no bulk updates). But I've been
saying that for years; at least this is a first attempt.
Actually, this is part of my grand plan for world domination. I'm pushing for it
behind the scenes... I have a few ideas on how it may be done nicely.
Excellent! I'll hold further development on the tool for now.
Well, keep playing if you like; it'll be a while until it goes live. Just don't
invest all that much into an interim solution.
I think the main problem is that Semantic MediaWiki looks like the obvious
answer. But I doubt it is. I only want a small subset of that functionality on
Wikipedia. Maybe SMW can be chopped up to fit that, but I'm personally more
inclined to extend the RDF extension to store triples in the DB.
I agree about Semantic MediaWiki, which is a different beast (and
might one day be used on Wikipedia).
That's really the question. Should we work *now* on making it usable for
Wikipedia, or should we focus on something simpler?
The question seems to be scalability. Extrapolating from my sample data
set, just the key/value pairs of templates directly included in
articles would come to over 200 million rows for en.wikipedia at the
moment. A MediaWiki-internal solution would want to store templates
included in templates as well, which can be a lot for complicated
meta-templates. I think a billion rows for the current English
Wikipedia is not too far-fetched in that model. The table would be
both constantly updated (potentially hundreds of writes for a single
article update) and heavily searched (with LIKE "%stuff%", no less).
Would the RDF extension be up to that?
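To make the search concern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name template_param, its columns, and the sample rows are all made up for illustration; this is not the tool's or MediaWiki's actual schema, and MediaWiki would normally run on MySQL rather than SQLite. It only shows the general effect: a LIKE pattern with a leading wildcard cannot use an ordinary index on the value column, while an exact match can.

```python
import sqlite3

# Hypothetical queries against a hypothetical key/value table.
SUBSTRING_SQL = "SELECT page FROM template_param WHERE value LIKE ?"
EXACT_SQL = "SELECT page FROM template_param WHERE value = ?"


def build_demo_db():
    """Create an in-memory key/value table with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template_param "
        "(page TEXT, template TEXT, param TEXT, value TEXT)"
    )
    conn.execute("CREATE INDEX idx_value ON template_param (value)")
    rows = [
        ("Berlin", "Infobox city", "country", "Germany"),
        ("Hamburg", "Infobox city", "country", "Germany"),
        ("Albert Einstein", "Infobox person", "birth_place", "Ulm, Germany"),
        ("Albert Einstein", "PND", "1", "000000000"),  # placeholder id
    ]
    conn.executemany("INSERT INTO template_param VALUES (?, ?, ?, ?)", rows)
    return conn


def query_plan(conn, sql, args=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, args)]


def pages_with_value_containing(conn, needle):
    """Substring search over all stored values (the LIKE "%stuff%" case)."""
    sql = (
        "SELECT DISTINCT page FROM template_param "
        "WHERE value LIKE ? ORDER BY page"
    )
    return [r[0] for r in conn.execute(sql, ("%" + needle + "%",))]


if __name__ == "__main__":
    conn = build_demo_db()
    print(pages_with_value_containing(conn, "Germany"))
    # The substring query is reported as a scan, the exact match as an
    # index search; exact wording of the plan varies by SQLite version.
    print(query_plan(conn, SUBSTRING_SQL, ("%Germany%",)))
    print(query_plan(conn, EXACT_SQL, ("Germany",)))
```

On a handful of rows the difference is invisible; the point is only that the substring query has to look at every row, which is what makes the billion-row estimate worrying.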
It would in a way: it just wouldn't store all parameters. It would store only
things explicitly defined to be RDF values. That would greatly reduce the number
of parameters to store, since all the templates used for maintenance, formatting,
styling and navigation can be omitted. It would be used nearly exclusively for
infobox-type templates, image meta-info, and cross-links like the PND template.
Or at least, that's the idea. It also does away with problems caused by the
various names a parameter with the same meaning may have in different templates
(and different wikis).
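
As a rough illustration of that idea (not the RDF extension's actual code or data model), the filtering and name normalization could look like the Python sketch below. The PROPERTY_MAP table, the property names, and the template and parameter names in it are hypothetical; the assumption is simply that someone explicitly declares which (template, parameter) pairs carry an RDF value and what the shared property is called.

```python
# Hypothetical declaration: only these (template, parameter) pairs are stored,
# and differently named parameters map onto one shared property name.
PROPERTY_MAP = {
    ("Infobox person", "birth_place"): "birthPlace",
    ("Personendaten", "GEBURTSORT"): "birthPlace",
    ("PND", "1"): "pndId",
}


def extract_triples(page, template_calls):
    """Turn template calls on a page into (subject, property, value) triples.

    template_calls is a list of (template_name, {param: value}) pairs.
    Parameters that are not declared in PROPERTY_MAP are dropped, so
    formatting, maintenance, and navigation templates produce no rows.
    """
    triples = []
    for template, params in template_calls:
        for name, value in params.items():
            prop = PROPERTY_MAP.get((template, name))
            value = value.strip()
            if prop is not None and value:
                triples.append((page, prop, value))
    return triples


if __name__ == "__main__":
    calls = [
        ("Infobox person", {"birth_place": "Ulm", "image_size": "200px"}),
        ("Navbox", {"title": "Physicists", "state": "collapsed"}),
        ("Personendaten", {"GEBURTSORT": "Ulm"}),
    ]
    for triple in extract_triples("Albert Einstein", calls):
        print(triple)
```

In this example six template parameters go in and only two triples come out, both under the same property name even though the source templates spell the parameter differently.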
-- daniel