On 11/25/2010 4:13 AM, José Emilio Mori Recio wrote:
A real database implementation could take a great deal of effort, so maybe an easier "global templates" solution (in the same way Commons is available for the rest of the projects) should be considered, as there could be useful global templates apart from the data templates. Anyway, I think Wikidata is definitely something we must have. In answer to Michael Peel: if some concrete definition is needed of what Wikidata should be, I'd be glad to help in the process. I wrote more about the subject on the WMF list a few months ago (http://lists.wikimedia.org/pipermail/foundation-l/2010-May/058688.html), but with no luck.
Ugh. DBpedia gets 50% or so recall extracting people and cities from Wikipedia templates. With people, the trouble is mostly that a lot of them don't have infoboxes at all; with locations, it's that DBpedia's ruleset isn't complete enough to handle the hodge-podge of different infoboxes that are used all over the place. And don't get me started on all the nonstandard infoboxes for representing geographic coordinates. I've written my own extraction systems that eat infoboxes and other templates, and it's always the same story: it's pretty easy to get about 50% recall, but you've got to fight hard for every percentage point you get past that.
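To make the failure modes above concrete, here is a minimal Python sketch of a rule-based infobox extractor. It is not DBpedia's code or my own; the four pages, the field names (latitude, coordinates, lat_d and so on) and the helper functions are all made up for illustration. It knows two ways of writing coordinates, so it misses a page that uses a third convention and a page with no infobox at all.

```python
import re

def split_top_level(body):
    """Split template text on '|' that is not nested inside {{ }} or [[ ]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        two = body[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            depth -= 1
            current.append(two)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts

def parse_infobox(wikitext):
    """Return the named parameters of the first {{Infobox ...}}, or None."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None  # many pages simply have no infobox
    depth, i = 0, start
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i):
            depth -= 1
            i += 2
            if depth == 0:
                break
        else:
            i += 1
    if depth != 0:
        return None  # unbalanced braces, give up
    params = {}
    for part in split_top_level(wikitext[start + 2:i - 2])[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return params

NUMBER = r"(-?\d+(?:\.\d+)?)"

def extract_coords(params):
    """Apply the two rules this sketch knows; anything else is a miss."""
    if params is None:
        return None
    if "latitude" in params and "longitude" in params:
        try:
            return float(params["latitude"]), float(params["longitude"])
        except ValueError:
            return None
    match = re.match(r"\{\{coord\|" + NUMBER + r"\|" + NUMBER,
                     params.get("coordinates", ""), re.IGNORECASE)
    if match:
        return float(match.group(1)), float(match.group(2))
    return None

def recall(pages):
    """Fraction of pages (all assumed to have real coordinates) we extract."""
    hits = sum(1 for text in pages.values()
               if extract_coords(parse_infobox(text)) is not None)
    return hits / len(pages)

# Hypothetical pages, invented for illustration only.
PAGES = {
    "A": "{{Infobox settlement|name=A|latitude=40.4|longitude=-3.7}}",
    "B": "{{Infobox city|name=B|coordinates={{coord|48.85|2.35|display=inline}}}}",
    "C": "{{Infobox town|name=C|lat_d=51|lat_m=30|lat_NS=N|long_d=0|long_m=7|long_EW=W}}",
    "D": "D is a village with no infobox at all.",
}

if __name__ == "__main__":
    print(recall(PAGES))
```

Every new convention like the one on page C needs another hand-written rule in extract_coords, and no rule can help page D. That is the "fight hard for every percentage point" part.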
So, when I hear talk about using MediaWiki templates for something like this, it's like popping a paper bag behind the head of a Vietnam vet. This kind of project needs a database if it's going to be useful.
The other "elephant in the room" is Freebase. Freebase, more or less, is already a "data wiki" that's linked with Wikipedia. Freebase provides a reasonable interface for hand edits, and uses crowdsourcing and machine learning techniques for data cleaning and autotyping. Although there are many things DBpedia does better (having unique titles for topics and good RDF), I almost always tell people who want to get started with DBpedia to use Freebase instead... One time I was able to solve a problem in 40 minutes with Freebase that I'd spent two weeks trying to solve with DBpedia.