On 11/25/2010 4:13 AM, José Emilio Mori Recio wrote:
A real database implementation could take a great deal of effort, so maybe an
easier "global templates" solution (in the same way Commons is available
for the rest of the projects) should be considered, as there could be
useful global templates apart from the data templates. Anyway, I think
Wikidata is definitely something we must have. In answer to Michael Peel:
if some concrete definition is needed about what Wikidata should be, I'd
be glad to help in the process. I wrote more about that subject on the WMF
list a few months ago, but with no luck.

Ugh. Dbpedia gets 50% or so recall extracting people and cities
from Wikipedia templates. The trouble with people is mostly that a
lot of them don't have infoboxes at all, whereas the trouble with cities
is that dbpedia's ruleset isn't complete enough to handle the hodge-podge
of different infoboxes that are used for locations all over the place. And don't get me
started on all the nonstandard infoboxes for representing geographic
coordinates. I've written my own extraction systems that eat infoboxes
and other templates, and it's always the same story: it's pretty easy
to get about 50% recall, but you've got to fight hard for every
percentage point you get past that.
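To make the recall problem concrete, here is a minimal sketch of a rule-based infobox extractor. Everything in it is an assumption for illustration: the four wikitext snippets are made up, the field names (latd, longd, lat_deg, lon_deg) are hypothetical rather than an inventory of real infobox parameters, and the two rules are not dbpedia's actual ruleset. Recall here means the fraction of pages for which a coordinate is recovered, assuming every page describes a place that has one.

```python
import re

# Hypothetical wikitext snippets. The field names are illustrative only,
# not a list of real Wikipedia infobox parameters.
PAGES = {
    "A": "{{Infobox settlement\n| name = A\n| latd = 48.85\n| longd = 2.35\n}}",
    "B": "{{Infobox city\n| name = B\n| coordinates = {{coord|40.71|-74.00}}\n}}",
    "C": "{{Infobox town\n| name = C\n| lat_deg = 51.5\n| lon_deg = -0.12\n}}",
    "D": "D is a village with no infobox at all.",
}

# One "| key = value" parameter per line; [ \t]* keeps the match on one line.
FIELD = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)
# A simplified decimal-degree form of a coord template (assumed layout).
COORD = re.compile(r"\{\{coord\|(-?\d+(?:\.\d+)?)\|(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_fields(wikitext):
    """Return the template parameters found in the text as a dict."""
    return {key.lower(): value for key, value in FIELD.findall(wikitext)}


def rule_latd_longd(fields):
    """Rule 1: separate latd / longd parameters."""
    if "latd" in fields and "longd" in fields:
        return float(fields["latd"]), float(fields["longd"])
    return None


def rule_coord_template(fields):
    """Rule 2: a nested coord template inside a coordinates parameter."""
    match = COORD.search(fields.get("coordinates", ""))
    if match:
        return float(match.group(1)), float(match.group(2))
    return None


RULES = [rule_latd_longd, rule_coord_template]


def extract(wikitext):
    """Try each rule in turn; return (lat, lon) or None if nothing fits."""
    fields = parse_fields(wikitext)
    for rule in RULES:
        result = rule(fields)
        if result is not None:
            return result
    return None


def recall(pages):
    """Fraction of pages for which some rule recovered a coordinate."""
    hits = sum(1 for text in pages.values() if extract(text) is not None)
    return hits / len(pages)
```

With these made-up pages, the two rules cover A and B, miss C because it spells its fields differently, and miss D because there is no infobox to read. Each additional variant needs its own rule, which is where the effort goes.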
So, when I hear talk about using MediaWiki templates for something
like this, it's like popping a paper bag behind the head of a
Vietnam Vet. This kind of project needs a database if it's going to be
The other "elephant in the room" is Freebase. Freebase, more or
less, is already a "data wiki" that's linked with Wikipedia. Freebase
provides a reasonable interface for hand edits, and uses crowdsourcing
and machine learning techniques for data cleaning and autotyping.
Although there are many things dbpedia does better (having unique titles
for topics and good RDF), I almost always tell people who want to get
started with dbpedia to use Freebase instead... One time I was able to
solve a problem in 40 minutes with Freebase that I'd spent two weeks
trying to solve with dbpedia.