Hi Everyone,
We are a couple of undergraduate students at IIT Bombay working on the
entity linking problem. Entity linking is the process of annotating a piece
of text with entities from a knowledge base. A common test set for this task
comes from the Knowledge Base Population (KBP) track of the Text Analysis
Conference (TAC). The reference knowledge base for the task was extracted from an
October 2008 dump of Wikipedia. Unfortunately, when the TAC knowledge base
was being created, a lot of important information concerning the Wikipedia
category hierarchy was lost, since it retains only the links between entity
pages. In addition, the TAC knowledge base does not include the page IDs
of the entities extracted from Wikipedia, which makes matching the entities
in TAC with the current version of Wikipedia hard because of renames and
deletions. We were wondering if there is any way we could gain access to a
dump from October 2008. We found that the dump from January 2008 was not
complete as far as the TAC knowledge base is concerned. Any help will be
greatly appreciated.
Thanks,
C. Yeshwanth