Hi Everyone,
We are a couple of undergraduate students at IIT Bombay working on the entity linking problem, that is, the process of annotating a piece of text with entities from a knowledge base. A common test set for this task comes from the Knowledge Base Population track of the Text Analysis Conference (TAC). The reference knowledge base for that track was extracted from an October 2008 dump of Wikipedia. Unfortunately, when the TAC knowledge base was created, a lot of important information about the Wikipedia category hierarchy was lost, since it retains only the links between entity pages. In addition, the TAC knowledge base does not include the PageIDs of the entities extracted from Wikipedia, which makes it hard to match TAC entities against the current version of Wikipedia because of renames and deletions. We were wondering if there is any way we could gain access to a dump from October 2008. We found that the dump from January 2008 is not complete as far as the TAC knowledge base is concerned. Any help will be greatly appreciated.
Thanks, C. Yeshwanth
xmldatadumps-l@lists.wikimedia.org