Hi All,
I am working on a project where I need to identify entities in a text and link them to Wikipedia articles or Freebase IDs. I can detect the entities, but I need a system that can disambiguate them. I noticed that many systems use Wikipedia as their knowledge backend, and I would like to hear about your experience with any of the available systems. So far I have tried Wikipedia Miner; the problem is that it cannot detect most of the _real_ ambiguous situations. For example, the American town called Hebron: http://wikipedia-miner.cms.waikato.ac.nz/demos/annotate/?source=Hebron+is+a+town+in+Boone+Township%2C+Porter+County%2C+Indiana%2C+United+States.+The+population+was+3%2C724+at+the+2010+census.&sourceMode=AUTO&repeatMode=ALL&minProbability=0
I am willing to trade some accuracy for speed, as we are planning to process a huge amount of data.
Regards.
Hi Rami,
DBpedia Spotlight has a direct disambiguation function: http://dbpedia.org/spotlight
You can mark the detected entities you would like to have disambiguated with [[ ]] in the text, I think. The tool is also internationalized for several languages now, although I am not sure what the current status of that is.
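If the entities are already detected as character offsets, wrapping them in [[ ]] before sending the text to a disambiguation service is a small preprocessing step. The following is only a minimal sketch: the function name `mark_mentions` and the span format are made up for illustration, and whether a given service accepts exactly this markup should be checked against its documentation.

```python
def mark_mentions(text, spans):
    """Wrap each (start, end) character span of `text` in [[ ]].

    `spans` is a hypothetical format: half-open character offsets
    produced by whatever entity detector is already in use.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        # Reject empty, out-of-range, or overlapping spans.
        if start < cursor or end > len(text) or start >= end:
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        out.append(text[cursor:start])
        out.append("[[" + text[start:end] + "]]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Hebron is a town in Boone Township, Porter County, Indiana."
mentions = ["Hebron", "Boone Township", "Porter County"]
# Derive offsets from the mention strings so the example stays consistent.
spans = [(text.index(m), text.index(m) + len(m)) for m in mentions]
marked = mark_mentions(text, spans)
print(marked)
```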
Of course there are many other tools that probably do the same. You could check the related work section here: http://blog.semantic-web.at/wp-content/uploads/2011/09/p1_mendes.pdf and also http://en.wikipedia.org/wiki/Knowledge_extraction#Entity_Linking
If you need to interchange services easily and want to have a unified interface, there is the NLP Interchange Format, which is described here: http://nlp2rdf.org/nif-1-0#toc-named-entity-recognition-and-entity-linking
Note that I have no actual hands-on experience with the services. Somebody else could answer that part better :)
All the best, Sebastian
On 11/20/2011 08:48 PM, Rami Al-Rfou' wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Rami,
there are multiple systems that do entity disambiguation. Spotlight was already mentioned. Other systems that have a demo available are:
* TagMe: http://tagme.di.unipi.it/
  * It is designed for small text snippets. For those it works very well, and it is very fast.
* AIDA: http://www.mpi-inf.mpg.de/yago-naga/aida/, demo at https://d5gate.ag5.mpi-sb.mpg.de/webaida/
  * This system was developed by our group. It is suitable for short texts as well as longer ones (up to the length of news articles), but it is not as fast as TagMe. You can also manually tag mentions with [[ ]] if you set the mention extraction to manual. The different disambiguation methods trade off speed against quality; the best (and slowest) is prior+sim+coherence.
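To give a rough feeling for what combining a popularity prior with context similarity means, here is a toy sketch. It is not AIDA's implementation and it leaves out the coherence part entirely. The prior values, the candidate descriptions, the Jaccard similarity, and the `prior_weight` parameter are all made up for illustration.

```python
def tokenize(s):
    """Lowercased set of words with surrounding punctuation stripped."""
    return {w.strip(".,").lower() for w in s.split() if w.strip(".,")}


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(context, candidates, prior_weight):
    """Pick the candidate with the best weighted sum of prior and similarity."""
    ctx = tokenize(context)

    def score(cand):
        sim = jaccard(ctx, tokenize(cand["description"]))
        return prior_weight * cand["prior"] + (1.0 - prior_weight) * sim

    return max(candidates, key=score)


# Hypothetical candidates for the mention "Hebron".
# The priors are invented numbers, not measured link statistics.
candidates = [
    {"id": "Hebron", "prior": 0.9,
     "description": "city in the West Bank south of Jerusalem"},
    {"id": "Hebron,_Indiana", "prior": 0.1,
     "description": "town in Porter County Indiana United States"},
]
context = "Hebron is a town in Boone Township, Porter County, Indiana, United States."

prior_only = disambiguate(context, candidates, prior_weight=1.0)
with_context = disambiguate(context, candidates, prior_weight=0.2)
print(prior_only["id"], with_context["id"])
```

With the prior alone the more popular candidate wins regardless of the text; once context overlap gets most of the weight, the Indiana town is chosen for this sentence. A real system would use far better similarity measures and learned weights.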
Both systems could potentially give you a better quality than Wikipedia Miner. I'm not aware of any other systems that have an online demo.
If you have any follow-up questions, I'm happy to assist.
Johannes
On 20.11.2011 at 20:48, Rami Al-Rfou' wrote:
-- Rami Al-Rfou' Stony Brook University PhD student @ Computer Science Dept.