Disambiguation systems


Rami Al-Rfou'

20 Nov 2011

2:48 p.m.

Hi All,

I am working on a project where I need to identify entities in text and link them back to Wikipedia articles or Freebase IDs. I can detect the entities, but I need a system that can disambiguate them. I noticed that many systems use Wikipedia as their knowledge backend, so I would like to hear about any experience you have had with the available systems. So far I have tried Wikipedia Miner; the problem is that it cannot resolve most of the _really_ ambiguous cases. For example, the American town called Hebron:

http://wikipedia-miner.cms.waikato.ac.nz/demos/annotate/?source=Hebron+is+a+town+in+Boone+Township%2C+Porter+County%2C+Indiana%2C+United+States.+The+population+was+3%2C724+at+the+2010+census.&sourceMode=AUTO&repeatMode=ALL&minProbability=0

Accuracy can be traded for speed, since we are planning to process a huge amount of data.
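The demo link in the message encodes the test sentence as query parameters. A minimal sketch of how such a link could be built for other sentences is below. The parameter names (source, sourceMode, repeatMode, minProbability) are copied from that link; the helper name build_annotate_url is hypothetical, and nothing here assumes the demo is still reachable.

```python
from urllib.parse import urlencode

# Base address taken from the demo link in the message.
BASE_URL = "http://wikipedia-miner.cms.waikato.ac.nz/demos/annotate/"


def build_annotate_url(text, min_probability=0):
    """Return a demo URL for `text` (hypothetical helper, not part of Wikipedia Miner)."""
    params = {
        "source": text,              # the sentence to annotate
        "sourceMode": "AUTO",        # value copied from the original link
        "repeatMode": "ALL",         # value copied from the original link
        "minProbability": min_probability,
    }
    # urlencode escapes spaces as '+' and commas as '%2C', as in the original link.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    sentence = (
        "Hebron is a town in Boone Township, Porter County, Indiana, "
        "United States. The population was 3,724 at the 2010 census."
    )
    print(build_annotate_url(sentence))
```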

Regards.

-- Rami Al-Rfou' Stony Brook University PhD student @ Computer Science Dept.


Sebastian Hellmann

21 Nov 2011

3:41 a.m.

Hi Rami,

DBpedia Spotlight has a direct disambiguation function: http://dbpedia.org/spotlight

I think you can mark the detected entities you would like to have disambiguated with [[ ]] in the text. The tool has also been internationalized for several languages, although I am not sure what the current status of that is.
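Since the entities are already detected, the [[ ]] marking could be produced automatically from their character offsets. The sketch below only prepares the marked-up text; it does not call Spotlight, and the function name mark_mentions and the (start, end) span format are assumptions made for illustration.

```python
def mark_mentions(text, spans):
    """Wrap each (start, end) character span of `text` in [[ ]].

    `spans` uses Python slice offsets (end is exclusive). Overlapping,
    empty, or out-of-range spans are rejected.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or start >= end or end > len(text):
            raise ValueError(f"invalid or overlapping span: ({start}, {end})")
        pieces.append(text[cursor:start])               # text before the mention
        pieces.append("[[" + text[start:end] + "]]")    # the marked mention
        cursor = end
    pieces.append(text[cursor:])                        # trailing text
    return "".join(pieces)


if __name__ == "__main__":
    sentence = "Hebron is a town in Indiana."
    print(mark_mentions(sentence, [(0, 6), (20, 27)]))
```

Spans are sorted first, so the detector's output order does not matter.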

Of course there are many other tools that probably do the same. You could check the related work section here: http://blog.semantic-web.at/wp-content/uploads/2011/09/p1_mendes.pdf and also http://en.wikipedia.org/wiki/Knowledge_extraction#Entity_Linking

If you need to interchange services easily and want to have a unified interface, there is the NLP Interchange Format (NIF): http://nlp2rdf.org/nif-1-0#toc-named-entity-recognition-and-entity-linking

Note that I have no actual hands-on experience with the services. Somebody else could answer that part better :)

All the best, Sebastian


Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

-- Dipl. Inf. Sebastian Hellmann Department of Computer Science, University of Leipzig Projects: http://nlp2rdf.org , http://dbpedia.org Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann Research Group: http://aksw.org

Johannes Hoffart

7:56 a.m.

Hi Rami,

there are multiple systems that do entity disambiguation. Spotlight was already mentioned. Other systems that have a demo available are:

* TagMe: http://tagme.di.unipi.it/
  * It is designed for small text snippets. For them, it works very well, and it is very fast.

* AIDA: http://www.mpi-inf.mpg.de/yago-naga/aida/, demo at https://d5gate.ag5.mpi-sb.mpg.de/webaida/
  * This system was developed by our group. It is suitable for short texts as well as longer ones (around the length of a news article), but it is not as fast as TagMe. You can also manually tag mentions with [[ ]] if you set the mention extraction to manual. The different disambiguation methods trade off speed against quality; the best (and slowest) is prior+sim+coherence.

Both systems could potentially give you a better quality than Wikipedia Miner. I'm not aware of any other systems that have an online demo.
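To make the speed/quality trade-off concrete, here is a toy sketch of combining a popularity prior with a context-similarity score for the Hebron example. It is not AIDA's implementation: the prior values, keyword sets, the weight alpha, and all names are made up for illustration, and the coherence step is left out entirely.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    prior: float          # made-up popularity prior, not a measured value
    keywords: frozenset   # made-up context keywords for the entity


def context_similarity(tokens, candidate):
    """Fraction of the candidate's keywords that occur in the context."""
    if not candidate.keywords:
        return 0.0
    return len(tokens & candidate.keywords) / len(candidate.keywords)


def disambiguate(context, candidates, use_sim=True, alpha=0.5):
    """Pick the best candidate by prior only, or by a prior/similarity mix."""
    tokens = set(re.findall(r"\w+", context.lower()))

    def score(c):
        if not use_sim:
            return c.prior  # cheapest option: popularity alone
        return alpha * c.prior + (1 - alpha) * context_similarity(tokens, c)

    return max(candidates, key=score)


# Illustrative candidates for the mention "Hebron".
CANDIDATES = [
    Candidate("Hebron", 0.80, frozenset({"west", "bank", "palestinian", "city"})),
    Candidate("Hebron, Indiana", 0.05, frozenset({"indiana", "porter", "county", "town"})),
]
CONTEXT = "Hebron is a town in Boone Township, Porter County, Indiana, United States."

if __name__ == "__main__":
    print(disambiguate(CONTEXT, CANDIDATES, use_sim=False).title)
    print(disambiguate(CONTEXT, CANDIDATES, use_sim=True).title)
```

With these made-up numbers the prior alone picks the more popular sense, while adding the context score picks the Indiana town, at the cost of tokenizing and comparing the context for every candidate.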

If you have any follow-up questions, I'm happy to assist.

Johannes

