ACL/IJCNLP-2009 Workshop
"The People's Web meets NLP: Collaboratively Constructed Semantic Resources"
Co-located with the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing
Singapore, August 7th, 2009
http://www.ukp.tu-darmstadt.de/acl-ijcnlp-2009-workshop/
LIST OF ACCEPTED PAPERS
* A Novel Approach to Automatic Gazetteer Generation using Wikipedia
  Ziqi Zhang and Jose Iria
* Named Entity Recognition in Wikipedia
  Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran
* Wiktionary for Natural Language Processing: Methodology and Limitations
  Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot, ShuKai Hsieh, Ivy Kuo, Pierre Magistry and Chu-Ren Huang
* Using the Wiktionary Graph Structure for Synonym Detection
  Timothy Weale, Chris Brew and Eric Fosler-Lussier
* Automatic Content-Based Categorization of Wikipedia Articles
  Zeno Gantner and Lars Schmidt-Thieme
* Evaluating a Statistical CCG Parser on Wikipedia
  Matthew Honnibal, Joel Nothman and James R. Curran
* Construction of Disambiguated Folksonomy Ontologies Using Wikipedia
  Noriko Tomuro and Andriy Shepitsen
* Acquiring High Quality Non-Expert Knowledge from On-Demand Workforce
  Donghui Feng, Sveva Besana and Remi Zajac
* Constructing an Anaphorically Annotated Corpus with Non-Experts: Assessing the Quality of Collaborative Annotations
  Jon Chamberlain, Udo Kruschwitz and Massimo Poesio
INVITED TALK
Speaker: Rada Mihalcea, University of North Texas
Title: Large Scale Semantic Annotations Using Encyclopedic Knowledge
Abstract: Wikipedia is an online encyclopedia that has grown to become one of the largest online repositories of encyclopedic knowledge, with millions of articles available for a large number of languages. In fact, Wikipedia editions are available for more than 200 languages, with the number of entries per language ranging from a few pages to more than one million articles.
In this talk, I will describe the use of Wikipedia as a source of linguistic evidence for large scale semantic annotations. In particular, I will show how this online encyclopedia can be used to achieve state-of-the-art results on two text processing tasks: automatic keyword extraction and word sense disambiguation. I will also show how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system showed that the automatic annotations are reliable and hardly distinguishable from manual annotations. Additionally, an evaluation of the system in an educational environment showed that the availability of encyclopedic knowledge within easy reach of a learner can improve both the quality of the knowledge acquired and the time needed to obtain such knowledge.
Short bio: Rada Mihalcea is an Associate Professor of Computer Science at the University of North Texas. Her research interests are in lexical semantics, graph-based algorithms for natural language processing, and multilingual natural language processing. From 2004 to 2007, she served as president of the ACL Special Interest Group on the Lexicon (SIGLEX), and she serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, and Research on Language and Computation. She is the recipient of a National Science Foundation CAREER award.
INTRODUCTION
In recent years, online resources collaboratively constructed by ordinary users on the Web have considerably influenced the NLP community. In many studies, they have been used with great success as a substitute for conventional semantic resources and as semantically structured corpora. While conventional resources such as WordNet are developed by trained linguists [1], online semantic resources can now be automatically extracted from content collaboratively created by users [2]. In this way, the knowledge acquisition bottlenecks and coverage problems typical of conventional lexical semantic resources can be overcome.
The resource that has gained the greatest popularity in this respect so far is Wikipedia. However, other resources recently discovered in NLP, such as folksonomies, the multilingual collaboratively constructed dictionary Wiktionary, or Q&A sites like WikiAnswers or Yahoo! Answers, are also very promising. Moreover, new wiki-based platforms such as Citizendium or Knol have recently emerged that offer features distinct from Wikipedia and have high potential for use in NLP.
The benefits of using Web-based resources come with new challenges, such as interoperability with existing resources and the quality of the knowledge represented. As collaboratively created resources lack editorial control, they are typically incomplete. To achieve interoperability with conventional resources, mappings between them have to be investigated. The quality of collaboratively constructed resources is questioned in many cases, and information extraction remains a complicated task due to the incompleteness and semi-structuredness of the content. Therefore, the research community has begun to develop and provide tools for accessing collaboratively constructed resources [2,5].
The challenges listed above also present an opportunity for NLP techniques to improve the quality of Web-based semantic resources. Researchers have therefore proposed techniques for link prediction [3] or information extraction [4] that can be used to guide the "crowds" to construct resources that are, in turn, better suited for use in NLP.
[1] Christiane Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
[2] Torsten Zesch, Christof Mueller and Iryna Gurevych. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. Proceedings of the Conference on Language Resources and Evaluation (LREC), 2008. http://www.ukp.tu-darmstadt.de/software/jwpl/ http://www.ukp.tu-darmstadt.de/software/jwktl/
[3] Rada Mihalcea and Andras Csomai. Wikify!: Linking Documents to Encyclopedic Knowledge. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), 2007.
[4] Daniel S. Weld et al. Intelligence in Wikipedia. Twenty-Third Conference on Artificial Intelligence (AAAI), 2008.
[5] Kotaro Nakayama et al. Wikipedia Mining - Wikipedia as a Corpus for Knowledge Extraction. Proceedings of the Annual Wikipedia Conference (Wikimania), 2008. http://wikipedia-lab.org/en/index.php
TOPICS
The workshop will bring together researchers from both worlds: those using collaboratively created resources in NLP applications such as information retrieval, named entity recognition, or keyword extraction, and those using NLP applications to improve the resources or to extract different types of semantic information from them. We hope this will turn into a feedback loop, in which NLP techniques improved by collaboratively constructed resources are in turn used to improve those resources.
ORGANIZERS
Iryna Gurevych Torsten Zesch
Ubiquitous Knowledge Processing Lab Technical University of Darmstadt, Germany
PROGRAM COMMITTEE
Delphine Bernhard (Technische Universitaet Darmstadt)
Paul Buitelaar (DERI, National University of Ireland, Galway)
Razvan Bunescu (University of Texas at Austin)
Pablo Castells (Universidad Autonoma de Madrid)
Philipp Cimiano (Karlsruhe University)
Irene Cramer (Dortmund University of Technology)
Andras Csomai (Google Inc.)
Ernesto De Luca (University of Magdeburg)
Roxana Girju (University of Illinois at Urbana-Champaign)
Andreas Hotho (University of Kassel)
Graeme Hirst (University of Toronto)
Ed Hovy (University of Southern California)
Jussi Karlgren (Swedish Institute of Computer Science)
Boris Katz (Massachusetts Institute of Technology)
Adam Kilgarriff (Lexical Computing Ltd)
Chin-Yew Lin (Microsoft Research)
James Martin (University of Colorado Boulder)
Olena Medelyan (University of Waikato)
David Milne (University of Waikato)
Saif Mohammad (University of Maryland)
Dan Moldovan (University of Texas at Dallas)
Kotaro Nakayama (University of Tokyo)
Ani Nenkova (University of Pennsylvania)
Guenter Neumann (DFKI Saarbruecken)
Maarten de Rijke (University of Amsterdam)
Magnus Sahlgren (Swedish Institute of Computer Science)
Manfred Stede (Potsdam University)
Benno Stein (Bauhaus University Weimar)
Tonio Wandmacher (University of Osnabrueck)
Rene Witte (Concordia University Montreal)
Hans-Peter Zorn (European Media Lab, Heidelberg)