Apologies for cross-posting!
=======================
NLP & DBpedia Workshop 2013
=======================
Free, open, interoperable and multilingual NLP for DBpedia and DBpedia
for NLP:
http://nlp-dbpedia2013.blogs.aksw.org/
Collocated with the International Semantic Web Conference 2013 (ISWC 2013)
21-22 October 2013, in Sydney, Australia (*Submission deadline July 8th*)
**********************************
Recently, the DBpedia community has experienced an immense increase in
activity, and we believe the time has come to explore the
connection between DBpedia & Natural Language Processing (NLP) at a yet
unprecedented depth. The goal of this workshop can be summarized by this
(pseudo-)formula:
NLP & DBpedia == DBpedia4NLP && NLP4DBpedia
http://db0.aksw.org/downloads/CodeCogsEqn_bold2.gif
DBpedia has a long-standing tradition of providing useful data as well as
a commitment to reliable Semantic Web technologies and living best
practices. With the rise of Wikidata, DBpedia is being relieved, step by
step, of the tedious extraction of data from Wikipedia's infoboxes and can
shift its focus to new challenges, such as extracting information from
the unstructured article text and becoming a testing ground for
multilingual NLP methods.
Contribution
=========
Within the timeframe of this workshop, we hope to mobilize a community
of stakeholders from the Semantic Web area. We envision that the workshop
will produce the following items:
* an open call to the DBpedia data consumer community will generate a
wish list of data to be generated from Wikipedia by NLP
methods. This wish list will be broken down into tasks and benchmarks, and
a gold standard will be created.
* the benchmarks and test data created will be collected and published
under an open license for future evaluation (inspired by OAEI and
UCI-ML). An overview of the benchmarks can be found here:
http://nlp-dbpedia2013.blogs.aksw.org/benchmarks
Please sign up to our mailing list if you are interested in discussing
guidelines and NLP benchmarking:
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp-dbpedia-public
Important dates
===========
8 July 2013, Paper Submission Deadline
9 August 2013, Notification of accepted papers sent to authors
Motivation
=======
The central role of Wikipedia (and therefore DBpedia) in the creation
of a Translingual Web has recently been recognized by the Strategic
Research Agenda (cf. section 3.4, page 23), and most of the contributions
to the recently held Dagstuhl seminar on the Multilingual Semantic Web
also stress the role of Wikipedia for multilingualism. As more and more
language-specific chapters of DBpedia appear (currently 14 language
editions), DBpedia is becoming a driving factor for a Linguistic Linked
Open Data cloud as well as for localized LOD clouds with specialized
domains (e.g. the Dutch windmill domain ontology created from
http://nl.dbpedia.org ).
The data contained in Wikipedia and DBpedia have ideal properties for
making them a controlled testbed for NLP. Wikipedia and DBpedia are
multilingual and multi-domain, the communities maintaining these
resources are very open, and it is easy to join and contribute. The open
license allows data consumers to benefit from the content, and many parts
are collaboratively editable. In particular, the data in DBpedia is widely
used and disseminated throughout the Semantic Web.
NLP4DBpedia
==========
DBpedia has been around for quite a while, infusing the Web of Data with
multi-domain data of decent quality. These triples are, however, mostly
extracted from Wikipedia infoboxes. To unlock the full potential of
Wikipedia articles for DBpedia, the information contained in the
remaining parts of the articles needs to be analysed and triplified.
Here, NLP techniques can be of help.
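To make the triplification idea concrete, here is a minimal sketch
(Python); extract_relations() is a hypothetical placeholder for whatever
relation extraction tool one might plug in, and the example triple, the
rdflib library and the dbo:federalState property are illustrative
choices, not a prescribed method:

    # Minimal sketch: turning (hypothetical) extractor output into RDF.
    from rdflib import Graph, Namespace

    DBR = Namespace("http://dbpedia.org/resource/")
    DBO = Namespace("http://dbpedia.org/ontology/")

    def extract_relations(text):
        # Placeholder: a real NLP pipeline would return
        # (subject, predicate, object) strings found in the article body.
        return [("Leipzig", "federalState", "Saxony")]

    g = Graph()
    g.bind("dbr", DBR)
    g.bind("dbo", DBO)
    for s, p, o in extract_relations("Leipzig is a city in Saxony ..."):
        g.add((DBR[s], DBO[p], DBR[o]))

    print(g.serialize(format="turtle"))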
DBpedia4NLP
==========
On the other hand, NLP, and information extraction techniques in
particular, rely on various resources when processing texts from
different domains. These resources may be used as part of a solution
(e.g. a gazetteer serving as an important component of an expert-written
rule, or a disambiguation resource) or for delivering a solution (e.g.
within machine learning approaches). DBpedia easily fits both of these
roles.
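To illustrate the gazetteer role, here is a small sketch that pulls city
labels from the public DBpedia SPARQL endpoint; SPARQLWrapper is just one
possible client, and the class, language filter and limit are arbitrary
choices for illustration:

    # Sketch: building a gazetteer of city names from DBpedia.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        SELECT ?label WHERE {
            ?city a <http://dbpedia.org/ontology/City> ;
                  rdfs:label ?label .
            FILTER (lang(?label) = "en")
        } LIMIT 1000
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # The resulting surface forms can seed a rule-based NER gazetteer
    # or serve as candidate labels for disambiguation.
    gazetteer = {b["label"]["value"] for b in results["results"]["bindings"]}
    print(len(gazetteer), "city names loaded")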
We invite papers from both of these areas, including:
1. Knowledge extraction from text and HTML documents (especially
unstructured and semi-structured documents) on the Web, using
information in the Linked Open Data (LOD) cloud, and especially in DBpedia.
2. Representation of NLP tool output and NLP resources as RDF/OWL, and
linking the extracted output to the LOD cloud.
3. Novel applications using the extracted knowledge, the Web of Data or
DBpedia-based NLP methods.
The specific topics are listed below.
Topics
=====
- Improving DBpedia with NLP methods
- Finding errors in DBpedia with NLP methods
- Annotation methods for Wikipedia articles
- Cross-lingual data and text mining on Wikipedia
- Pattern and semantic analysis of natural language, reading the Web,
learning by reading
- Large-scale information extraction
- Entity resolution and automatic discovery of Named Entities
- Multilingual recognition of real-world entities
- Frequent pattern analysis of entities
- Relationship extraction, slot filling
- Entity linking, Named Entity disambiguation, cross-document
co-reference resolution
- Disambiguation through knowledge bases
- Ontology representation of natural language text
- Analysis of ontology models for natural language text
- Learning and refinement of ontologies
- Natural language taxonomies modeled as Semantic Web ontologies
- Use cases for potential data extracted from Wikipedia articles
- Use cases of entity recognition for Linked Data applications
- Impact of entity linking on information retrieval, semantic search
Furthermore, an informal list of NLP tasks can be found on this
Wikipedia page:
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
These are relevant for the workshop as long as they fit into the
DBpedia4NLP and NLP4DBpedia frame (i.e. the data used revolves around
Wikipedia and DBpedia).
Submission formats
==============
Paper submission
-----------------------
All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and
relevance to the workshop. At least one author of each accepted paper is
expected to attend the workshop.
* Full research papers (up to 12 pages)
* Position papers (up to 6 pages)
* Use case descriptions (up to 6 pages)
* Data/benchmark paper (2-6 pages, depending on the size and complexity)
Note: data and benchmark papers are meant to provide a citable
reference for your data and benchmarks. We kindly request that you
upload any data you use to our benchmark repository in parallel to the
submission. We recommend using an open license (e.g. CC-BY), but the
minimum requirement is free use. Please write to the mailing list if
you have any problems.
Full instructions are available at:
http://nlp-dbpedia2013.blogs.aksw.org/submission/
Submission of data and use cases
--------------------------------------------
This workshop also targets non-academic users and developers. If you
have any (open) data (e.g. texts or annotations) that can be used for
benchmarking NLP tools, but do not want or need to write an academic
paper about it, please feel free to just add it to this table:
http://tinyurl.com/nlp-benchmarks or upload it to our repository:
http://github.com/dbpedia/nlp-dbpedia
Full instructions are available at:
http://nlp-dbpedia2013.blogs.aksw.org/benchmarks/
Also, if you have any ideas, use cases or data requests, please feel free
to just post them on our mailing list: nlp-dbpedia-public [at]
lists.informatik.uni-leipzig.de or send them directly to the chairs:
nlp-dbpedia2013 [at] easychair.org
Program committee
==============
* Guadalupe Aguado, Universidad Politécnica de Madrid, Spain
* Chris Bizer, Universität Mannheim, Germany
* Volha Bryl, Universität Mannheim, Germany
* Paul Buitelaar, DERI, National University of Ireland, Galway
* Charalampos Bratsas, OKFN Greece, Aristotle University of
Thessaloniki, Greece
* Philipp Cimiano, CITEC, Universität Bielefeld, Germany
* Samhaa R. El-Beltagy, Nile University, Egypt
* Daniel Gerber, AKSW, Universität Leipzig, Germany
* Jorge Gracia, Universidad Politécnica de Madrid, Spain
* Max Jakob, Neofonie GmbH, Germany
* Anja Jentzsch, Hasso-Plattner-Institut, Potsdam, Germany
* Ali Khalili, AKSW, Universität Leipzig, Germany
* Daniel Kinzler, Wikidata, Germany
* David Lewis, Trinity College Dublin, Ireland
* John McCrae, Universität Bielefeld, Germany
* Uroš Milošević, Institut Mihajlo Pupin, Serbia
* Roberto Navigli, Sapienza, Università di Roma, Italy
* Axel Ngonga, AKSW, Universität Leipzig, Germany
* Asunción Gómez Pérez, Universidad Politécnica de Madrid, Spain
* Lydia Pintscher, Wikidata, Germany
* Elena Montiel Ponsoda, Universidad Politécnica de Madrid, Spain
* Giuseppe Rizzo, Eurecom, France
* Harald Sack, Hasso-Plattner-Institut, Potsdam, Germany
* Felix Sasaki, Deutsches Forschungszentrum für künstliche Intelligenz,
Germany
* Mladen Stanojević, Institut Mihajlo Pupin, Serbia
* Hans Uszkoreit, Deutsches Forschungszentrum für künstliche
Intelligenz, Germany
* Rupert Westenthaler, Salzburg Research, Austria
* Feiyu Xu, Deutsches Forschungszentrum für künstliche Intelligenz, Germany
Contact
=====
Of course, we would prefer that you post any questions and comments
regarding NLP and DBpedia to our public mailing list at:
nlp-dbpedia-public [at] lists.informatik.uni-leipzig.de
If you want to contact the chairs of the workshop directly, please write
to:
nlp-dbpedia2013 [at] easychair.org
Kind regards,
Sebastian Hellmann, Agata Filipowska, Caroline Barrière,
Pablo N. Mendes, Dimitris Kontokostas
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Hello,
Based on the feedback I got from the community, I have made my final
proposal. Since only 3 days are left, I urge everyone to kindly reply as
early as possible.
My proposal can be found here:
https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
Thanks in advance,
Rahul
I have been working on a pronunciation recording extension as part of my
Google Summer of Code project: https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
Could you take a look at the UI mockups that I have made and give me
some suggestions or feedback on how you would want the tool to be
deployed?
Thanks in advance,
Rahul
Hello,
I am Rahul Maliakkal, a 3rd-year Computer Science student, and I wish to
apply for the GSoC 2013 edition. Based on the feedback and suggestions
received, I have improved my proposal, or as I like to call it, *v2.0*.
Please do have a look at it:
https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
Please keep the suggestions and feedback pouring in.
Thanks,
Rahul
Rahul Maliakkal, 10/04/2013 19:27:
> As we all know, right now uploading an audio file is only possible in
> .ogg format.
>
> In my GSoC project, I plan on adding *.wav support* to Commons; since
> it's not patent-encumbered, I think it should be fine
Context: "Pronunciation Recording Extension"
https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
> I would like to get the communities feedback on this.
Is the reason that the dependencies you found all require this format?
Nemo
[Apologies for cross-posting]
Dear fellow DBpedians,
I am very excited to announce that DBpedia and DBpedia Spotlight have again
been selected for the Google Summer of Code 2013!!!
If you know energetic students (BSc,MSc,PhD) interested in working with
DBpedia, text processing, and semantics, please encourage them to apply!
More details can also be found on the blog post here:
http://blog.dbpedia.org/2013/04/10/dbpediaspotlight-accepted-google-summer-…
On behalf of the DBpedia GSoC team,
Dimitris Kontokostas
--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
As per requests, here are some links:
Bug #46610 - Pronunciation recording tool[1]
A tool to record brief snippets of sound, upload them to
Commons, and insert the sound file into the article. There is
a draft GSoC proposal[2]. The student is seeking mentorship
regarding the proposal, and is often available on IRC in
#mediawiki, nick Rahul_21.
Bug #36881 - Wiktionary needs usable API[3]
Discussion about how best to leverage Wiktionary data via the
MediaWiki API or a standalone API. Probably the best suggestion
is to implement at least partial DICT[4] support based on
on-the-fly article parsing (a rough sketch of the protocol
exchange follows the links below). Mark Hershberger wrote up a
GSoC project proposal for 2013[5].
Amgine
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=46610
[2] https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
[3] https://bugzilla.wikimedia.org/show_bug.cgi?id=36881
[4] https://tools.ietf.org/html/rfc2229
[5]
https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Make_W…
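For readers unfamiliar with DICT, here is a rough sketch of the client
side of the RFC 2229 exchange, pointed at the public dict.org server for
illustration; a Wiktionary-backed server would implement the same
commands on the server side:

    # Rough sketch of a DICT (RFC 2229) client lookup.
    import socket

    def dict_define(word, host="dict.org", port=2628):
        """Ask a DICT server for the first matching definition of `word`."""
        with socket.create_connection((host, port), timeout=10) as sock:
            f = sock.makefile("rw", encoding="utf-8", newline="")
            f.readline()                       # 220 greeting banner
            f.write("DEFINE ! %s\r\n" % word)  # "!" = first database with a match
            f.flush()
            lines = []
            while True:
                line = f.readline()
                if not line or line.startswith("250") or line.startswith("552"):
                    break                      # 250 = done, 552 = no match
                lines.append(line.rstrip("\r\n"))
            f.write("QUIT\r\n")
            f.flush()
            return "\n".join(lines)

    print(dict_define("wiki"))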
Hi All,
Greetings,
I am a CS grad student from the Data Science Lab at Stony Brook
(https://sites.google.com/site/datascienceslab/), and I am writing to
request information about parsing multilingual Wiktionary data. Our lab
has been using Wikipedia data for quite a while now, but we are really
interested in taking advantage of the massive Wiktionary content, which
we feel, after proper parsing, can become a rich multi-language corpus.
But the big hurdle is a parsing tool. We have tried a few Wiktionary parsing tools:
1. https://github.com/clbecker/perl-wiktionary-parser/
2. https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser
3. https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_pars…
4. http://www.ukp.tu-darmstadt.de/software/jwktl/
but none of them is available in a ready-to-use or easy-to-extend form
for multiple languages. (I am currently trying to work with wikokit,
parser 2 above.)
I would appreciate any advice, suggestions, or pointers to the best
available Wiktionary parser. We are mainly looking to extract meanings,
POS, examples, translations, etc. (more can never hurt).
Any help is appreciated. Kindly let me know if further information is needed.
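For context, here is a minimal stdlib-only sketch of the kind of
extraction we are after; the dump filename, the POS list and the heading
regexes are assumptions based on the English edition's conventions, and
other language editions would need different patterns:

    # Sketch: pull English POS headings and definition lines from a
    # Wiktionary pages-articles XML dump (decompressed).
    import re
    import xml.etree.ElementTree as ET

    POS_RE = re.compile(r"^===\s*(Noun|Verb|Adjective|Adverb)\s*===", re.M)
    DEF_RE = re.compile(r"^#\s*([^:*].*)", re.M)  # "#" lines hold definitions

    def parse_dump(path):
        title, is_entry = None, False
        for _, elem in ET.iterparse(path):
            tag = elem.tag.rsplit("}", 1)[-1]     # strip the export namespace
            if tag == "title":
                title = elem.text
            elif tag == "ns":
                is_entry = elem.text == "0"       # main namespace = entries
            elif tag == "text" and is_entry and elem.text:
                parts = elem.text.split("==English==", 1)
                if len(parts) == 2:
                    # cut the section off at the next language heading
                    section = re.split(r"\n==[^=]", parts[1])[0]
                    yield title, POS_RE.findall(section), DEF_RE.findall(section)
                elem.clear()

    for title, pos, defs in parse_dump("enwiktionary-latest-pages-articles.xml"):
        print(title, pos, defs[:2])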
Regards,
Moutupsi