Apologies for cross-posting!
=======================
NLP & DBpedia Workshop 2013
=======================
Free, open, interoperable and multilingual NLP for DBpedia and DBpedia
for NLP:
http://nlp-dbpedia2013.blogs.aksw.org/
Co-located with the International Semantic Web Conference 2013 (ISWC 2013)
21-22 October 2013 in Sydney, Australia (*Submission deadline: July 8th*)
Please email us if you need a deadline extension!
**********************************
Recently, the DBpedia community has experienced an immense increase in
activity, and we believe that the time has come to explore the
connection between DBpedia & Natural Language Processing (NLP) at an
unprecedented depth. The goal of this workshop can be summarized by this
(pseudo-)formula:
NLP & DBpedia == DBpedia4NLP && NLP4DBpedia
http://db0.aksw.org/downloads/CodeCogsEqn_bold2.gif
DBpedia has a long-standing tradition of providing useful data, as well as
a commitment to reliable Semantic Web technologies and living best
practices. With the rise of Wikidata, DBpedia is gradually being relieved
of the tedious extraction of data from Wikipedia's infoboxes and can
shift its focus to new challenges, such as extracting information from
the unstructured article text and becoming a testing ground for
multilingual NLP methods.
Contribution
=========
Within the timeframe of this workshop, we hope to mobilize a community
of stakeholders from the Semantic Web area. We envision the workshop to
produce the following items:
* an open call to the DBpedia data consumer community will generate a
wish list of data to be generated from Wikipedia by NLP
methods. This wish list will be broken down into tasks and benchmarks,
and a gold standard will be created.
* the benchmarks and test data created will be collected and published
under an open license for future evaluation (inspired by OAEI and
UCI-ML). An overview of the benchmarks can be found here:
http://nlp-dbpedia2013.blogs.aksw.org/benchmarks
Please sign up to our mailing list if you are interested in discussing
guidelines and NLP benchmarking:
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp-dbpedia-public
Important dates
===========
8 July 2013, Paper Submission Deadline
9 August 2013, Notification of accepted papers sent to authors
Motivation
=======
The central role of Wikipedia (and therefore DBpedia) for the creation
of a Translingual Web has recently been recognized by the Strategic
Research Agenda (cf. section 3.4, page 23) and most of the contributions
of the recently held Dagstuhl seminar on the Multilingual Semantic Web
also stress the role of Wikipedia for Multilingualism. As more and more
language-specific chapters of DBpedia appear (currently 14 language
editions), DBpedia is becoming a driving factor for a Linguistic Linked
Open Data cloud as well as localized LOD clouds with specialized domains
(e.g. the Dutch windmill domain ontology created from
http://nl.dbpedia.org ).
The data contained in Wikipedia and DBpedia have ideal properties for
making them a controlled testbed for NLP. Wikipedia and DBpedia are
multilingual and multi-domain, the communities maintaining these
resources are very open, and it is easy to join and contribute. The open
license allows data consumers to benefit from the content, and many parts
are collaboratively editable. In particular, the data in DBpedia is widely
used and disseminated throughout the Semantic Web.
NLP4DBpedia
==========
DBpedia has been around for quite a while, infusing the Web of Data with
multi-domain data of decent quality. These triples are, however, mostly
extracted from Wikipedia infoboxes. To unlock the full potential of
Wikipedia articles for DBpedia, the information contained in the
remaining parts of the articles needs to be analysed and triplified.
This is where NLP techniques can be of help.
DBpedia4NLP
==========
On the other hand, NLP, and information extraction techniques in
particular, rely on various resources when processing texts from various
domains. These resources may be used as an element of a solution (e.g. a
gazetteer forming part of an expert-written rule, or a disambiguation
resource) or in delivering a solution (e.g. within machine learning
approaches). DBpedia easily fits both of these roles, as the sketch below
illustrates.
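To make the gazetteer role concrete, here is a minimal, purely
illustrative Python sketch that builds a small city-name gazetteer from
DBpedia's public SPARQL endpoint. The SPARQLWrapper package, the
dbo:City class and the result limit are choices made here for
illustration; they are not prescribed by the workshop:

    # Illustrative only: build a tiny gazetteer of city names from DBpedia.
    # Assumes the SPARQLWrapper package and the public endpoint at
    # http://dbpedia.org/sparql; the class and limit are arbitrary choices.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label WHERE {
            ?city a dbo:City ;
                  rdfs:label ?label .
            FILTER (lang(?label) = "en")
        } LIMIT 1000
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    gazetteer = {row["label"]["value"]
                 for row in results["results"]["bindings"]}

    # A rule-based NER component could now test candidate strings:
    print("Berlin" in gazetteer)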
We invite papers from both of these areas, including:
1. Knowledge extraction from text and HTML documents (especially
unstructured and semi-structured documents) on the Web, using
information in the Linked Open Data (LOD) cloud, and especially in DBpedia.
2. Representation of NLP tool output and NLP resources as RDF/OWL, and
linking the extracted output to the LOD cloud.
3. Novel applications using the extracted knowledge, the Web of Data or
DBpedia-based NLP methods.
The specific topics are listed below.
Topics
=====
- Improving DBpedia with NLP methods
- Finding errors in DBpedia with NLP methods
- Annotation methods for Wikipedia articles
- Cross-lingual data and text mining on Wikipedia
- Pattern and semantic analysis of natural language, reading the Web,
learning by reading
- Large-scale information extraction
- Entity resolution and automatic discovery of Named Entities
- Multilingual entity recognition of real-world entities
- Frequent pattern analysis of entities
- Relationship extraction, slot filling
- Entity linking, Named Entity disambiguation, cross-document
co-reference resolution
- Disambiguation through knowledge bases
- Ontology representation of natural language text
- Analysis of ontology models for natural language text
- Learning and refinement of ontologies
- Natural language taxonomies modeled as Semantic Web ontologies
- Use cases for potential data extracted from Wikipedia articles
- Use cases of entity recognition for Linked Data applications
- Impact of entity linking on information retrieval, semantic search
Furthermore, an informal list of NLP tasks can be found on this
Wikipedia page:
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
These are relevant for the workshop as long as they fit into the
DBpedia4NLP and NLP4DBpedia frame (i.e. the data used revolves around
Wikipedia and DBpedia).
Submission formats
==============
Paper submission
-----------------------
All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and
relevance to the workshop. At least one author of each accepted paper is
expected to attend the workshop.
* Full research paper (up to 12 pages)
* Position papers (up to 6 pages)
* Use case descriptions (up to 6 pages)
* Data/benchmark paper (2-6 pages, depending on the size and complexity)
Note: data and benchmark papers are meant to provide a citable
reference for your data and benchmarks. We kindly require that you
upload any data you use to our benchmark repository in parallel to the
submission. We recommend using an open license (e.g. CC-BY), but the
minimum requirement is free use. Please write to the mailing list if
you have any problems.
Full instructions are available at:
http://nlp-dbpedia2013.blogs.aksw.org/submission/
Submission of data and use cases
--------------------------------------------
This workshop also targets non-academic users and developers. If you
have any (open) data (e.g. texts or annotations) that can be used for
benchmarking NLP tools, but do not want or need to write an academic
paper about it, please feel free to simply add it to this table:
http://tinyurl.com/nlp-benchmarks or upload it to our repository:
http://github.com/dbpedia/nlp-dbpedia
Full instructions are available at:
http://nlp-dbpedia2013.blogs.aksw.org/benchmarks/
Also, if you have any ideas, use cases or data requests, please feel free
to post them on our mailing list: nlp-dbpedia-public [at]
lists.informatik.uni-leipzig.de or send them directly to the chairs:
nlp-dbpedia2013 [at] easychair.org
Program committee
==============
* Guadalupe Aguado, Universidad Politécnica de Madrid, Spain
* Chris Bizer, Universität Mannheim, Germany
* Volha Bryl, Universität Mannheim, Germany
* Paul Buitelaar, DERI, National University of Ireland, Galway
* Charalampos Bratsas, OKFN Greece, Αριστοτέλειο Πανεπιστήμιο
Θεσσαλονίκης (Aristotle University of Thessaloniki), Greece
* Philipp Cimiano, CITEC, Universität Bielefeld, Germany
* Samhaa R. El-Beltagy, جامعة النيل (Nile University), Egypt
* Daniel Gerber, AKSW, Universität Leipzig, Germany
* Jorge Gracia, Universidad Politécnica de Madrid, Spain
* Max Jakob, Neofonie GmbH, Germany
* Anja Jentzsch, Hasso-Plattner-Institut, Potsdam, Germany
* Ali Khalili, AKSW, Universität Leipzig, Germany
* Daniel Kinzler, Wikidata, Germany
* David Lewis, Trinity College Dublin, Ireland
* John McCrae, Universität Bielefeld, Germany
* Uroš Milošević, Institut Mihajlo Pupin, Serbia
* Roberto Navigli, Sapienza, Università di Roma, Italy
* Axel Ngonga, AKSW, Universität Leipzig, Germany
* Asunción Gómez Pérez, Universidad Politécnica de Madrid, Spain
* Lydia Pintscher, Wikidata, Germany
* Elena Montiel Ponsoda, Universidad Politécnica de Madrid, Spain
* Giuseppe Rizzo, Eurecom, France
* Harald Sack, Hasso-Plattner-Institut, Potsdam, Germany
* Felix Sasaki, Deutsches Forschungszentrum für künstliche Intelligenz,
Germany
* Mladen Stanojević, Institut Mihajlo Pupin, Serbia
* Hans Uszkoreit, Deutsches Forschungszentrum für künstliche
Intelligenz, Germany
* Rupert Westenthaler, Salzburg Research, Austria
* Feiyu Xu, Deutsches Forschungszentrum für künstliche Intelligenz, Germany
Contact
=====
Of course, we would prefer that you post any questions and comments
regarding NLP and DBpedia to our public mailing list at:
nlp-dbpedia-public [at] lists.informatik.uni-leipzig.de
If you want to contact the chairs of the workshop directly, please write
to:
nlp-dbpedia2013 [at] easychair.org
Kind regards,
Sebastian Hellmann, Agata Filipowska, Caroline Barrière,
Pablo N. Mendes, Dimitris Kontokostas
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline:
*July 8th*)
* LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Hi! I'm Adam. I have just started working at WMDE on Wikidata, which is
contributing towards my placement year at university in the UK.
I will be around for at least the next 6 months, which I am sure will be
great! I'm going to be working on lots of bits and pieces which I hope will
keep everyone happy, including usability testing, bug fixing and triage,
analysis of usage patterns, API stuff and communication (among others)!
-- Adam
I would say that GND is a “good enough” answer.
Most named entities are persons, organizations, events, creative works and places, and these are all mutually exclusive. There ought to be a system interlock to prevent confusion between them.
“Organism Classification” or whatever you call it should also be on the list, because of its prevalence.
One thing I’d add to that is fictional character, because (1) there are a lot of them, (2) they can be ontologized more-or-less in parallel with people, and (3) you’ll get cleaner people if you keep fictional characters out. (On the other hand, there are fictional events, places, etc. too, though these are not so well documented.) Is it easy to add a new GND type?
I think you’re calling the “wastebin” category “term”, which is reasonable (I’d call it a “concept”). Going much further than this you’ll run into Borges-encyclopedia-style risks, but aren’t the categories named in GND upwards of 80% of the topics? Can you run a report on this?
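As a purely hypothetical illustration of the “interlock” idea above, here
is a small Python sketch; the type list and the exactly-one rule are
invented here for illustration and are not an existing Wikidata or GND
mechanism:

    # Hypothetical sketch of a "system interlock" for mutually exclusive
    # main types; the type list and the exactly-one rule are invented
    # here and do not correspond to any existing Wikidata/GND feature.
    MUTUALLY_EXCLUSIVE_TYPES = {
        "person", "organization", "event", "creative work", "place",
        "organism classification", "fictional character", "term",
    }

    def check_main_type(claimed_types):
        """Require exactly one main type from the exclusive set."""
        claimed = set(claimed_types) & MUTUALLY_EXCLUSIVE_TYPES
        if len(claimed) != 1:
            raise ValueError(
                "expected exactly one main type, got %s" % sorted(claimed))
        return claimed.pop()

    print(check_main_type(["person"]))  # -> person
    # check_main_type(["person", "fictional character"])  # raises ValueError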
From: Sven Manguard
Sent: Sunday, June 30, 2013 2:19 PM
To: Discussion list for the Wikidata project.
Subject: [Wikidata-l] A solution with finality is needed for P107 - main type (GND)
I have just closed a second deletion discussion for Property:P107 - main type (GND).
As with the first discussion, it is clear that there is a broad sense that main type (GND) is not an ideal solution; however, as it stands now, a large enough portion of the community does not want to get rid of it unless/until a replacement system is found or developed. For this reason, I closed the discussion as no consensus and opened up a request for comment on the matter of finding a replacement for P107.
I have taken the unusual step of emailing the mailing list for three reasons. First, P107 is the most used property on the project, and it or its replacement will (most likely) remain the most used property on the project forever. Second, the GND has evolved into a component of how Wikidata is structured; our lists of properties are sorted by GND type, and that has a real impact on what properties are used on what pages. The third reason is that, as a general statement, participation levels in requests for comment have been downright sad. Three or four people participating in an RfC is, for a project of this size, unhealthy, and most RfCs don't get more than that many people participating in them. For something this important, we need at least a dozen people, preferably at least twice that.</rant>
Anyways, the RfC is at https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Primary_sorting… and I hope that, with broad participation, we can finally resolve this issue.
Yours,
Sven
For those interested in annotation data, this is a good time to inform the
development of the Annotator framework.
SJ
---------- Forwarded message ----------
From: "Nick Stenning" <nick(a)whiteink.com>
Date: Jun 29, 2013 9:15 AM
Subject: [annotator-dev] Annotator v2.0 work-in-progress
To: "annotator-dev(a)lists.okfn.org" <annotator-dev(a)lists.okfn.org>
Hi all,
I've pushed a branch with some very early-stage work on Annotator v2.0. I'm
focusing mainly on extracting most of the persistence logic from core
Annotator, and simplifying the Store plugin requirements through use of an
intermediate Registry object.
If anyone has any feedback, it would be gratefully received:
https://github.com/okfn/annotator/compare/master...wip
In particular, the new Registry:
https://github.com/okfn/annotator/blob/208498c/src/registry.coffee
And the dramatically slimmed-down Store plugin:
https://github.com/okfn/annotator/blob/208498c/src/plugin/store.coffee
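For those who would rather not read the CoffeeScript, here is a rough,
illustrative sketch (in Python, deliberately not Annotator's actual API;
see the linked registry.coffee and store.coffee for that) of the general
idea: the core resolves a named component through a registry instead of
hard-wiring a Store plugin, so a storage backend only has to implement a
small interface:

    # Illustrative sketch of the registry idea, not Annotator's real API:
    # the core asks a registry for a named component (e.g. "storage")
    # instead of depending directly on a concrete Store plugin.
    class Registry:
        def __init__(self):
            self._components = {}

        def register(self, name, component):
            self._components[name] = component

        def get(self, name):
            return self._components[name]

    class MemoryStore:
        """A minimal storage backend: only create/all, nothing else."""
        def __init__(self):
            self._annotations = []

        def create(self, annotation):
            self._annotations.append(annotation)
            return annotation

        def all(self):
            return list(self._annotations)

    registry = Registry()
    registry.register("storage", MemoryStore())

    # Core code depends only on the registry's interface:
    registry.get("storage").create({"text": "an annotation"})
    print(registry.get("storage").all())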
-N
> http://www.wikidata.org/wiki/Q1000 is indeed an information resource
> http://www.wikidata.org/entity/Q1000 is the URI for the thing, which indeed
Apologies, I copied the wrong URI when testing!
>> Also, checking with http://validator.linkeddata.org/vapour I received
>> an error about invalid response (I am not sure whether this is a
>> problem with Vapour or Wikidata ...)
> More details would be appreciated.
Both the entity URI:
http://validator.linkeddata.org/vapour?uri=http%3A%2F%2Fwww.wikidata.org%2F…
and the wiki URI:
http://validator.linkeddata.org/vapour?uri=http%3A%2F%2Fwww.wikidata.org%2F…
give:
* Dereferencing resource URI (without content negotiation)
** 1st request while dereferencing resource URI without specifying the
desired content type (HTTP response code should be 200): Failed
* Dereferencing resource URI (requesting RDF/XML)
** 1st request while dereferencing resource URI without specifying the
desired content type (Content type should be 'application/rdf+xml'):
Failed
** 1st request while dereferencing resource URI without specifying the
desired content type (HTTP response code should be 200): Failed
----
The graph (not the text) suggests a response of:
HTTP 400 Bad Request: The request cannot be fulfilled due to bad syntax
This could be Vapour's fault, but I fear something is wrong with
Wikidata. At the very least, the wiki test should simply find an
information resource.
Note that testing Wikipedia works:
http://validator.linkeddata.org/vapour?uri=https%3A%2F%2Fen.wikipedia.org%2…
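To reproduce the failing checks without Vapour, a minimal Python sketch
along these lines should do; it uses the requests library and the URIs
from this thread, and the expected outcomes (HTTP 200, an
application/rdf+xml payload) are what Linked Data publishing practice
suggests, not necessarily what Wikidata currently returns:

    # Minimal reproduction of Vapour's dereferencing checks with the
    # requests library; URIs taken from the discussion above.
    import requests

    for uri in ("http://www.wikidata.org/entity/Q1000",
                "http://www.wikidata.org/wiki/Q1000"):
        # Plain dereference (no content negotiation): expect a final 200.
        plain = requests.get(uri, allow_redirects=True)
        # Dereference asking for RDF/XML: expect 200 and an RDF/XML payload.
        rdf = requests.get(uri, allow_redirects=True,
                           headers={"Accept": "application/rdf+xml"})
        print(uri)
        print("  plain:", plain.status_code, plain.headers.get("Content-Type"))
        print("  rdf:  ", rdf.status_code, rdf.headers.get("Content-Type"))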
Best
Gregor