Today we reached the point on nl-wiki where over 64% of the interwiki conflicts have been solved. A lot of this work has been done by the Dutch community, but a lot has also been done by users from other projects; thank you very much for the help!
I have checked the complete template namespace and category namespace for local interwikis and all have been removed from these pages, so these namespaces are now clean on nl-wiki. If users from (especially smaller) Wikipedias want to know on which pages of their wiki local interwikis are left, you can use AWB: download the latest database dump and run a query on that dump. If you want to know exactly which query you need, e-mail me personally, as the query string is a bit long, but it is easy to do even for people who are new to bots and scripts. (I can also do it for you.)
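For those who prefer a script over AWB's database scanner, here is a rough sketch of the same idea in Python (my own illustration, not the query I mentioned above; the dump file name and the prefix list are placeholders you need to adjust for your wiki):

    # Rough sketch: scan a pages-articles XML dump for pages that still
    # contain local interwiki links. File name and prefixes are placeholders.
    import bz2
    import re

    DUMP = "nlwiki-latest-pages-articles.xml.bz2"   # placeholder
    PREFIXES = ("en", "de", "fr", "es")              # placeholder, extend as needed
    INTERWIKI = re.compile(r"\[\[(?:%s):[^\]]+\]\]" % "|".join(PREFIXES))

    title = reported = None
    with bz2.open(DUMP, "rt", encoding="utf-8") as dump:
        for line in dump:
            if "<title>" in line:
                title = line.split("<title>")[1].split("</title>")[0]
            elif INTERWIKI.search(line) and title != reported:
                print(title)          # page still has a local interwiki link
                reported = title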
While solving all these interwiki conflicts, we came across several things:
* A lot of biological conflicts are in our list of interwiki conflicts. Some genera have only one species under them, which makes some Wikipedias combine the two into one article, while others want two articles because they are two layers in the taxonomic tree. One article on the English Wikipedia that created hundreds of interwiki conflicts was a list to which many redirects were linking, and those redirects were used for interwikis. All of those have now been removed with a bot.
* Another thing we noticed is that a lot of renamings of articles to make room for a disambiguation page haven't been executed properly: in some Wikidata items grouping a set of articles, one of the links points to a disambiguation page. (It would be nice if a bot could check for disambiguation pages, based on the presence of a template listed on that wiki's [[MediaWiki:Disambiguationspage]], so that we know where we need to fix this.)
* Another thing we see is that a lot of interwikis are still local because the local interwiki links to a page that was renamed and is now a redirect, and no bot has updated the link. Most interwiki bots do not recognize that the redirect is the same page as the one added to Wikidata. So we need a bot that removes every local interwiki pointing to a redirect whose target is in the same Wikidata item as the page that carries the local interwikis (a rough sketch follows below).
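To make that last point concrete, here is a rough pywikibot-flavoured sketch (my own illustration, not an existing bot; treat the overall flow as an assumption). It only reports candidates, so the actual removal can still be done with whatever interwiki-cleanup workflow you normally use:

    # Rough sketch: report local interwiki links that point to a redirect
    # whose target sits in the same Wikidata item as the page itself.
    import pywikibot

    site = pywikibot.Site("nl", "wikipedia")

    def removable_links(page):
        """Yield language links of the page that redirect to a sitelink of its item."""
        item = pywikibot.ItemPage.fromPage(page)      # raises if no item exists
        for link in page.langlinks():                 # local interwiki links
            remote = pywikibot.Page(link.site, link.title)
            if not remote.isRedirectPage():
                continue
            try:
                if pywikibot.ItemPage.fromPage(remote.getRedirectTarget()) == item:
                    yield link
            except pywikibot.exceptions.NoPageError:
                pass                                   # redirect target has no item

    for page in site.allpages(namespace=0):           # or any smaller page generator
        try:
            for link in removable_links(page):
                print(page.title(), "->", link.site, link.title)
        except pywikibot.exceptions.NoPageError:
            pass                                       # page itself has no item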
Let's clean this mess up!
On 07/09/2013 05:00 AM, wikidata-l-request(a)lists.wikimedia.org wrote:
> Date: Mon, 8 Jul 2013 16:10:20 -0400
> From: Michael Hale <hale.michael.jr(a)live.com>
> To: Discussion list for the Wikidata project.
> Subject: Re: [Wikidata-l] Accelerating software innovation with
> Wikidata and improved Wikicode
> All positive change is gradual. In the meantime, for those of us with ample free time for coding, it'd be nice to have a place to check in code and unit tests that are organized roughly in the same way as Wikipedia. Maybe such a project already exists and I just haven't found it yet.
Broke the thread to say: yes, that would be pretty useful. Check out
please comment on that to ensure the conversation includes your voice
and perspective. Personally, I'm interested in finding ways for
MediaWiki administrators and power users to share Lua templates,
gadgets, user scripts, and skins.
I also encourage anyone writing Wikimedia-related code or tests to share
that code via a repository within Wikimedia's Git/Gerrit infrastructure,
but I understand why Michael specified "that are organized roughly in
the same way as Wikipedia." :)
Engineering Community Manager
The following map displays geographical data from Wikidata:
It is updated daily, clickable, zoomable, and also can be used to display
the connections between geographically located items. It also shows how
much data is already in Wikidata, and how amazing the community is. I hope
it will also inspire a few more drives for pushing data into Wikidata. Look
at all these huge empty places in Africa, central Asia, Canada...
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
I have been pondering this for some time, and I would like some feedback. I figure there are many programmers on this list, but I think others might find it interesting as well.
If you look at the existing Wikicode and Rosetta Code, the code samples are small and isolated. They will show, for example, how to open a file in 10 different languages. The search engines already do a great job of helping us find those types of code samples across blog posts of people who have had to do that specific task before. However, a problem that I run into frequently, and that the search engines don't help me solve, is this: if I read a nanoelectronics paper and want to do a simulation of the physical system they describe, I often have to go to the websites of several different professors and do a fair bit of manual work to assemble their different programs into a pipeline, and then the result of my hacking is not easy to expand to new scenarios. We've made enough progress on Wikipedia that I can often just click on a couple of articles to get an understanding of the paper, but if I want to experiment with the ideas in a software context I have to do a lot of scavenging and gluing.
I'm not yet convinced that this could work. Maybe Wikipedia works so well because the internet reached a point where there was so much redundant knowledge listed in many places that there was immense social and economic pressure to utilize knowledgeable people to summarize it in a free encyclopedia. Maybe the total amount of software that has been written is still too small, there are still too few programmers, and it's still too difficult compared to writing natural languages for the crowdsourcing dynamics to work. There have been a lot of successful open-source software projects of course, but most of them are focused on creating software for a specific task instead of library components that cover all of the knowledge in the encyclopedia.
Here are my 2 cents.
I have paid my dues writing CRUD apps for business. They all want the same
thing: something that keeps track of entities and controls how the
organization interacts with those entities.
In one year, for instance, I worked on systems for an academic department
and a logistics company. The academic department needed a custom CRM system
to handle students through the lifecycle of prospect to applicant to student
to alumni. The logistics company had assets all over the place that they
were contracting with vendors to move these assets around and do various
things with them. They sent out invoices to customers and payments to
vendors and all that.
I could think up several more examples, but if we look at a number of
business systems we see so many common elements that it seems like you ought
to be able to write the schema, add a few business rules, and there is
your application. Of course vendors have been promising us "4th Generation
Languages" since before the AI Winter, and Ruby on Rails and its
descendants have realized many of their claims, but building and
maintaining these systems still means so much messing with details that it
seems there has got to be a better way.
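To make concrete what "write the schema, add a few business rules" could look like, here is a toy sketch (my own illustration, not any particular product); the schema and rules below are made-up examples, and a real system would of course need persistence, relations, and a UI on top:

    # Toy sketch: a schema plus a couple of business rules already gives you
    # the core of a small CRUD application (in-memory only).
    from itertools import count

    SCHEMA = {
        "student": {"name": str, "status": str},
        "invoice": {"customer": str, "amount": float},
    }

    RULES = {
        "student": lambda r: r["status"] in {"prospect", "applicant", "student", "alumni"},
        "invoice": lambda r: r["amount"] > 0,
    }

    class Store:
        def __init__(self):
            self.rows = {}
            self.ids = count(1)

        def create(self, kind, **values):
            spec = SCHEMA[kind]
            if set(values) != set(spec) or not all(
                    isinstance(values[k], t) for k, t in spec.items()):
                raise ValueError(f"{kind}: values do not match schema")
            if not RULES[kind](values):
                raise ValueError(f"{kind}: business rule violated")
            key = next(self.ids)
            self.rows[key] = (kind, values)
            return key

        def read(self, key):
            return self.rows[key]

    db = Store()
    student_id = db.create("student", name="Ada", status="applicant")
    print(db.read(student_id))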
This shared system would probably be an "upper" ontology, because it comes
down to a 4D model of people and assets (you bet you need to know when
asset 774Q8 was last in Milwaukee), business transactions, and the like.
I'd like to see 1000 flowers bloom around this schema but for this to
succeed there really has to be one vendor who builds a system from soup to
nuts that really "rocks" people.
There's an overlap between the world of Wikidata and the "business app"
domain described above, in the sense that maybe some of your customers have
Wikidata IDs or your locations correspond to things in Wikidata, etc.,
plus similarities in schema.
From: Jane Darnell
Sent: Monday, July 8, 2013 1:13 PM
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode
I am all for a "dictionary of code snippets", but as with all
dictionaries, you need a way to group them, either by alphabetical
order or "birth date". It sounds like you have an idea how to group
those code samples, so why don't you share it? I would love to build
my own "pipeline" from a series of algorithms that someone else
published for me to reuse. I am also for more sharing of datacentric
programs, but where would the data be stored? Wikidata is for data
that can be used by Wikipedia, not by other projects, though maybe
someday we will find the need to put actual weather measurements in
Wikidata for some oddball Wikisource project to do with the history of
global warming or something like that.
I just don't quite see how your idea would translate in the
Wiki(p/m)edia world into a project that could be indexed.
But then I never felt the need for "high-fidelity simulations of
virtual worlds" either.
2013/7/6, Michael Hale <hale.michael.jr(a)live.com>:
> I have been pondering this for some time, and I would like some feedback. I
> figure there are many programmers on this list, but I think others might
> find it interesting as well.
> Are you satisfied with our progress in increasing software sophistication
> compared to, say, increasing the size of datacenters? Personally, I think
> there is still too much "reinventing the wheel" going on, and the best way
> to get to software that is complex enough to do things like high-fidelity
> simulations of virtual worlds is to essentially crowd-source the knowledge
> of Wikipedia into code. The existing structure of the Wikipedia articles
> would serve as a scaffold for a large, consistently designed, open-source
> software library. Then, whether I was making software for weather simulations
> and I needed code to slowly simulate physically accurate clouds or I was
> making a game and I needed code to quickly draw stylized clouds I could
> go to the article for clouds, click on C++ (or whatever programming language
> is appropriate) and then find some useful chunks of code. Every article
> could link to useful algorithms, data structures, and interface designs that
> are relevant to the subject of the article. You could also find a
> visualizer that accesses Wikidata. The big advantage would be that
> constraining the design of the library to the structure of Wikipedia would
> handle the encapsulation and modularity aspects of the software design
> so that the components could improve independently. Creating a simulation or
> visualization where you zoom in from a whole cloud to see its constituent
> microscopic particles is certainly doable right now, but it would be a lot
> easier with a function library like this.
> If you look at the existing Wikicode and Rosetta Code, the code samples are
> small and isolated. They will show, for example, how to open a file in 10
> different languages. The search engines already do a great job of helping us
> find those types of code samples across blog posts of people who have had to
> do that specific task before. However, a problem that I run into frequently,
> and that the search engines don't help me solve, is this: if I read a
> nanoelectronics paper and want to do a simulation of the physical system
> they describe, I often have to go to the websites of several different
> professors and do a fair bit of manual work to assemble their different
> programs into a pipeline, and then the result of my hacking is not easy to
> expand to new scenarios. We've made enough progress on Wikipedia that I can
> often just click on a couple of articles to get an understanding of the
> paper, but if I want to experiment with the ideas in a software context I
> have to do a lot of scavenging and gluing.
> I'm not yet convinced that this could work. Maybe Wikipedia works so well
> because the internet reached a point where there was so much redundant
> knowledge listed in many places that there was immense social and economic
> pressure to utilize knowledgeable people to summarize it in a free
> encyclopedia. Maybe the total amount of software that has been written is
> still too small, there are still too few programmers, and it's still too
> difficult compared to writing natural languages for the crowdsourcing
> dynamics to work. There have been a lot of successful open-source software
> projects of course, but most of them are focused on creating software for a
> specific task instead of library components that cover all of the knowledge
> in the encyclopedia.
I have one question concerning Wikidata:
we have the statement
"Ψ is the wave function".
I have developed a system that discovers the relation between Ψ and
the page "wave function".
Is there a way to model that in Wikidata, or should I use another way
to model that relationship?
If so, could someone show me how to model the relationship from the
PS: Linking to the variables can be done either by XPath, or via the
numbers I assigned, starting from 1, to all <math/> tags. That said, a
possible link to the first equation could be specified by appending
#math1 to the URL. (This already works locally.)
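In case it helps to make the numbering scheme concrete, here is a small sketch of how the "#mathN" anchors could be computed (my own illustration, not part of any existing tool; it numbers the <math> tags in a page's wikitext, which is an assumption about where the tags are read from):

    # Small sketch: number every <math>...</math> tag in a page's wikitext,
    # starting from 1, and print the anchor that would identify each formula.
    import re
    import urllib.parse
    import urllib.request

    def math_anchors(title):
        url = ("https://en.wikipedia.org/w/index.php?title="
               + urllib.parse.quote(title) + "&action=raw")
        wikitext = urllib.request.urlopen(url).read().decode("utf-8")
        pattern = re.compile(r"<math(?:\s[^>]*)?>(.*?)</math>", re.S)
        return [(f"https://en.wikipedia.org/wiki/{title}#math{i}", m.group(1).strip())
                for i, m in enumerate(pattern.finditer(wikitext), start=1)]

    for anchor, formula in math_anchors("Wave_function"):
        print(anchor, "->", formula[:60])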
Kind regards
Phone (office): +49 30 314 22784
Phone (private): +49 30 488 27330
We have received quite a few requests for an extended deadline. We
understand that working with large amounts of data such as DBpedia is
difficult and time-consuming.
The deadline will therefore be extended until Thursday, July 18th, 23:59
Hawaii time. However, we would like to appeal to all authors to submit
an abstract already. We would also be happy if you submitted as soon as possible.
* new deadline: July 18th, 2013
* please submit abstract now
* we are still looking for a sponsor to create a challenge:
Apologies for multiple posting!
NLP & DBpedia Workshop 2013
Free, open, interoperable and multilingual NLP for DBpedia and DBpedia for NLP
Collocated with the International Semantic Web Conference 2013 (ISWC 2013)
21-22 October 2013, in Sydney, Australia (*Submission deadline July
18th*, 23:59 Hawaii time)
Recently, the DBpedia community has experienced an immense increase in
activity and we believe that the time has come to explore the
connection between DBpedia & Natural Language Processing (NLP) at a yet
unprecedented depth. The goal of this workshop can be summarized by this formula:
NLP & DBpedia == DBpedia4NLP && NLP4DBpedia
DBpedia has a long-standing tradition of providing useful data as well as
a commitment to reliable Semantic Web technologies and living best
practices. With the rise of Wikidata, DBpedia is being relieved step by step
of the tedious extraction of data from Wikipedia's infoboxes and can
shift its focus to new challenges such as extracting information from
the unstructured article text, as well as becoming a testing ground for
multilingual NLP methods.
Within the timeframe of this workshop, we hope to mobilize a community
of stakeholders from the Semantic Web area. We envision the workshop to
produce the following items:
* an open call to the DBpedia data consumer community will generate a
wish list of data, which is to be generated from Wikipedia by NLP
methods. This wish list will be broken down into tasks and benchmarks, and
a gold standard will be created.
* the benchmarks and test data created will be collected and published
under an open license for future evaluation (inspired by OAEI and
UCI-ML). An overview of the benchmarks can be found here:
Please sign up to our mailing list if you are interested in discussing
guidelines and NLP benchmarking:
18 July 2013, Paper Submission Deadline
9 August 2013, Notification of accepted papers sent to authors
The central role of Wikipedia (and therefore DBpedia) for the creation
of a Translingual Web has recently been recognized by the Strategic
Research Agenda (cf. section 3.4, page 23) and most of the contributions
of the recently held Dagstuhl seminar on the Multilingual Semantic Web
also stress the role of Wikipedia for Multilingualism. As more and more
language-specific chapters of DBpedia appear (currently 14 language
editions), DBpedia is becoming a driving factor for a Linguistic Linked
Open Data cloud as well as localized LOD clouds with specialized domains
(e.g. the Dutch windmill domain ontology created from
The data contained in Wikipedia and DBpedia have ideal properties for
making them a controlled testbed for NLP. Wikipedia and DBpedia are
multilingual and multi-domain, the communities maintaining these
resources are very open, and it is easy to join and contribute. The open
license allows data consumers to benefit from the content, and many parts
are collaboratively editable. In particular, the data in DBpedia is widely
used and disseminated throughout the Semantic Web.
DBpedia has been around for quite a while, infusing the Web of Data with
multi-domain data of decent quality. These triples are, however, mostly
extracted from Wikipedia infoboxes. To unlock the full potential of
Wikipedia articles for DBpedia, the information contained in the
remaining part of the articles needs to be analysed and triplified.
Here, NLP techniques may be of help.
On the other hand, NLP, and information extraction techniques in
particular, involve various resources when processing texts from
various domains. These resources may be used as an element of a
solution (e.g. a gazetteer being an important part of a rule created by an
expert, or a disambiguation resource), or while delivering a solution (e.g.
within machine learning approaches). DBpedia easily fits both of these
roles. We invite papers from both of these areas, including:
1. Knowledge extraction from text and HTML documents (especially
unstructured and semi-structured documents) on the Web, using
information in the Linked Open Data (LOD) cloud, and especially in DBpedia.
2. Representation of NLP tool output and NLP resources as RDF/OWL, and
linking the extracted output to the LOD cloud.
3. Novel applications using the extracted knowledge, the Web of Data or
NLP DBpedia-based methods.
The specific topics are listed below.
- Improving DBpedia with NLP methods
- Finding errors in DBpedia with NLP methods
- Annotation methods for Wikipedia articles
- Cross-lingual data and text mining on Wikipedia
- Pattern and semantic analysis of natural language, reading the Web,
learning by reading
- Large-scale information extraction
- Entity resolution and automatic discovery of Named Entities
- Multilingual entity recognition task of real world entities
- Frequent pattern analysis of entities
- Relationship extraction, slot filling
- Entity linking, Named Entity disambiguation, cross-document
- Disambiguation through knowledge base
- Ontology representation of natural language text
- Analysis of ontology models for natural language text
- Learning and refinement of ontologies
- Natural language taxonomies modeled to Semantic Web ontologies
- Use cases for potential data extracted from Wikipedia articles
- Use cases of entity recognition for Linked Data applications
- Impact of entity linking on information retrieval, semantic search
Furthermore, an informal list of NLP tasks can be found on this
These are relevant for the workshop as long as they fit into the
DBpedia4NLP and NLP4DBpedia frame (i.e. the used data evolves around
Wikipedia and DBpedia).
All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and
relevance to the workshop. At least one author of each accepted paper is
expected to attend the workshop.
* Full research paper (up to 12 pages)
* Position papers (up to 6 pages)
* Use case descriptions (up to 6 pages)
* Data/benchmark paper (2-6 pages, depending on the size and complexity)
Note: data and benchmark papers are meant to provide a citable
reference for your data and benchmarks. We kindly require that you
upload any data you use to our benchmark repository in parallel to the
submission. We recommend using an open license (e.g. CC-BY), but the
minimum requirement is free use. Please write to the mailing list if
you have any problems.
Full instructions are available at:
Submission of data and use cases
This workshop also targets non-academic users and developers. If you
have any (open) data (e.g. texts or annotations) that can be used for
benchmarking NLP tools, but do not want or need to write an academic
paper about it, please feel free to just add it to this table:
http://tinyurl.com/nlp-benchmarks or upload it to our repository:
Full instructions are available at:
Also, if you have any ideas, use cases, or data requests, please feel free
to just post them on our mailing list: nlp-dbpedia-public [at]
lists.informatik.uni-leipzig.de or send them directly to the chairs:
nlp-dbpedia2013 [at] easychair.org
* Guadalupe Aguado, Universidad Politécnica de Madrid, Spain
* Chris Bizer, Universität Mannheim, Germany
* Volha Bryl, Universität Mannheim, Germany
* Paul Buitelaar, DERI, National University of Ireland, Galway
* Charalampos Bratsas, OKFN Greece, Αριστοτέλειο Πανεπιστήμιο Θεσσαλονίκης
(Aristotle University of Thessaloniki), Greece
* Philipp Cimiano, CITEC, Universität Bielefeld, Germany
* Samhaa R. El-Beltagy, جامعة النيل (Nile University), Egypt
* Daniel Gerber, AKSW, Universität Leipzig, Germany
* Jorge Gracia, Universidad Politécnica de Madrid, Spain
* Max Jakob, Neofonie GmbH, Germany
* Anja Jentzsch, Hasso-Plattner-Institut, Potsdam, Germany
* Ali Khalili, AKSW, Universität Leipzig, Germany
* Daniel Kinzler, Wikidata, Germany
* David Lewis, Trinity College Dublin, Ireland
* John McCrae, Universität Bielefeld, Germany
* Uroš Milošević, Institut Mihajlo Pupin, Serbia
* Roberto Navigli, Sapienza, Università di Roma, Italy
* Axel Ngonga, AKSW, Universität Leipzig, Germany
* Asunción Gómez Pérez, Universidad Politécnica de Madrid, Spain
* Lydia Pintscher, Wikidata, Germany
* Elena Montiel Ponsoda, Universidad Politécnica de Madrid, Spain
* Giuseppe Rizzo, Eurecom, France
* Harald Sack, Hasso-Plattner-Institut, Potsdam, Germany
* Felix Sasaki, Deutsches Forschungszentrum für künstliche Intelligenz, Germany
* Mladen Stanojević, Institut Mihajlo Pupin, Serbia
* Hans Uszkoreit, Deutsches Forschungszentrum für künstliche Intelligenz, Germany
* Rupert Westenthaler, Salzburg Research, Austria
* Feiyu Xu, Deutsches Forschungszentrum für künstliche Intelligenz, Germany
Of course we would prefer that you post any questions and comments
regarding NLP and DBpedia to our public mailing list at:
nlp-dbpedia-public [at] lists.informatik.uni-leipzig.de
If you want to contact the chairs of the workshop directly, please write to:
nlp-dbpedia2013 [at] easychair.org
Sebastian Hellmann, Agata Filipowska, Caroline Barrière,
Pablo N. Mendes, Dimitris Kontokostas
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
* NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Extended
Deadline: *July 18th*)
* LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Research Group: http://aksw.org
Denny published a very rough and tentative timeline for the next month.
I hope this will give you some idea of what is to come. Please
keep in mind though that this obviously might change if we run into
problems. Let me know if you have any questions.
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.