[Begging pardon if you have already read this in the Wikidata project chat]
As Wikidatans, we all know how much data quality matters.
We all know what high quality means: statements need to be
validated via references to external, non-wiki sources.
That's why the primary sources tool is being developed:
And that's why I am preparing the StrepHit IEG proposal:
StrepHit (pronounced "strep hit", short for "Statement? repherence it!") is
a Natural Language Processing pipeline that understands human language,
extracts structured data from raw text, and produces Wikidata statements
with reference URLs.
As a demonstration to support the IEG proposal, you can find the
**FBK-strephit-soccer** dataset uploaded to the primary sources tool.
It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and
start playing with the data.
What is the biggest difference that sets StrepHit datasets apart from
the currently uploaded ones?
At least one reference URL is always guaranteed for each statement.
This means that if StrepHit finds some new statement that was not in
Wikidata before, it will always propose its external references.
We do not want to have to manually reject all the new statements that
come with no references.
If you like the idea, please endorse the StrepHit IEG proposal!
PLEASE reconsider. A Wikidata-based solution is not superior just because
it started from Wikidata.
PLEASE consider collaboration. It will be so much more powerful when LSJBOT
and people at Wikidata collaborate. It will get things right the first
time. It does not have to be perfect from the start as long as it gets
better over time. As long as we always work on improving the data.
PLEASE consider text generation based on Wikidata. The scripts
LSJBOT uses can help us improve the text when more or better
information becomes available.
On 6 September 2015 at 08:25, Ricordisamoa <ricordisamoa(a)openmailbox.org>
wrote:
> Proper data-based stubs are being worked on:
> Lsjbot, you have no chance to survive make your time.
> On 06/09/2015 at 02:40, Anders Wennersten wrote:
>> Geonames is a database which holds around 9 million entries of
>> geography-related items from all over the world.
>> Lsjbot is now generating articles from a subset of it, after several
>> months of extensive research on its quality, Wikidata relations and
>> notability issues. While the quality in some regions is substandard (and
>> these will not be generated) it was seen as very good in most areas. In
>> the discussion I was intrigued to learn that identical Arabic names should
>> be transcribed differently depending on their geographic location. And I was
>> fascinated by the question of the notability of wells in the Bahrain desert
>> (which in the end were excluded, mostly because we knew too little about them).
>> In this run Lsjbot has extended its functionality even further than when
>> it generated articles for species. It looks for relevant geographical items
>> close to the actual one: a nearby lake, a mountain, the nearest
>> major town, and so on.
>> Macedonia can be taken as one example. Lsjbot generated over 10000
>> articles (and 5000 disambiguation pages), an order of magnitude more than what
>> exists in enwp. Also for a well-defined type like villages, almost 50% as
>> many have been generated as exist in enwp. One example  where you
>> can see what has been generated (and note the reuse of a relevant figure
>> existing in frwp). Please compare the corresponding articles in other
>> languages in this case, many having less information than the bot-generated ones.
>> The generation is still at an early stage [3] but has already pushed the
>> article count for svwp past 2 M today. But it will take many more months
>> before it is completed, and perhaps more M marks will be passed before it is
>> through. If you want to give feedback you are welcome to enter it at 
>> (with all credits for the Lsjbot to be given to Sverker, its owner, I am
>> just one of the many supporters of him and his bot on svwp)
>> Wikimedia-l mailing list, guidelines at:
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
Let me add a further reply to your comment.
On 9/5/15 2:01 PM, wikidata-request(a)lists.wikimedia.org wrote:
> Message: 3
> Date: Fri, 4 Sep 2015 19:26:38 +0200
> From: Gerard Meijssen<gerard.meijssen(a)gmail.com>
> Quality is not determined by sources. Sources do lie.
> When you want quality, you seek sources where they matter most. It is not
> by going for "all" of them
I completely agree with you that many sources can be flawed. I may have
omitted the word "trustworthy" before "sources"; I have now added it in
the Wikidata project chat.
The IEG proposal will also include an investigation phase to select a
set of authoritative sources; see the first task in the proposal work
plan. I'll expand on this.
Tell me if I am right or wrong about this.
If I am coining a URI for something that has an identifier in an outside
system, it is straightforward to append the identifier (possibly modified a
little) to a prefix, such as
Then you can write
@prefix dbpedia: <http://dbpedia.org/resource/>
and then refer to the concept (in either Turtle or SPARQL) as
I will take this one step further and say that for pedagogical and
other coding situations, the extra length of prefix declarations is an
additional cognitive load on top of all the other cognitive loads of
dealing with the system, so in the name of concision you can do something
like
@prefix : <http://dbpedia.org/ontology/>
and then you can write :someProperty and <Stellarator>, and your queries
look very simple.
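As a minimal Turtle sketch of that setup (the property and resource names here are hypothetical, not actual DBpedia terms):

```turtle
@base <http://dbpedia.org/resource/> .
@prefix : <http://dbpedia.org/ontology/> .

# With an empty prefix for the ontology namespace and @base for
# resources, statements stay very compact:
<Stellarator> :someProperty <Tokamak> .
```

The same declarations work in SPARQL using the `BASE` and `PREFIX` keywords (without the trailing dots).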
The production for a QName cannot begin with a number so it is not correct
to write something like
or expect to have the full URI squashed to that. This kind of gotcha will
drive newbies nuts, and the realization of RDF as a universal solvent
requires squashing many of them.
Another example is
If you look at the @base declaration above, you see a way to get around
this, because with the base above you can write
<100> which works just fine in the dbpedia case.
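Concretely, the gotcha and the @base workaround look like this (property and resource names are hypothetical; the restriction described is the one the original QName production, as enforced by tools like Jena, imposes):

```turtle
@base <http://dbpedia.org/resource/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix : <http://dbpedia.org/ontology/> .

# dbpedia:100 would be a parse error under this production: the
# local part of a QName cannot begin with a digit.
# The relative-IRI form resolves against @base instead:
<100> :someProperty dbpedia:SomeResource .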
I like what Wikidata did in using fairly dense sequential integers for
the ids, so a Wikidata entity URI looks like
which is always a QName, so you can write
@prefix wd: <http://www.wikidata.org/entity/>
and then you can write
and it is all fine, because (i) Wikidata added the alpha prefix, (ii)
started at the beginning with it, and (iii) made up a plausible
explanation for why it is that way. Freebase mids have the same property, so
:BaseKB has it too.
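For instance (using the wdt: direct-claim prefix from the Wikidata query service alongside wd:):

```turtle
@prefix wd:  <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .

# Q-ids and P-ids always start with a letter, so they are always
# legal as the local part of a prefixed name:
wd:Q42 wdt:P31 wd:Q5 .   # Douglas Adams (Q42): instance of (P31) human (Q5)
```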
I think customers would expect to be able to give us
and have it just work, but because a QName can never begin with a number,
you can encode the URI like this:
and then write
which is not too bad. Note, however, that if you want to write
<0884049582> you have to encode it as
because, at least with the Jena framework, the same thing happens if you
so you can't choose a representation which supports that mode of expression
and a :+prefix mode.
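One hypothetical way to picture the trade-off (the namespace and the marker letter here are made up for illustration, not the actual encoding the message refers to):

```turtle
@prefix isbn: <http://example.com/id/> .

# isbn:0884049582 is illegal (leading digit), but folding a fixed
# alphabetic marker into the identifier itself keeps the prefixed
# form legal:
isbn:n0884049582 a <http://example.com/Book> .
```

The cost is exactly the conflict described above: once the marker is part of the identifier, the plain relative form <0884049582> names a different URI than isbn:n0884049582, so the two modes of expression cannot coexist for one representation.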
Now what bugs me is, what to do in the case of something which "might or
might not be numeric". What internal prefix would find good acceptability
for end users?
*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*
(607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
:BaseKB -- Query Freebase Data With SPARQL
Legal Entity Identifier Lookup
Join our Data Lakes group on LinkedIn
The Wikidata crew is very welcome to join!
-------- Forwarded Message --------
From: Andre Klapper <aklapper(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Gerrit Cleanup Day: Wed, Sep 23
Date: Tue, 01 Sep 2015 00:27:12 +0200
I'm happy to announce a Gerrit Cleanup Day on Wed, September 23.
It's an experiment to reduce Wikimedia's code review backlog, which
hampers the growth of our long-term code contributor base.
Development/engineering teams of the Wikimedia Foundation are expected
to join and use the day primarily to review recently submitted open
Gerrit changesets without a review, focusing on volunteer
contributions. Developers from other organizations and individual
developers are of course also very welcome to join and help! :)
https://phabricator.wikimedia.org/T88531 provides more information,
steps, and links to Gerrit queries. Note that it is still a work in progress.
Your questions and feedback are welcome.
Andre Klapper | Wikimedia Bugwrangler
I am happy to announce the release of Wikidata Toolkit 0.5.0, the
Java library for programming with Wikidata and Wikibase.
The most prominent new feature of this release is Wikibase API support,
which allows you to create Java programs that read and write data on
Wikidata (or any other Wikibase site). The API write functionality
checks the live data before making edits, merging statements for you and
detecting edit conflicts. New example programs illustrate this
functionality. Overall, we think this will make WDTK interesting for bot
authors.
Other prominent features include:
* Unit support (just in time before it is enabled on Wikidata.org ;-)
* Processing of local dump files not downloaded from Wikimedia (useful
for other Wikibase users)
* New builder classes to simplify construction of the rather complex
data objects we have in Wikidata
* WorldMapProcessor example (the code used to build the Wikidata maps)
* Improved output file naming for examples, taking dump date into account
* Several improvements in RDF export (but the general RDF structure is
as in 0.4.0; updating this to the new structure we have for the official
SPARQL endpoint is planned for the next release).
Maven users can get the library directly from Maven Central (see );
this is the preferred method of installation. It might still take a
moment until the new packages become visible in Maven Central. There is
also an all-in-one JAR at github  and of course the sources  and
updated JavaDocs .
Feedback is very welcome. Developers are also invited to contribute via
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
Hi everyone :)
We've finally done all the groundwork for unit support. I'd love for
you to give the first version a try on the test system here:
There are a few known issues still, but since this is one of the things
holding back Wikidata, I made the call to release now and work on these
remaining things afterwards. What I know is still missing:
* We're showing the label of the unit's item. We should be
showing the symbol of the unit in the future.
* We can't convert between units yet - we only have the groundwork for
it so far. (https://phabricator.wikimedia.org/T77978)
* The items representing often-used units should be ranked higher in
the selector. (https://phabricator.wikimedia.org/T110673)
* When editing an existing value you see the URL of the unit's item. This
should be replaced by the label.
* When viewing a diff of a unit change you see the URL of the unit's
item. This should be replaced by the label.
* We need to think some more about the automatic edit summaries for
unit-related changes. (https://phabricator.wikimedia.org/T108807)
If you find any bugs, or if you are missing other absolutely critical
things, please let me know here or file a ticket on
phabricator.wikimedia.org. If everything goes well, we can get this onto
Wikidata next Wednesday.
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Please feel free to redistribute!
From: [[kgh]] <mediawiki(a)kghoffmeyer.de>
Date: Tue, Sep 1, 2015 at 8:43 AM
Subject: Semantic MediaWiki Conference Fall 2015: Call for Contributions
To: Semantic MediaWiki users <semediawiki-user(a)lists.sourceforge.net>,
Cc: Alina Mierlus <alina(a)similis.cc>, Toni Hermoso Pulido
<toniher(a)cau.cat>, Lia Veja <cornelia.veja(a)gmail.com>, Karsten
Dear users, developers and all people interested in semantic wikis,
We are very happy to announce that early bird registration to the 12th
Semantic MediaWiki Conference is now open!
Important facts reminder:
Dates: October 28th to October 30th 2015 (Wednesday to Friday)
Location: Fabra i Coats, Art Factory. Carrer Sant Adrià 20 (Sant
Andreu), Barcelona, Catalonia, Spain.
Conference page: https://semantic-mediawiki.org/wiki/SMWCon_Fall_2015
Participants: Everybody interested in semantic wikis, especially in
Semantic MediaWiki, e.g. users, developers, consultants, business
representatives and researchers.
We welcome new contributions from you:
We encourage contributions about applications and development of
semantic wikis; for a list of topics, see .
Please propose regular talks, posters or workshops on the conference
website. We will do our best to consider your proposal in the
conference program. An interesting variety of talks has already been
proposed, see .
Presentations will generally be video and audio recorded and made
available for others after the conference.
If you've already announced your talk it's now time to expand its description.
News on participation and tutorials:
You can now officially register for the conference  and benefit
from early bird fees until October 5, 2015.
The tutorial program has been announced and is now available .
Amical Wikimedia  and Open Semantic Data Association e. V.  have
become the official organisers of SMWCon Fall 2015.
Thanks to Institut de Cultura - Ajuntament de Barcelona  for
providing free access to the conference location and its
facilities.
(Program Chairs), Alina Mierluș (General Chair) or Toni Hermoso (Local
Chair) per e-mail (Cc).
We will be happy to see you in Barcelona!
Lia Veja, Karsten Hoffmeyer (Program Board)
Dr. Cornelia Veja
Deutsches Institut für Internationale Pädagogische Forschung (DIPF)
Schlossstrasse 29; Room 309
60486 Frankfurt am Main
Tel.: +49 (0)69 24708-703
Iris SemData Consulting Ltd.
162/76, C. Brancusi Street
400462 Cluj-Napoca (Romania)
Skype ID: liaveja