[Begging pardon if you have already read this in the Wikidata project chat]
As Wikidatans, we all know how much data quality matters.
We all know what high quality means: statements need to be
validated via references to external, non-wiki sources.
That's why the primary sources tool is being developed:
And that's why I am preparing the StrepHit IEG proposal:
StrepHit (pronounced "strep hit", short for "Statement? repherence it!") is
a Natural Language Processing pipeline that understands human language,
extracts structured data from raw text, and produces Wikidata statements
with reference URLs.
As a demonstration to support the IEG proposal, you can find the
**FBK-strephit-soccer** dataset uploaded to the primary sources tool.
It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and
start playing with the data.
What is the biggest difference that sets StrepHit datasets apart from
the currently uploaded ones?
At least one reference URL is always guaranteed for each statement.
This means that if StrepHit finds some new statement that was not in
Wikidata before, it will always propose its external references.
We do not want to have to manually reject all the new statements that
come with no references.
If you like the idea, please endorse the StrepHit IEG proposal!
PLEASE reconsider. A Wikidata-based solution is not superior just because
it started from Wikidata.
PLEASE consider collaboration. It will be so much more powerful when LSJBOT
and people at Wikidata collaborate. It will get things right the first
time. It does not have to be perfect from the start as long as it gets
better over time. As long as we always work on improving the data.
PLEASE consider text generation based on Wikidata. The scripts
LSJBOT uses can help us improve the text when more or better
information becomes available.
On 6 September 2015 at 08:25, Ricordisamoa <ricordisamoa(a)openmailbox.org>
wrote:
> Proper data-based stubs are being worked on:
> Lsjbot, you have no chance to survive make your time.
> On 06/09/2015 at 02:40, Anders Wennersten wrote:
>> Geonames is a database which holds around 9 million entries of
>> geography-related items from all over the world.
>> Lsjbot is now generating articles from a subset of it, after several
>> months of extensive research on its quality, Wikidata relations and
>> notability issues. While the quality in some regions is substandard (and
>> these will not be generated) it was seen as very good in most areas. In
>> the discussion I was intrigued to learn that identical Arabic names should
>> be transcribed differently depending on their geographic location. And I was
>> fascinated by the question of the notability of wells in the Bahrain desert
>> (which in the end were excluded, mostly because we knew too little about them).
>> In this run Lsjbot has extended its functionality even further than when
>> it generated articles for species. It looks for relevant geographical items
>> close to the actual one: a nearby lake, a mountain, the nearest
>> major town, and so on.
>> Macedonia can be taken as one example. Lsjbot generated over 10000
>> articles (and 5000 disambiguation pages), an order of magnitude more than what
>> exists in enwp. Also for a well-defined type like villages, almost 50% as
>> many have been generated as exist in enwp. One example  where you
>> can see what has been generated (and note the reuse of a relevant figure
>> existing in frwp). Please compare the corresponding articles in other
>> languages in this case, many having less information than the bot-generated ones.
>> The generation is still at an early stage [3] but has already pushed the
>> article count for svwp past 2 M today. But it will take many more months
>> before it is completed, and perhaps more M marks will be passed before it is
>> through. If you want to give feedback you are welcome to enter it at 
>> (with all credits for the Lsjbot to be given to Sverker, its owner, I am
>> just one of the many supporters of him and his bot on svwp)
>> Wikimedia-l mailing list, guidelines at:
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
Let me add a further reply to your comment.
On 9/5/15 2:01 PM, wikidata-request(a)lists.wikimedia.org wrote:
> Message: 3
> Date: Fri, 4 Sep 2015 19:26:38 +0200
> From: Gerard Meijssen<gerard.meijssen(a)gmail.com>
> Quality is not determined by sources. Sources do lie.
> When you want quality, you seek sources where they matter most. It is not
> by going for "all" of them
I completely agree with you that many sources can be flawed. I may have
omitted the word "trustworthy" before "sources"; I have now added it in
the Wikidata project chat.
The IEG proposal will also include an investigation phase to select a
set of authoritative sources; see the first task in the proposal work
plan. I'll expand on this.
Tell me if I am right or wrong about this.
If I am coining a URI for something that has an identifier in an outside
system, it is straightforward to append the identifier (possibly modified a
little) to a prefix, such as
Then you can write
@prefix dbpedia: <http://dbpedia.org/resource/>
and then refer to the concept (in either Turtle or SPARQL) as
I will take this one step further and say that for pedagogical and
other coding situations, the extra length of prefix declarations is an
additional cognitive load on top of all the other cognitive loads of
dealing with the system, so in the name of concision you can do something
like
@prefix : <http://dbpedia.org/ontology/>
and then you can write :someProperty and <Stellarator>, and your queries
look very simple.
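As a minimal Turtle sketch of that setup (the property and resource names here are hypothetical, not actual DBpedia terms):

```turtle
@base <http://dbpedia.org/resource/> .
@prefix : <http://dbpedia.org/ontology/> .

# With an empty prefix for the ontology namespace and @base for
# resources, statements stay very compact:
<Stellarator> :someProperty <Tokamak> .
```

The same declarations work in SPARQL using the `BASE` and `PREFIX` keywords (without the trailing dots).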
The production for a QName cannot begin with a number so it is not correct
to write something like
or expect to have the full URI squashed to that. This kind of gotcha will
drive newbies nuts, and the realization of RDF as a universal solvent
requires squashing many of them.
Another example is
If you look at the @base declaration above, you see a way to get around
this, because with the base above you can write
<100> which works just fine in the dbpedia case.
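Concretely, the gotcha and the @base workaround look like this (property and resource names are hypothetical; the restriction described is the one the original QName production, as enforced by tools like Jena, imposes):

```turtle
@base <http://dbpedia.org/resource/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix : <http://dbpedia.org/ontology/> .

# dbpedia:100 would be a parse error under this production: the
# local part of a QName cannot begin with a digit.
# The relative-IRI form resolves against @base instead:
<100> :someProperty dbpedia:SomeResource .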
I like what Wikidata did in using fairly dense sequential integers for
the ids, so a Wikidata entity URI looks like
which is always a QName, so you can write
@prefix wd: <http://www.wikidata.org/entity/>
and then you can write
and it is all fine, because (i) Wikidata added the alpha prefix, (ii)
started at the beginning with it, and (iii) made up a plausible
explanation for why it is that way. Freebase mids have the same property, so
:BaseKB has it too.
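For instance (using the wdt: direct-claim prefix from the Wikidata query service alongside wd:):

```turtle
@prefix wd:  <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .

# Q-ids and P-ids always start with a letter, so they are always
# legal as the local part of a prefixed name:
wd:Q42 wdt:P31 wd:Q5 .   # Douglas Adams (Q42): instance of (P31) human (Q5)
```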
I think customers would expect to be able to give us
and have it just work, but because a QName can never begin with a number,
you can encode the URI like this:
and then write
which is not too bad. Note, however, that if you want to write
<0884049582> you have to encode it as
because, at least with the Jena framework, the same thing happens if you
so you can't choose a representation which supports that mode of expression
and a :+prefix mode.
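One hypothetical way to picture the trade-off (the namespace and the marker letter here are made up for illustration, not the actual encoding the message refers to):

```turtle
@prefix isbn: <http://example.com/id/> .

# isbn:0884049582 is illegal (leading digit), but folding a fixed
# alphabetic marker into the identifier itself keeps the prefixed
# form legal:
isbn:n0884049582 a <http://example.com/Book> .
```

The cost is exactly the conflict described above: once the marker is part of the identifier, the plain relative form <0884049582> names a different URI than isbn:n0884049582, so the two modes of expression cannot coexist for one representation.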
Now what bugs me is, what to do in the case of something which "might or
might not be numeric". What internal prefix would find good acceptability
for end users?
*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*
(607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
:BaseKB -- Query Freebase Data With SPARQL
Legal Entity Identifier Lookup
Join our Data Lakes group on LinkedIn
The Wikidata crew is very welcome to join!
-------- Forwarded Message --------
From: Andre Klapper <aklapper(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Gerrit Cleanup Day: Wed, Sep 23
Date: Tue, 01 Sep 2015 00:27:12 +0200
I'm happy to announce a Gerrit Cleanup Day on Wed, September 23.
It's an experiment to reduce Wikimedia's code review backlog, which
hampers the growth of our long-term code contributor base.
Development/engineering teams of the Wikimedia Foundation are expected
to join and use the day primarily to review recently submitted open
Gerrit changesets without a review, focusing on volunteer
contributions. Developers from other organizations and individual
developers are of course also very welcome to join and help! :)
https://phabricator.wikimedia.org/T88531 provides more information,
steps, and links to Gerrit queries. Note that it is still a work in progress.
Your questions and feedback are welcome.
Andre Klapper | Wikimedia Bugwrangler
I am happy to announce the release of Wikidata Toolkit 0.5.0, the
Java library for programming with Wikidata and Wikibase.
The most prominent new feature of this release is Wikibase API support,
which allows you to create Java programs that read and write data on
Wikidata (or any other Wikibase site). The API write functionality
checks the live data before making edits, merging statements for you and
detecting edit conflicts. New example programs illustrate this
functionality. Overall, we think this will make WDTK interesting for bot
authors.
Other prominent features include:
* Unit support (just in time before it is enabled on Wikidata.org ;-)
* Processing of local dump files not downloaded from Wikimedia (useful
for other Wikibase users)
* New builder classes to simplify construction of the rather complex
data objects we have in Wikidata
* WorldMapProcessor example (the code used to build the Wikidata maps)
* Improved output file naming for examples, taking dump date into account
* Several improvements in RDF export (but the general RDF structure is
as in 0.4.0; updating this to the new structure we have for the official
SPARQL endpoint is planned for the next release).
Maven users can get the library directly from Maven Central (see );
this is the preferred method of installation. It might still take a
moment until the new packages become visible in Maven Central. There is
also an all-in-one JAR at github  and of course the sources  and
updated JavaDocs .
Feedback is very welcome. Developers are also invited to contribute via
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
Hi everyone :)
We've finally done all the groundwork for unit support. I'd love for
you to give the first version a try on the test system here:
There are a few known issues still, but since this is one of the things
holding back Wikidata, I made the call to release now and work on these
remaining things afterwards. What I know is still missing:
* We're showing the label of the unit's item. We should be
showing the symbol of the unit in the future.
* We can't convert between units yet - we only have the groundwork for
it so far. (https://phabricator.wikimedia.org/T77978)
* The items representing often-used units should be ranked higher in
the selector. (https://phabricator.wikimedia.org/T110673)
* When editing an existing value you see the URL of the unit's item. This
should be replaced by the label.
* When viewing a diff of a unit change you see the URL of the unit's
item. This should be replaced by the label.
* We need to think some more about the automatic edit summaries for
unit-related changes. (https://phabricator.wikimedia.org/T108807)
If you find any bugs, or if you are missing other absolutely critical
things, please let me know here or file a ticket on
phabricator.wikimedia.org. If everything goes well, we can get this onto
Wikidata next Wednesday.
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Please feel free to redistribute!
From: [[kgh]] <mediawiki(a)kghoffmeyer.de>
Date: Tue, Sep 1, 2015 at 8:43 AM
Subject: Semantic MediaWiki Conference Fall 2015: Call for Contributions
To: Semantic MediaWiki users <semediawiki-user(a)lists.sourceforge.net>,
Cc: Alina Mierlus <alina(a)similis.cc>, Toni Hermoso Pulido
<toniher(a)cau.cat>, Lia Veja <cornelia.veja(a)gmail.com>, Karsten
Dear users, developers and all people interested in semantic wikis,
We are very happy to announce that early bird registration to the 12th
Semantic MediaWiki Conference is now open!
Important facts reminder:
Dates: October 28th to October 30th 2015 (Wednesday to Friday)
Location: Fabra i Coats, Art Factory. Carrer Sant Adrià 20 (Sant
Andreu), Barcelona, Catalonia, Spain.
Conference page: https://semantic-mediawiki.org/wiki/SMWCon_Fall_2015
Participants: Everybody interested in semantic wikis, especially in
Semantic MediaWiki, e.g. users, developers, consultants, business
representatives and researchers.
We welcome new contributions from you:
We encourage contributions about applications and development of
semantic wikis; for a list of topics, see .
Please propose regular talks, posters or workshops on the conference
website. We will do our best to consider your proposal in the
conference program. An interesting variety of talks has already been
proposed, see .
Presentations will generally be video and audio recorded and made
available for others after the conference.
If you've already announced your talk it's now time to expand its description.
News on participation and tutorials:
You can now officially register for the conference  and benefit
from early bird fees until October 5, 2015.
The tutorial program has been announced and is now available .
Amical Wikimedia  and Open Semantic Data Association e. V.  have
become the official organisers of SMWCon Fall 2015.
Thanks to Institut de Cultura - Ajuntament de Barcelona  for
providing free access to the conference location and its
facilities.
(Program Chairs), Alina Mierluș (General Chair) or Toni Hermoso (Local
Chair) per e-mail (Cc).
We will be happy to see you in Barcelona!
Lia Veja, Karsten Hoffmeyer (Program Board)
Dr. Cornelia Veja
Deutsches Institut für Internationale Pädagogische Forschung (DIPF)
Schlossstrasse 29; Room 309
60486 Frankfurt am Main
Tel.: +49 (0)69 24708-703
Iris SemData Consulting Ltd.
162/76, C. Brancusi Street
400462 Cluj-Napoca (Romania)
Skype ID: liaveja