On Fri, Sep 4, 2015 at 5:01 PM, Marco Fossati <hell.j.fox(a)gmail.com> wrote:
[Begging pardon if you have already read this in the Wikidata project chat]
Hi everyone,
As Wikidatans, we all know how much data quality matters.
We all know what high quality stands for: statements need to be validated
via references to external, non-wiki sources.
That's why the primary sources tool is being developed:
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
And that's why I am preparing the StrepHit IEG proposal:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a
Natural Language Processing pipeline that understands human language,
extracts structured data from raw text and produces Wikidata statements with
reference URLs.
As a demonstration to support the IEG proposal, you can find the
**FBK-strephit-soccer** dataset uploaded to the primary sources tool
backend.
It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and start
playing with the data.
What is the biggest difference that sets StrepHit datasets apart from the
currently uploaded ones?
At least one reference URL is always guaranteed for each statement.
This means that if StrepHit finds a new statement that is not yet in
Wikidata, it will always propose it together with its external references.
We do not want to manually reject all the new statements with no reference,
right?
If you like the idea, please endorse the StrepHit IEG proposal!
Thank you for working on this, Marco. This is a great step forward. I
wish you good luck for the IEG proposal!
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.