cross-posting as this might be of interest to people on this list

Begin forwarded message:

From: Marco Fossati <>
Subject: [Wikidata] StrepHit IEG proposal: call for support (was Re: [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool)
Date: September 21, 2015 at 3:32:25 AM PDT
Reply-To: "Discussion list for the Wikidata project." <>

Dear all,

The StrepHit IEG proposal is now pretty much complete:

We have already received support and feedback, but you are the most relevant community and the project needs your specific help.

Your voice is vital and it can be heard on the project page in multiple ways. If you:
1. like the idea, please click on the blue *endorse* button;
2. want to get involved, please click on the blue *join* button;
3. want to share your thoughts, please click on the *give feedback* link.

Looking forward to your updates.

On 9/9/15 11:39, Marco Fossati wrote:
Hi Markus, everyone,

The project proposal is currently in active development.
I would like to focus now on the dissemination of the idea and the
engagement of the Wikidata community.
Hence, I would love to gather feedback on the following question:

Does StrepHit sound interesting and useful to you?

It would be great if you could share your thoughts on the project talk page.


On 9/8/15 2:02 PM, wrote:
Date: Mon, 07 Sep 2015 16:47:16 +0200
From: Markus Krötzsch<>
To: "Discussion list for the Wikidata project."
Subject: Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the
   primary sources tool

Dear Marco,

Sounds interesting, but the project page still has a lot of gaps. Will
you notify us again when you are done? It is a bit tricky to endorse a
proposal that is not finished yet;-)


On 04.09.2015 17:01, Marco Fossati wrote:
>[Begging pardon if you have already read this in the Wikidata
>project chat]
>Hi everyone,
>As Wikidatans, we all know how much data quality matters.
>We all know what high quality stands for: statements need to be
>validated via references to external, non-wiki sources.
>That's why the primary sources tool is being developed:
>And that's why I am preparing the StrepHit IEG proposal:

>StrepHit (pronounced "strep hit", means "Statement? repherence it!") is
>a Natural Language Processing pipeline that understands human language,
>extracts structured data from raw text and produces Wikidata statements
>with reference URLs.
>As a demonstration to support the IEG proposal, you can find the
>**FBK-strephit-soccer** dataset uploaded to the primary sources tool.
>It's a small dataset serving the soccer domain use case.
>Please follow the instructions on the project page to activate it and
>start playing with the data.
>What is the biggest difference that sets StrepHit datasets apart from
>the currently uploaded ones?
>At least one reference URL is always guaranteed for each statement.
>This means that if StrepHit finds some new statement that was not there
>in Wikidata before, it will always propose its external references.
>We do not want to manually reject all the new statements with no
>reference, right?
>If you like the idea, please endorse the StrepHit IEG proposal!

Marco Fossati
Twitter: @hjfocs
Skype: hell_j


Dario Taraborelli  Head of Research, Wikimedia Foundation