The danger of blanket statements is that they are often easy to refute. No. Quality is not determined by sources. Sources do lie.

When you want quality, you seek sources where they matter most, not by going for "all" of them; this is where Wikidata differs from other projects.

Arguably, and I do make that argument, Wikidata is so underdeveloped in the statement department that having more data with a reasonable expectation of quality will trump higher quality for a much smaller dataset.

On 4 September 2015 at 17:01, Marco Fossati <hell.j.fox@gmail.com> wrote:
[Begging pardon if you have already read this in the Wikidata project chat]

Hi everyone,

As Wikidatans, we all know how much data quality matters.
We all know what high quality stands for: statements need to be validated via references to external, non-wiki sources.

That's why the primary sources tool is being developed:
And that's why I am preparing the StrepHit IEG proposal:

StrepHit (pronounced "strep hit", meaning "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text, and produces Wikidata statements with reference URLs.
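To make the pipeline idea above concrete, here is a toy sketch of the extraction step: find a factual claim in raw text and emit a statement that always carries its source URL. The pattern, property label, and URL are illustrative assumptions, not StrepHit's actual implementation.

```python
import re

def extract_statement(sentence, source_url):
    """Naive pattern-based extraction: match '<player> plays for <club>'
    and emit a statement dict that always includes a reference URL."""
    match = re.search(
        r"^(?P<subject>[\w .]+?) plays for (?P<object>[\w .]+?)\.?$",
        sentence,
    )
    if not match:
        return None
    return {
        "subject": match.group("subject"),
        "property": "member of sports team",  # P54 in Wikidata
        "object": match.group("object"),
        "reference_url": source_url,  # a reference is always attached
    }

stmt = extract_statement(
    "Andrea Pirlo plays for Juventus.",
    "http://example.org/pirlo-profile",  # hypothetical source page
)
print(stmt)
```

A real system would of course use linguistic analysis rather than a single regex, and would resolve the extracted labels to Wikidata items and properties, but the invariant shown here is the point: no statement is produced without its reference URL.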

As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend.
It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and start playing with the data.

What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones?
At least one reference URL is always guaranteed for each statement.
This means that if StrepHit finds a new statement that was not in Wikidata before, it will always propose external references for it.
We do not want to manually reject all the new statements with no reference, right?

If you like the idea, please endorse the StrepHit IEG proposal!

Marco Fossati
Twitter: @hjfocs
Skype: hell_j

Wikidata mailing list