Hoi, There is no problem considering these points. You go in a direction that has little to do with what we are and where we stand. Wikidata is a wiki. That implies that it does not have to be perfect. It implies that approaches are taken that are arguably wacky, and we will see in time how they pan out. For instance, "Frankfurt" "instance of" "big city", where a big city is a city over a certain size. The size is debatable, and consequently it is a really poor concept from a Wikidata point of view. It can be inferred and is therefore even redundant. Does it matter? Not really, because in time "we" will see the light.
Our data is incomplete. Arguably, importing data enables us to share more of the sum of all knowledge with our users. A given percentage of all data is incorrect. However, having no data is arguably 100% incorrect and 100% not in line with our goal of serving the sum of all knowledge. Quality is important, so processes and workflows are exceedingly important to have. We are lacking in that department so far. But comparing against external data sources like VIAF or the DNB in an iterative way is the obvious approach when you want to identify those items and statements that are suspect. The data in Wikidata makes this easy because we have spent considerable effort linking external sources first to Wikipedia and now to Wikidata. It is easy to mark items with issues using qualifiers on the external source ID and have a basis for such workflows and quality markers.
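To make that concrete, here is a minimal sketch of such a cross-check, assuming Python, the public Wikidata Query Service, and the real properties P214 (VIAF ID) and P569 (date of birth). The local VIAF export file name and its column layout are hypothetical and only stand in for whatever dump or API an external partner actually provides; this is an illustration of the workflow, not a finished tool.

# Sketch: pull items that carry a VIAF ID and a date of birth from the
# Wikidata Query Service, then flag those whose birth year disagrees with
# a locally exported VIAF dataset. Disagreements are review candidates,
# not automatic fixes.
import csv
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?viaf ?dob WHERE {
  ?item wdt:P214 ?viaf ;          # VIAF ID
        wdt:P569 ?dob .           # date of birth
} LIMIT 1000
"""

def wikidata_birth_years():
    """Return {viaf_id: (item_uri, birth_year)} from the query service."""
    r = requests.get(SPARQL_ENDPOINT,
                     params={"query": QUERY, "format": "json"},
                     headers={"User-Agent": "quality-check-sketch/0.1"})
    r.raise_for_status()
    out = {}
    for row in r.json()["results"]["bindings"]:
        year = int(row["dob"]["value"][:4])   # "1953-10-09T00:00:00Z" -> 1953
        out[row["viaf"]["value"]] = (row["item"]["value"], year)
    return out

def viaf_birth_years(path="viaf_birth_years.csv"):
    """Hypothetical local export: one 'viaf_id,birth_year' row per person."""
    with open(path, newline="") as f:
        return {row["viaf_id"]: int(row["birth_year"])
                for row in csv.DictReader(f)}

def suspect_items():
    wd, ext = wikidata_birth_years(), viaf_birth_years()
    for viaf_id, (item, wd_year) in wd.items():
        ext_year = ext.get(viaf_id)
        if ext_year is not None and ext_year != wd_year:
            yield item, viaf_id, wd_year, ext_year

if __name__ == "__main__":
    for item, viaf_id, wd_year, ext_year in suspect_items():
        print(f"{item}  VIAF {viaf_id}: Wikidata says {wd_year}, VIAF says {ext_year}")

The output of such a run is exactly the kind of list that could feed a curation workflow, or be marked on the items themselves via qualifiers on the external source ID as described above.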
When you make a point of external sources trusting Wikidata: these external sources may be consumers, or they can be partners. When they are partners, we can provide RSS feeds informing them of issues that have been found, and they can do their own curation on their data. When they are consumers we can still provide such an RSS feed, but we do not know what they do with it; it is their problem more than it is ours.
As I say so often, Wikidata is immature. It is silly to blindly trust Wikidata. It is largely based on Wikipedia, which has constructs of its own that we do not need or want in Wikidata. "Big city" is one example. We have items, created from interwiki links, that are a mix of all kinds of things, e.g. a listed building and an organisation. This is conceptually wrong at Wikidata and such items need to be split. This is where many Wikipedians become uncomfortable, but hey, Wikidata does not tell them to rewrite their articles.
So yes, you can continue with this point, but it has little impact on Wikidata, and when you think it should, do consider what impact it has on Wikidata as a wiki. It is NOT an academic resource or a reference source per se. It is a wiki; it is allowed to be wrong, particularly when it has proper workflows to improve quality.
If anything, THIS is where we could do with a lot more talk and preferably action. This is where Wikidata is obviously lacking, and when we do have proper workflows in place, we do NOT need the dump that is the "primary sources" tool, as it is the antithesis of a wiki and it prevents us from sharing available knowledge. Thanks, GerardM
On 28 November 2015 at 07:05, Wil Sinclair wllm@wllm.com wrote:
Gergo, do you mind if people continue discussing this? I'm finding it very interesting and fruitful. I hadn't thought through these issues before, and there are likely to be others on this list who haven't either.
Best, Wil
On Fri, Nov 27, 2015 at 5:17 PM, Gergo Tisza gtisza@wikimedia.org wrote:
On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov lila@wikimedia.org wrote:
What I hear in email from Andreas and Liam is not so much the propagation of the error (which I am sure happens with some % of the cases), but the fact that the original source is obscured and therefore it is hard to identify and correct errors, biases, etc. Because if the source of an error is obscured, that error is that much harder to find and to correct. In fact, we see this even on Wikipedia articles today (wrong dates of birth sourced from publications that don't do enough fact checking is something I came across personally). It is a powerful and important principle on Wikipedia, but with content re-use it gets lost. Public domain/CC0 in combination with AI lands our content for slicing and dicing and re-arranging by others, making it something entirely new, but also detached from our process of validation and verification. I am curious to hear if people think it is a problem. It definitely worries me.
This conversation seems to have morphed into trying to solve some problems that we are speculating Google might have (no one here actually *knows* how the Knowledge Graph works, of course; maybe it's sensitive to manipulation of Wikidata claims, maybe not). That seems like an entirely fruitless line of discourse to me; if the problem exists, it is Google's problem to solve (since they are the ones in a position to tell if it's a real problem or not; not to mention they have two or three orders of magnitude more resources to throw at it than the Wikimedia movement would). Trying to make our content less free for fear that someone might misuse it is a shamefully wrong frame of mind for an organization that's supposed to be a leader of the open content movement, IMO.