Hoi,
There is no problem with considering these points, but you are going in a
direction that has little to do with what we are and where we stand.
Wikidata is a wiki. That implies that it does not have to be perfect. It
implies that approaches are taken that are arguably wacky, and we will see
in time how they pan out. For instance, "Frankfurt" "instance of" "big
city", where a big city is a city over a certain size. The size is
debatable, and consequently it is a really poor concept from a Wikidata
point of view. It can be inferred and is therefore arguably redundant. Does
it matter? Not really, because in time "we" will see the light.
Our data is incomplete. Arguably, importing data enables us to share more
of the sum of all knowledge with our users. A given percentage of all data
is incorrect. However, having no data is arguably 100% incorrect and 100%
not in line with our goal of serving the sum of all knowledge. Quality is
important, so processes and workflows are exceedingly important to have. We
are lacking in that department so far. But comparing against external data
sources like VIAF or DNB in an iterative way is an obvious approach when
you want to identify those items and statements that are suspect. The data
in Wikidata makes this easy because we have spent considerable effort
linking external sources, first to Wikipedia and now to Wikidata. It is
easy to mark items with issues using qualifiers on the external source ID,
which gives us a basis for such workflows and quality markers.
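To make the idea concrete, such an iterative comparison could start as a small script that diffs a Wikidata item against an external authority record and flags the disagreements for review. This is only a hypothetical sketch: the record shapes and property names below are invented for illustration and are not the real Wikidata or VIAF API output.

```python
# Hypothetical sketch: cross-check Wikidata statements against an external
# authority record (e.g. VIAF or DNB) and flag disagreements as "suspect".
# Both inputs are simplified dicts of property name -> value; real records
# would come from the respective APIs.

def find_suspect_statements(wikidata_item, external_record):
    """Return (property, wikidata_value, external_value) triples where
    both sources state a value but the values disagree."""
    suspect = []
    for prop, value in wikidata_item.items():
        ext_value = external_record.get(prop)
        if ext_value is not None and ext_value != value:
            suspect.append((prop, value, ext_value))
    return suspect

# Example: one property agrees is absent externally, one disagrees.
item = {"date_of_birth": "1875-06-06", "place_of_birth": "Lübeck"}
viaf = {"date_of_birth": "1875-06-07"}
print(find_suspect_statements(item, viaf))
# → [('date_of_birth', '1875-06-06', '1875-06-07')]
```

A mismatch found this way would not be corrected automatically; it would only be marked, for instance with a qualifier on the external source ID, so a human workflow can pick it up.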
When you make a point of external sources trusting Wikidata, these external
sources may be consumers or they may be partners. When they are partners,
we can provide RSS feeds informing them of issues that have been found, and
they can do their own curation on their data. When they are consumers, we
can still provide such an RSS feed, but we do not know what they do with
it; it is their problem more than it is ours.
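As a rough illustration of what such a notification feed could look like, here is a minimal sketch using Python's standard library. The feed title, link, and issue format are made-up placeholders; no such Wikidata service exists today.

```python
# Hypothetical sketch: publish flagged quality issues as a minimal
# RSS 2.0 feed that partner databases could poll and act on.
import xml.etree.ElementTree as ET

def issues_to_rss(issues):
    """Build an RSS 2.0 document from (item_id, description) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Wikidata quality issues"
    ET.SubElement(channel, "link").text = "https://www.wikidata.org"
    ET.SubElement(channel, "description").text = "Statements flagged as suspect"
    for item_id, description in issues:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_id
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")

feed = issues_to_rss([("Q12345", "date of birth disagrees with VIAF")])
print(feed)
```

A partner would subscribe to the feed and decide for itself whether to curate its own record or report back; a consumer could ignore it entirely.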
As I say so often, Wikidata is immature. It is silly to blindly trust
Wikidata. It is largely based on Wikipedia, which has constructs of its own
that we do not need or want in Wikidata. Big cities are one example. We
have items, created from interwiki links, that are a mix of all kinds of
things, e.g. a listed building and an organisation. This is conceptually
wrong at Wikidata, and such items need to be split. This is where many
Wikipedians become uncomfortable, but hey, Wikidata does not tell them to
rewrite their articles.
So yes, you can continue with this point, but it has little impact on
Wikidata, and when you think it should, do consider what impact it has on
Wikidata as a wiki. It is NOT an academic resource or a reference source
per se. It is a wiki; it is allowed to be wrong, particularly when it has
proper workflows to improve quality.
If anything, THIS is where we could do with a lot more talk and preferably
action. This is where Wikidata is obviously lacking, and once we do have
proper workflows in place, we do NOT need the dump that is the "primary
sources", as this is the antithesis of a wiki and it prevents us from
sharing available knowledge.
Thanks,
GerardM
On 28 November 2015 at 07:05, Wil Sinclair <wllm(a)wllm.com> wrote:
Gergo, do you mind if people continue discussing this? I'm finding it very
interesting and fruitful. I hadn't thought through these issues before, and
there are likely to be others on this list who haven't either.
Best!
,Wil
On Fri, Nov 27, 2015 at 5:17 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:
On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov <lila(a)wikimedia.org> wrote:
> What I hear in email from Andreas and Liam is not as much the propagation
> of the error (which I am sure happens with some % of the cases), but the
> fact that the original source is obscured and therefore it is hard to
> identify and correct errors, biases, etc. Because if the source of error
> is obscured, that error is that much harder to find and to correct. In
> fact, we see this even on Wikipedia articles today (wrong dates of births
> sourced from publications that don't do enough fact checking is something
> I came across personally). It is a powerful and important principle on
> Wikipedia, but with content re-use it gets lost. Public domain/CC0 in
> combination with AI lands our content for slicing and dicing and
> re-arranging by others, making it something entirely new, but also
> detached from our process of validation and verification. I am curious to
> hear if people think it is a problem.
It definitely worries me.
This conversation seems to have morphed into trying to solve some problems
that we are speculating Google might have (no one here actually *knows*
how the Knowledge Graph works, of course; maybe it's sensitive to
manipulation of Wikidata claims, maybe not). That seems like an entirely
fruitless line of discourse to me; if the problem exists, it is Google's
problem to solve (since they are the ones in a position to tell if it's a
real problem or not; not to mention they have two or three orders of
magnitude more resources to throw at it than the Wikimedia movement
would). Trying to make our content less free for fear that someone might
misuse it is a shamefully wrong frame of mind for an organization that's
supposed to be a leader of the open content movement, IMO.
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>