Re: [Wikimedia-l] Quality issues

16 Dec 2015

Hoi,
The one thing where Wikidata shines is in connecting sources through
identifiers. It connects all Wikipedias through the interwiki links, and
improving these has been an ongoing process over the last three years. Every
week more external identifiers are added, and many of these connections are
made in the mix-n-match tool by Magnus.

As more sources are added, the opportunity grows to compare and curate.
Copyright law holds that you cannot use complete databases, but it allows
you to compare and curate. When values match, there is no obvious issue.
When they do not, it is a matter of signalling the difference and
evaluating the opposing values.

What we should NOT do is accept any value as 100% correct. Sources are
known to be wrong, but where they all agree we can move on and concentrate
on where there is a disagreement, where an investment of time makes the
most difference. In this way we make a positive difference for our own
content and, by signalling differences, at the other end as well.

The problem with Andreas' argument is that it does not provide any way
forward. It may be a problem, and then what? By concentrating on what we do
best, sharing in the sum of all available knowledge, we enable parties to
compare their content with all the other parties that have content. We
publish where we find a difference, and it is then for us and others to do
the best we can.
Thanks,
      GerardM

On 16 December 2015 at 12:12, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...
  On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell <jane023(a)gmail.com> wrote:

  Andrea,
  I totally agree on the mission/vision thing, but am not sure what you mean
  exactly by scale - do you mean that Wikidata shouldn't try to be so
  granular that it has a statement to cover each factoid in any Wikipedia
  article, or do you mean we need to talk about what constitutes notability
  in order not to grow Wikidata exponentially to the point the servers crash?
  Jane

  Hi Jane, I explained myself poorly (sometimes English is too difficult :-)

 What I mean is that the scale of the error *could* be of another scale,
 another order of magnitude.
 The propagation of the error is multiplied, it's not just a single error on
 a wikipage: it's an error propagated in many wikipages, and then Google,
 etc.
 A single point of failure.

 Of course, the opposite is also true: it's a single point of openness,
 correction, information.
 I was just wondering if this different scale is a factor in making
 Wikipedia and Wikidata different enough to accept/reject Andreas' arguments.

 Andrea

  On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

 > I really feel we are drowning in a glass of water.
 > The issue of "data quality" or "reliability" that Andreas raises is well
 > known: what I don't understand is whether the "scale" of it is much bigger
 > on Wikidata than on Wikipedia, and whether this different scale makes it
 > much more important. The scale of the issue is maybe something worth
 > discussing, and not the issue itself? Is the fact that Wikidata is
 > centralised different from statements on Wikipedia? I don't know, but to
 > me this is a more neutral and interesting question.

 > I often say that the Wikimedia world made quality a "Heisenbergian"
 > feature: you always have to check if it's there.
 > The point is: it has always been like this.
 > We always had to check for quality, even when we used Britannica or
 > authority controls or whatever "reliable" sources we wanted. Wikipedia, and
 > now Wikidata, is made for everyone to contribute, it's open and honest in
 > being open, vulnerable, prone to errors. But we are transparent, we say
 > that in advance, we can claim any statement to the smallest detail. Of
 > course it's difficult, but we can do it. Wikidata, as Lydia said, can
 > actually have conflicting statements in every item: we "just" have to put
 > them there, as we did to Wikipedia.
 >
 > If Google uses our data and they are wrong, that's bad for them. If they
 > correct the errors and do not give us the corrections, that's bad for us
 > and not ethical of them. The point is: there is no license (as far as I
 > know) that can force them to contribute to Wikidata. That is, IMHO, the
 > problem with "over-the-top" actors: they can harness collective
 > intelligence and "not give back." Even with CC-BY-SA, they could store (as
 > they are probably already doing) all the data in their knowledge vault,
 > which is secret as it is an incredible asset for them.
 >
 > I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or
 > CC0, but it's not there.

 > So, as we are working via GLAMs with Wikipedia for getting reliable
 > sources and content, we are working with them also for good statements and
 > data. Putting good data in Wikidata makes it better, and I don't understand
 > what the problem is here (I understand, again, the issue of putting in too
 > much data and still having a small community).
 > For example: if we are importing different reliable databases, and the
 > institutions behind them find it useful and helpful to have an aggregator
 > of identifiers and authority controls, what is the issue? There is value
 > in aggregating data, because you can spot errors and inconsistencies. It's
 > not easy, of course, to find a good workflow, but, again, that is
 > *another* problem.

 > So, in conclusion: I find many issues in Wikidata, but not on the
 > mission/vision, just in the complexity of the project, the size of the
 > dataset, the size of the community.
 >
 > Can we talk about those?
 >
 > Aubrey

 > On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:
 >
 > > On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:
 > >
 > > > On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:
 > > >
 > > > > Jane,
 > > > >
 > > > > The issue is that you can't cite one Wikipedia article as a source in
 > > > > another.
 > > > >
 > > >
 > > >
 > > > However you can within the same article per [[WP:LEAD]].
 > > >
 > >
 > >
 > > Well, of course, if there are reliable sources cited in the body of the
 > > article that back up the statements made in the lead. You still need to
 > > cite a reliable source though; that's Wikipedia 101.
 > > _______________________________________________
 > > Wikimedia-l mailing list, guidelines at:
 > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
 > > Wikimedia-l(a)lists.wikimedia.org
 > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
