Gerard,
Thanks for highlighting my work! I already posted slides on Commons, but I
want to flesh them out with links to actual edits so people can better
understand some of these quality improvement workflows. The tools I use for
lists are written mostly by the Wikidata "god" Magnus Manske and the tools
I use on Commons are self-built kludges with the assistance of Commonist
Vera de Kok. Here is an example of a quality improvement I made this morning
to a file on Commons. It was originally uploaded by an English Wikipedian
using the default uploader, for use in an English Wikipedia list. The
improvements draw on both the uploader's original edits on Wikipedia and
the associated Wikidata list:
https://commons.wikimedia.org/w/index.php?title=File:Rembrandt_Man_with_a_F…
Jane
On Sun, Nov 29, 2015 at 10:42 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com>
wrote:
> Hoi,
> Wikidata is a wiki, and you seem to always forget that.
>
> The corruption of data .. how? Each statement is its own data item; how do
> you corrupt that? As I say so often, when you get a collection that is 80%
> correct you have an error rate of 20%. When you do not include that data
> you have an error rate of 100%. When you have another source that is 90%
> correct that has similar data and you have an overlap of 50%, you can be
> smart and, at the start or later, compare the data and curate. When you only
> import at the start what is the same, you probably get something like 84%
> correct data imported. You can gamify the rest, but however you slice it,
> what you do not have and could have is 100% wrong.
>
> Wikidata is NOT Wikipedia. It is much easier to curate data and
> consequently your argument is FUD. The big thing we have not learned is
> cooperation. We do not cooperate. We do not have, as standard, RSS feeds for
> the changes to the items that belong to a specific source. We are happy to
> get data but we do not reach out and give back. For me the fact that VIAF
> uses Wikidata as a link is an opportunity to do better. Cooperation like
> that with the German DNB is the kind of project that we should emulate.
>
> When you talk about quality, you talk in an insular fashion. We have to do
> it, our community. At Wikidata our community can include other
> organisations with rich collections of data with high quality. We can
> share, compare, curate. Even with our current low quality, we have subsets
> of data that shine. Subsets of data that are of at least the same quality
> as Wikipedia. However this quality is often marred by a lack of quantity,
> quantity we can have when collaboration is what we do.
>
> You are afraid of our reputation. Reputation has many aspects. Jane023
> presented at the Dutch Wikimedia conference. She uses a tool that is easier
> on her because no Wikipedians bother her because it is a Wikidata based
> list. A similar list is now used for its quality on the Welsh Wikipedia.
> The data is of such quality that Google actually uses it, as she reported.
>
> When I see the religious application of Wikipedia sentiments, I find that
> we do not even care for the life of one of our own. Bassel is executed or
> likely to be executed soon and some think our neutrality is so important.
> FOR WHAT? So that we may not even protect our own? Is it right to protest
> against TTIP (and we should) and not protest for a Wikipedian that embodies
> our values?
>
> Wikipedia think is not applicable at this stage for Wikidata. Its quality
> is arguably piss poor but better in places. Many items are corrupt because
> they follow the structure of Wikipedia articles. A structure Wikipedians
> insist on because they wrote that article and "Wikidata is only a service
> project".
>
> I do agree that we need more quality. My approach has set theory on its
> side, it embodies the wiki approach and yours is one where Pallas Athena is
> to rise from the brain of Zeus in full armour. You may have noticed that my
> arguments are easy to follow and conform to something that is measurable.
> Yours is private, there is no possibility to verify the accuracy of your
> argument. I call bullshit on your argument, not because you do not make a
> fine argument but because it is an argument that prevents us from improving
> Wikidata.
>
> My hope is that we can work constructively on our quality and have a
> measurable effect.
> Thanks,
> GerardM
>
> On 29 November 2015 at 02:05, Gnangarra <gnangarra(a)gmail.com> wrote:
>
> > >
> > > While I happily agree that Sources are good, I will not ask people to
> > > start adding Sources at this point in time; it will not improve quality
> > > significantly. It makes more sense once we are at a stage where multiple
> > > sources disagree on values for statements. Adding sources is
> > > significantly more meaningful and useful once we start curating data.
> >
> >
> > the problem is that by the time Wikidata starts to curate data it will
> > have corrupted that data with its own data. Secondly, past experience
> > with wikis is that fixing data after it has been entered is actually
> > harder and more time-consuming, and the damage to reputation will have a
> > lasting impact; fixing that consumes millions of dollars in donor money.
> > As said earlier, there are lessons in the development of Wikipedia that
> > should be heeded in an attempt to avoid those same pitfalls.
> >
> >
> > On 29 November 2015 at 08:37, Gerard Meijssen <gerard.meijssen(a)gmail.com>
> > wrote:
> >
> > > Hoi,
> > > It was from the Myanmar Wikipedia that a lot of data was imported to
> > > Wikidata. Data that did not exist elsewhere. I do not care really what
> > > "Freedom House" says. I do not know them, I do know that the data is
> > > relevant and useful. It was even the subject of a blog post.
> > >
> > > You may ignore data that is not from a source that you like. This
> > > indiscriminate POV is not a NPOV.
> > >
> > > As to Grasulf, you failed to get the point. It was NOT about the data
> > > itself but about the presentation. I worked on this item because a
> > > duplicate was created with even less data.
> > >
> > > While I happily agree that Sources are good, I will not ask people to
> > > start adding Sources at this point in time; it will not improve quality
> > > significantly. It makes more sense once we are at a stage where multiple
> > > sources disagree on values for statements. Adding sources is
> > > significantly more meaningful and useful once we start curating data.
> > > Statistically most errors will be found where sources disagree.
> > >
> > > When people add conflicting data, it is indeed really relevant to add
> > > Sources. My practice for adding data is that I will only add data that
> > > fulfils some minimal criteria. Typically I am not interested in adding
> > > data that already exists. I will remove less precise data in favour of
> > > more precise data.
> > >
> > > The biggest issue with data is that we do not have enough of it and the
> > > second most relevant issue is that we need processes to compare sources
> > > with Wikidata and have a workflow to curate differences.
> > > Thanks,
> > > GerardM
> > >
> > > On 28 November 2015 at 19:23, Andreas Kolbe <jayen466(a)gmail.com>
> > > wrote:
> > >
> > > > Gerard,
> > > >
> > > >
> > > > On Fri, Nov 27, 2015, Gerard Meijssen <gerard.meijssen(a)gmail.com>
> > > > wrote:
> > > >
> > > > > When you compare the quality of Wikipedias with what en.wp used to be
> > > > > you are comparing apples and oranges. The Myanmar Wikipedia is better
> > > > > informed on Myanmar than en.wp etc.
> > > > >
> > > >
> > > >
> > > > Is it? The entire Burmese Wikipedia contains a mere 31,646 content
> > > > pages at the time of writing, covering (or trying to cover) all
> > > > countries of the world, and all aspects of human knowledge.[1]
> > > >
> > > > The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713
> > > > pages within its purview.[2] I dare say that's more articles on Myanmar
> > > > than the Burmese Wikipedia contains. As an indication, the English
> > > > Wikipedia's article on Myanmar is more than twice as long as the one in
> > > > the Burmese Wikipedia.
> > > >
> > > > Moreover, according to Freedom House[3], the internet in Myanmar is
> > > > not free:
> > > >
> > > > "The government detained and charged internet users for online
> > > > activities [...] Government officials pressured social media users not
> > > > to distribute or share content that offends the military, or disturbs
> > > > the functions of government."
> > > >
> > > >
> > > >
> > > > > When you qualify a Wikipedia as fascist, it does not follow that the
> > > > > data is suspect. Certainly when data in a source that you so easily
> > > > > dismiss is typically the same, there is not much meaning in what you
> > > > > say from a Wikidata point of view.
> > > > >
> > > >
> > > >
> > > > Data are always generated within a social context, and data generated
> > > > by political extremists or people living under oppressive regimes are
> > > > suspect whenever they have political implications. (Looking at the
> > > > descriptions of Burmese politics, my feeling is the Burmese Wikipedia
> > > > is not under significant government control, but largely written by
> > > > expats. However, the situation is quite different in some other
> > > > Wikipedias serving countries labouring under similar regimes.)
> > > >
> > > >
> > > >
> > > > > PS What does your librarian think when she knows
> > > >
> > > >
> > > >
> > > > It was a he, but I'll leave him to join in himself if he chooses to.
> > > >
> > > >
> > > > > I happen to work on Dukes of Friuli. Compare the data from Wikidata
> > > > > and the information by Reasonator based on the same item for one of
> > > > > them.
> > > > >
> > > > > https://tools.wmflabs.org/reasonator/?&q=2471519
> > > > > https://www.wikidata.org/wiki/Q2471519
> > > > >
> > > >
> > > >
> > > > Let's look at this example. Reasonator says of Grasulf II of Friuli,
> > > > "He died in 653". There is no source. Wikidata says he died in 653, and
> > > > the indicated source is the Italian Wikipedia.
> > > >
> > > > However, when you look at the (very brief) Italian Wikipedia
> > > > article[4], you will find that the year 653 is given with a question
> > > > mark. The English Wikipedia, in contrast, states, in its similarly
> > > > brief article[5],
> > > >
> > > > "Nothing more is known about Grasulf and the date of his death is
> > > > uncertain."
> > > >
> > > > Do you now see the problem about nuance? Reasonator and Wikidata
> > > > confidently proclaim as uncontested fact something that in fact is
> > > > rather uncertain.
> > > >
> > > > The sole source cited by both the English and the Italian Wikipedia
> > > > is the Historia Langobardorum, available in Wikisource.[6] My Latin is
> > > > a bit rusty, but while the Historia mentions that Ago succeeded Grasulf
> > > > upon the latter's death, it says nothing specific about when that was.
> > > > The Historia's time indications are in general very vague, usually
> > > > limited to the phrase "Circa haec tempora", meaning "about this time".
> > > > So it is in this case.
> > > >
> > > > For reference, the Google Knowledge Graph states equally confidently
> > > > that Grasulf II of Friuli died in 651 AD. This may be based on the
> > > > English Wikipedia's unsourced claim (in the template at the bottom of
> > > > the English Wikipedia article) that his reign ended c. 651, or on some
> > > > other source like Freebase.
> > > >
> > > > The other Wikipedias that have articles on Grasulf II provide the
> > > > following death dates:
> > > >
> > > > Catalan: 651
> > > > Galician: 653
> > > > Lithuanian: 653
> > > > Polish: 651
> > > > Romanian: Unknown
> > > > Russian: 653
> > > > Ukrainian: 651
> > > >
> > > > As for published sources, I can offer Ersch's Allgemeine Encyclopädie
> > > > (1849), which states on page 209 that Grasulf II died in 651.[7]
> > > >
> > > > The extreme vagueness of the available dates is pointed out by Thomas
> > > > Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the
> > > > end of Grasulf's reign at 645, "as a mere random guess", and adds that
> > > > "De Rubeis, following Sigonius", puts the accession of Ago in 661.[8]
> > > >
> > > > There may well be better and more recent sources beyond my reach, but
> > > > having these published dates in Wikidata, with the source references,
> > > > would actually make some sense. Unsourced data, not so much.
> > > >
> > > > Answers are comfortable, but they are not knowledge when they are
> > > > unverifiable and/or wrong.
> > > >
> > > >
> > > > [1] https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
> > > > [2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessm…
> > > > [3] https://freedomhouse.org/report/freedom-net/2015/myanmar
> > > > [4] https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid…
> > > > [5] https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=6…
> > > > [6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
> > > > [7] https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasul…
> > > > [8] https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%…
> > > > _______________________________________________
> > > > Wikimedia-l mailing list, guidelines at:
> > > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > > > Wikimedia-l(a)lists.wikimedia.org
> > > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
> > > >
> > >
> >
> >
> >
> > --
> > GN.
> > President Wikimedia Australia
> > WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
> > Photo Gallery: http://gnangarra.redbubble.com
> >