Gerard, Thanks for highlighting my work! I already posted slides on Commons, but I want to flesh them out with links to actual edits so people can better understand some of these quality improvement workflows. The tools I use for lists are written mostly by the Wikidata "god" Magnus Manske and the tools I use on Commons are self-built kludges with the assistance of Commonist Vera de Kok. Here is an example of a quality improvement I did this morning for a file on Commons that was originally uploaded by an English Wikipedian who uploaded it with the default uploader for use in an English Wikipedia list. The improvements are coming from both the original edits of the uploader on Wikipedia as well as the associated Wikidata list: https://commons.wikimedia.org/w/index.php?title=File:Rembrandt_Man_with_a_Fa...
Jane
On Sun, Nov 29, 2015 at 10:42 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi, Wikidata is a wiki, and you always seem to forget that.
The corruption of data... how? Each statement is its own data item; how do you corrupt that? As I say so often, when you get a collection that is 80% correct you have an error rate of 20%. When you do not include that data you have an error rate of 100%. When you have another source that is 90% correct, that has similar data, and the overlap is 50%, you can be smart and compare the data, at the start or later, and curate. When at the start you only import what is the same in both, you probably get something like 84% correct data imported. You can gamify the rest, but however you slice it, what you do not have and could have is 100% wrong.
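A minimal sketch of that comparison step in Python, to make the idea concrete. The function name `compare_sources` and the item identifiers and values are hypothetical, invented purely for illustration; they are not real Wikidata items or an existing tool. The sketch splits the statements of two sources into those that agree (safe to import at once), those that disagree (to be curated), and those found in only one source (not yet cross-checked).

```python
def compare_sources(source_a, source_b):
    """Partition statements from two sources by how they relate.

    Each source is a dict mapping an item identifier to a value.
    Returns four dicts: agree, disagree, only_a, only_b.
    """
    shared = source_a.keys() & source_b.keys()
    # Same item, same value in both sources.
    agree = {k: source_a[k] for k in shared if source_a[k] == source_b[k]}
    # Same item, conflicting values: these need a human to curate.
    disagree = {k: (source_a[k], source_b[k])
                for k in shared if source_a[k] != source_b[k]}
    # Items present in only one source: no cross-check is possible yet.
    only_a = {k: v for k, v in source_a.items() if k not in source_b}
    only_b = {k: v for k, v in source_b.items() if k not in source_a}
    return agree, disagree, only_a, only_b


# Toy data, invented for illustration only.
source_a = {"item-1": 1901, "item-2": 1850, "item-3": 1777, "item-4": 1600}
source_b = {"item-3": 1777, "item-4": 1601, "item-5": 1999, "item-6": 1500}

agree, disagree, only_a, only_b = compare_sources(source_a, source_b)
print("import now:", sorted(agree))
print("needs curation:", sorted(disagree))
print("not cross-checked:", sorted(only_a) + sorted(only_b))
```

In this toy case half of each source overlaps with the other; one shared item agrees and one conflicts, so the conflict is where curation effort would go first.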
Wikidata is NOT Wikipedia. It is much easier to curate data, and consequently your argument is FUD. The big thing we have not learned is cooperation. We do not cooperate. We do not provide, as standard, RSS feeds for the changes to the items that belong to a specific source. We are happy to get data but we do not reach out and give back. For me the fact that VIAF uses Wikidata as a link is an opportunity to do better. The cooperation with the German DNB is the kind of project that we should emulate.
When you talk about quality, you talk in an insular fashion: we have to do it, our community. At Wikidata our community can include other organisations with rich collections of high-quality data. We can share, compare, curate. Even with our current low quality, we have subsets of data that shine, subsets that are of at least the same quality as Wikipedia. However, this quality is often marred by a lack of quantity, quantity we can have when collaboration is what we do.
You are afraid for our reputation. Reputation has many aspects. Jane023 presented at the Dutch Wikimedia conference. She uses a tool that is easier on her: because it is a Wikidata-based list, no Wikipedians bother her. A similar list is now used for its quality on the Welsh Wikipedia. The data is of such quality that Google actually uses it, as she reported.
When I see the religious application of Wikipedia sentiments, I find that we do not even care for the life of one of our own. Bassel has been executed or is likely to be executed soon, and some think our neutrality is so important. FOR WHAT? So that we may not even protect our own? Is it right to protest against TTIP (and we should) and not protest for a Wikipedian who embodies our values?
Wikipedia-think is not applicable to Wikidata at this stage. Its quality is arguably piss poor but better in places. Many items are corrupt because they follow the structure of Wikipedia articles, a structure Wikipedians insist on because they wrote that article and "Wikidata is only a service project".
I do agree that we need more quality. My approach has set theory on its side and embodies the wiki approach; yours is one where Pallas Athena is to rise from the brain of Zeus in full armour. You may have noticed that my arguments are easy to follow and conform to something that is measurable. Yours is private; there is no possibility to verify the accuracy of your argument. I call bullshit on your argument, not because you do not make a fine argument but because it is an argument that prevents us from improving Wikidata.
My hope is that we can work constructively on our quality and have a measurable effect. Thanks, GerardM
On 29 November 2015 at 02:05, Gnangarra gnangarra@gmail.com wrote:
While I happily agree that Sources are good, I will not ask people to start adding Sources at this point in time; it will not improve quality significantly. It makes more sense once we are at a stage where multiple sources disagree on values for statements. Adding sources is significantly more meaningful and useful once we start curating data.
The problem is that by the time Wikidata starts to curate data, it will have corrupted that data with its own data. Secondly, past experience with wikis is that fixing data after it has been entered is actually harder and more time-consuming, and the damage to reputation will have a lasting impact; fixing that consumes millions of dollars in donor money. As said earlier, there are lessons in the development of Wikipedia that should be heeded in an attempt to avoid those same pitfalls.
On 29 November 2015 at 08:37, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi, It was from the Myanmar Wikipedia that a lot of data was imported to Wikidata. Data that did not exist elsewhere. I do not really care what "Freedom House" says. I do not know them; I do know that the data is relevant and useful. It was even the subject of a blog post.
You may ignore data that is not from a source that you like. This indiscriminate POV is not a NPOV.
As to Grasulf, you failed to get the point. It was NOT about the data itself but about the presentation. I worked on this item because a duplicate was created with even less data.
While I happily agree that Sources are good, I will not ask people to start adding Sources at this point in time; it will not improve quality significantly. It makes more sense once we are at a stage where multiple sources disagree on values for statements. Adding sources is significantly more meaningful and useful once we start curating data. Statistically most errors will be found where sources disagree.
When people add conflicting data, it is indeed really relevant to add Sources. My practice for adding data is that I will only add data that fulfils some minimal criteria. Typically I am not interested in adding data that already exists. I will replace less precise data with more precise data.
The biggest issue with data is that we do not have enough of it and the second most relevant issue is that we need processes to compare sources with Wikidata and have a workflow to curate differences. Thanks, GerardM
On 28 November 2015 at 19:23, Andreas Kolbe jayen466@gmail.com wrote:
Gerard,
On Fri, Nov 27, 2015, Gerard Meijssen gerard.meijssen@gmail.com wrote:
When you compare the quality of Wikipedias with what en.wp used to be you are comparing apples and oranges. The Myanmar Wikipedia is better informed on Myanmar than en.wp etc.
Is it? The entire Burmese Wikipedia contains a mere 31,646 content pages at the time of writing, covering (or trying to cover) all countries of the world, and all aspects of human knowledge.[1]
The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713 pages within its purview.[2] I dare say that's more articles on Myanmar than the Burmese Wikipedia contains. As an indication, the English Wikipedia's article on Myanmar is more than twice as long as the one in the Burmese Wikipedia.
Moreover, according to Freedom House[3], the internet in Myanmar is not free:
"The government detained and charged internet users for online activities [...] Government officials pressured social media users not to distribute or share content that offends the military, or disturbs the functions of government."
When you qualify a Wikipedia as fascist, it does not follow that the data is suspect. Certainly when data in a source that you so easily dismiss is typically the same, there is not much meaning in what you say from a Wikidata point of view.
Data are always generated within a social context, and data generated by political extremists or people living under oppressive regimes are suspect whenever they have political implications. (Looking at the descriptions of Burmese politics, my feeling is the Burmese Wikipedia is not under significant government control, but largely written by ex-pats. However, the situation is quite different in some other Wikipedias serving countries labouring under similar regimes.)
PS What does your librarian think when she knows
It was a he, but I'll leave him to join in himself if he chooses to.
I happen to work on Dukes of Friuli. Compare the data from Wikidata and the information by Reasonator based on the same item for one of them.
https://tools.wmflabs.org/reasonator/?&q=2471519
https://www.wikidata.org/wiki/Q2471519
Let's look at this example. Reasonator says of Grasulf II of Friuli, "He died in 653". There is no source. Wikidata says he died in 653, and the indicated source is the Italian Wikipedia.
However, when you look at the (very brief) Italian Wikipedia article[4], you will find that the year 653 is given with a question mark. The English Wikipedia, in contrast, states, in its similarly brief article[5], "Nothing more is known about Grasulf and the date of his death is uncertain."
Do you now see the problem about nuance? Reasonator and Wikidata confidently proclaim as uncontested fact something that in fact is rather uncertain.
The sole source cited by both the English and the Italian Wikipedia is the Historia Langobardorum, available in Wikisource.[6] My Latin is a bit rusty, but while the Historia mentions that Ago succeeded Grasulf upon the latter's death, it says nothing specific about when that was. The Historia's time indications are in general very vague, usually limited to the phrase "Circa haec tempora", meaning "about this time". So it is in this case.
For reference, the Google Knowledge Graph states equally confidently that Grasulf II of Friuli died in 651 AD. This may be based on the English Wikipedia's unsourced claim (in the template at the bottom of the English Wikipedia article) that his reign ended c. 651, or on some other source like Freebase.
The other Wikipedias that have articles on Grasulf II provide the following death dates:
Catalan: 651
Galician: 653
Lithuanian: 653
Polish: 651
Romanian: Unknown
Russian: 653
Ukrainian: 651
As for published sources, I can offer Ersch's Allgemeine Encyclopädie (1849), which states on page 209 that Grasulf II died in 651.[7]
The extreme vagueness of the available dates is pointed out by Thomas Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the end of Grasulf's reign at 645, "as a mere random guess", and adds that "De Rubeis, following Sigonius", puts the accession of Ago in 661.[8]
There may well be better and more recent sources beyond my reach, but having these published dates in Wikidata, with the source references, would actually make some sense. Unsourced data, not so much.
Answers are comfortable, but they are not knowledge when they are unverifiable and/or wrong.
[1] https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
[2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessme...
[3] https://freedomhouse.org/report/freedom-net/2015/myanmar
[4] https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid=...
[5] https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=63...
[6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
[7] https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasulf...
[8] https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%7...
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com