Quality issues


Gerard Meijssen

20 Nov 2015

7:18 a.m.

Hoi, At Wikidata we often find issues with data imported from a Wikipedia. Lists have been produced with these issues on the Wikipedia involved and arguably they do present issues with the quality of Wikipedia or Wikidata for that matter. So far hardly anything resulted from such outreach. When Wikipedia is a black box, not communicating about with the outside world, at some stage the situation becomes toxic. At this moment there are already those at Wikidata that argue not to bother about Wikipedia quality because in their view, Wikipedians do not care about its own quality. Arguably known issues with quality are the easiest to solve. There are many ways to approach this subject. It is indeed a quality issue both for Wikidata and Wikipedia. It can be seen as a research issue; how to deal with quality and how do such mechanisms function if at all. I blogged about it.. Thanks, GerardM http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.ht…


Jane Darnell

20 Nov

8:53 a.m.

Gerard, I think this was always the case. Most Wikidatans are as at home on Wikipedia as they are on Commons. The issue you describe also happened to Commons - both communities feel the other is less focussed on quality. Many Commonists spend hours on high quality images and these are rarely picked up by Wikipedia unless a Commonist notices and does so in their own language. There is no requirement for Wikipedians to get to know any other project and this is normal wiki behavior. We don't want anyone to feel pressured to do anything they feel uncomfortable doing. It's already difficult to get Wikipedians to do small tasks like add categories to their articles. The list of things necessary to create an acceptable article on Wikipedia just seems to get longer and longer, while the associated work for illustrations of that article or for data of that article is not even mentioned in current AfC policies on Wikipedia. I have thought about this, but I still think we need to break down the list of things necessary to make new short articles on Wikipedia, not extend the list. So in summary, I think that what you describe is normal predictable behavior for a "Wikipedia support" project such as Commons and Wikidata. This will change as more and more external users find out that Commons and Wikidata are valuable resources in and of themselves. This is already the case for many GLAMs which have found collaborations with Commons to be valuable experiences. I have high hopes this will become the case for Wikidata as well. Jane On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

Gerard Meijssen

10:20 p.m.

Hoi, The difference between the use of quality images from Commons and establishing what is correct is quite distinct. With Commons it is an esthetic difference, with these lists it is about the credibility of the data involved. Thanks, GerardM On 20 November 2015 at 09:53, Jane Darnell <jane023(a)gmail.com> wrote:

...


Peter Southwood

11:33 a.m.

Gerard, Who were you expecting would respond from the Wikipedias? Cheers, Peter -----Original Message----- From: Wikimedia-l [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Gerard Meijssen Sent: Friday, 20 November 2015 9:18 AM To: Wikimedia Mailing List; Research into Wikimedia content and communities; WikiData-l Subject: [Wikimedia-l] Quality issues _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Gerard Meijssen

10:22 p.m.

Hoi, So far such lists have been produced for bigger Wikipedias but essentially it is potentially an issue for any and all Wikis that have data that may exist on Wikidata or linked through Wikidata on external sources. Thanks, GerardM On 20 November 2015 at 12:33, Peter Southwood <peter.southwood(a)telkomsa.net> wrote:

...

Peter Southwood

21 Nov

6:11 a.m.

How are you notifying the Wikipedias/Wikipedians? Do you leave a message on the talk page of the relevant article? Cheers, Peter -----Original Message----- From: Wikimedia-l [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Gerard Meijssen Sent: Saturday, 21 November 2015 12:23 AM To: Wikimedia Mailing List Subject: Re: [Wikimedia-l] Quality issues

Gerard Meijssen

7:57 a.m.

Hoi, That is indeed a problem. So far it has been lists, often well formatted lists, that do not have a workflow and are not updated regularly. I have added these issues as a wishlist item to work on. [1] You have to appreciate that when a list of problematic issues has over 100 items, it is no longer easy or obvious that you want to add and follow 100 talk pages. This is one of the big differences between Wikipedia think and Wikidata think. I care about a lot of data, data that is linked. Analogous to the "Kevin Bacon steps of separation" I want all items easily and obviously connected. <grin> That is another quality goal for Wikidata </grin>. Given the state of Wikipedia, most articles have an article, easy and obvious tasks like fact checking and adding sources are exactly what we are looking for to maintain our community. Add relevance to the cocktail, we know that these facts are likely to have issues, and you appreciate why this may help us with our quality and with our community issues. Thanks, GerardM [1] https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey#Visibility_f… On 21 November 2015 at 07:11, Peter Southwood <peter.southwood(a)telkomsa.net> wrote:

...


Peter Southwood

8:52 a.m.

The problem may simply be that the information is not coming to the attention of the people who care, as they don't know that it exists or where to find it. The normal place to put information relating to improvement of an article is on the article talk page, and that is where Wikipedians will expect to find it. Cheers, Peter -----Original Message----- From: Wikimedia-l [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Gerard Meijssen Sent: Saturday, 21 November 2015 9:57 AM To: Wikimedia Mailing List Subject: Re: [Wikimedia-l] Quality issues

WereSpielChequers

20 Nov

1:30 p.m.

New subject: [Wiki-research-l] Quality issues

My experience is that pretty much all Wikimedians care about quality, though some have different, even diametrically opposed views as to what quality means and which things are cosmetic or crucial. My experience of the sadly dormant death anomaly project <https://meta.wikimedia.org/wiki/Death_anomalies_table> was that people react positively to being told "here is a list of anomalies on your language wikipedia" especially if those anomalies are relatively serious. My experience of edits on many different languages is that wikipedians appreciate someone who improves articles, even if you don't speak their language. Dismissing any of our thousand wikis as a "black box" is I think less helpful. One of the great opportunities of Wikidata is to do the sort of data driven anomaly finding that we pioneered with the death anomalies report. But we always need to remember that there are cultural differences between wikis, and not just in such things as the age at which we assume people are dead. Diplomacy is a useful skill in cross wiki work. ~~~~ On 20 November 2015 at 07:18, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

Gerard Meijssen

10:25 p.m.

New subject: [Wiki-research-l] Quality issues

Hoi, I have been working on Wikidata for almost two years on recent deaths. It is one easy and obvious thing to signal recent deaths to all the Wikipedias that have articles. It is quite similar to what you describe. It is dead easy to produce such lists, not only for recent deaths but also for deaths that differ from one source to the next. Thanks, GerardM On 20 November 2015 at 14:30, WereSpielChequers <werespielchequers(a)gmail.com> wrote:

...


Craig Franklin

21 Nov

4:29 a.m.

New subject: [Wiki-research-l] Quality issues

Indeed, the things that make a Wikipedia article high quality (such as well written and engaging prose) are not necessarily the same things that are useful for a data-driven product like Wikidata. When Wikidata offers assistance to another project, and that assistance is not received enthusiastically because the project feels it will not improve their own quality metrics, that is not a "black box" communication problem, nor is it anyone in particular's fault; that is an issue of differing priorities. Cheers, Craig On 20 November 2015 at 23:30, WereSpielChequers <werespielchequers(a)gmail.com> wrote:

...

Petr Kadlec

20 Nov

1:50 p.m.

On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

Richard Symonds

2:02 p.m.

Folks, regardless of which views we hold, we're all on the same side - can we try and be a little less acerbic please - it is Friday after all! Richard Symonds Wikimedia UK 0207 065 0992 Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects). *Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents.* On 20 November 2015 at 13:50, Petr Kadlec <petr.kadlec(a)gmail.com> wrote:

...

On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen < gerard.meijssen(a)gmail.com> wrote:

When Wikipedia is a black box, not communicating about with the outside world, at some stage the situation becomes toxic. At this moment there are already those at Wikidata that argue not to bother about Wikipedia quality because in their view, Wikipedians do not care about its own quality.

Right. When some users blindly dump random data to Wikidata, not communicating about with the outside world, at some stage the situation becomes toxic. At this moment there are already those at Wikipedia that argue not to bother about Wikidata quality because in their view, Wikidatans do not care about its own quality. For instance, take a look at https://www.wikidata.org/wiki/User_talk:GerardM https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1 Erm -- [[cs:User:Mormegil | Petr Kadlec]]

Gerard Meijssen

10:35 p.m.

Hoi, <grin> quality is different things </grin> I do care about quality but I do not necessarily agree with you on how best to achieve it. Arguably bots are better at getting data into Wikidata than people. This means that the error rate of bots is typically lower than that of people. It is all in the percentages. I have always said that the best way to improve quality is by comparing sources. When Wikidata has no data, it is arguably better to import data from any source. When the quality is 90% correct, there is already 100% more data. When 100% is compared with another source and 85% is the same, you only have to check 15% and decide what is right. When you compare with two distinct sources, the percentage that differs changes again.. :) In this way it makes sense to check errors. It does not help when you state that either party has people that care or do not care about quality. By providing a high likelihood that something is problematic, you will learn who actually makes a difference. It however started with having data to compare in the first place. Thanks, GerardM On 20 November 2015 at 14:50, Petr Kadlec <petr.kadlec(a)gmail.com> wrote:

...
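The "compare two sources and only check what differs" idea in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item keys (`item-a` and so on), the dates, and the function name `compare_sources` are all made up for the example and do not come from Wikidata or from any tool mentioned in this thread.

```python
def compare_sources(source_a, source_b):
    """Split the items that both sources describe into agreeing and differing ones."""
    shared = source_a.keys() & source_b.keys()
    # Items present in both sources whose values disagree: the review list.
    differing = sorted(key for key in shared if source_a[key] != source_b[key])
    return {
        "shared": len(shared),
        "agreeing": len(shared) - len(differing),
        "differing": differing,
        # Fraction of the shared items that a person would need to look at.
        "share_to_check": len(differing) / len(shared) if shared else 0.0,
    }


# Hypothetical item keys and death dates, purely for illustration.
wikidata = {
    "item-a": "1759-05-07",
    "item-b": "1801-03-12",
    "item-c": "1920-11-30",
    "item-d": "1888-01-01",
}
other_source = {
    "item-a": "1759-05-07",
    "item-b": "1801-03-21",
    "item-c": "1920-11-30",
    "item-e": "1900-06-15",
}

report = compare_sources(wikidata, other_source)
print(report["differing"])
print(report["share_to_check"])
```

The sketch only flags the items where the two sources disagree; what to do with each flagged item (cite both sources, look for an original source, or correct one of them) is left to a human.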


Milos Rancic

10:47 p.m.

Offtopic: Gerard, during the last half an hour or so, I have been getting emails only from you inside this thread (including the wiki-research list). I thought my phone had a bug. It's more useful to write one larger email addressing all the issues. Besides other things, with this frequency, you'll spend your monthly email quota for this list the day after tomorrow. On Fri, Nov 20, 2015 at 11:35 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...


Fæ

21 Nov

12:08 p.m.

On 20 November 2015 at 22:47, Milos Rancic <millosh(a)gmail.com> wrote:

...

Jane Darnell

12:28 p.m.

Sorry to read that Fae, but in your specific case I do think your time is spent more productively on Commons, because the value of your contributions there is huge. Having created Wikidata items for many of your Commons uploads, I think it may be worthwhile at some point to try and get someone to run a Fae-Wikidata-conversion bot to try and get as much data as possible from your uploads moved over, but until then, please go ahead with whatever it is you like to do best. In my last mail I was thinking about Wikipedians, but of course the same is true for all of the sister projects. On Sat, Nov 21, 2015 at 1:08 PM, Fæ <faewik(a)gmail.com> wrote:

...

On 20 November 2015 at 22:47, Milos Rancic <millosh(a)gmail.com> wrote:

+1 I keep an open mind for supporting Wikidata in association with my Commons uploads. This thread going over a series of old gripes against other projects, with a lack of new proposals, has diminished my interest. For me, this effectively burns out the word "Wikidata" for a month. Fae

Gnangarra

20 Nov

11:32 p.m.

...

... When 100% is compared with another source and 85% is the same, *you only have to check 15% and decide what is right* ....

This very statement highlights one issue that will always be a problem between Wikidata and the Wikipedias. On Wikipedia, at least in my 10 years of experience on en:wp, when you have multiple sources that differ you highlight the existence of those sources and the conflict of information; we don't decide what is right or wrong. On 21 November 2015 at 06:35, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Gerard Meijssen

21 Nov

8:12 a.m.

Hoi, You conflate two issues. First, when facts differ, it should be possible to explain why they differ. Only when there is no explanation, particularly when there are no sources, is there an issue. In come real sources. When someone died on 7-5-1759 and another source has a different date, it may be the difference between a Julian and a Gregorian date. When a source makes this plain, one fact has been proven to be incorrect. When the other date was just 1759, it is obvious that the full date is simply more precise. The point is very much that Wikipedia values sources and so does Wikidata. USE THEM and find that data sources may be wrong when they are. In this way we improve quality. Many data sources have data from the same origin. It does not follow that without original sources they are all right. Quite the reverse. It does however take humans to be bold, to determine where a booboo has been made. Yes, we do decide what is right or wrong; we do this when we research an issue and that is exactly what this is about. It all starts with determining a source. In the meantime, Wikidata is negligent in stating sources. The worst example is in the "primary sources" tool. It is bad because it is brought to us as the best work flow for adding uncertain data to Wikidata. So the world is not perfect but hey it is a wiki :) Thanks, GerardM On 21 November 2015 at 00:32, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Gnangarra

9:26 a.m.

...

Many data sources have data from the same origin. It does not follow that without original sources they are all right. Quite the reverse. It does however take humans to be bold, to determine where a booboo has been made. Yes, we do decide what is right or wrong,

No, we don't decide what is right or wrong. en:wp has very specific core policies about this: Original research (we don't draw conclusions from available data) and NPOV, *which means presenting information without editorial bias*. The moment we make that decision about what's right we exceed the boundaries of our core pillars. "Don't know", "uncertain" or "conflicting information" means exactly that; we don't get to choose what we think is right. The data article writers work with isn't black and white, and it's definitely not set in stone. Wikipedia content is a constantly evolving collation of knowledge. We should be careful whenever we put in place a process that makes information definitive, because people become reluctant to add to it, and they are even less likely to challenge something that has already been cast in stone, regardless of the inaccuracy of that casting. We see it within Wikipedia when articles are elevated to FA status: a number of editors fiercely defend that current/correct version against any changes, regardless of the merit of the information being added, with comments like "discuss it on talk page first" or "revert good faith edit". The more disjointed knowledge becomes, the harder it is to keep it current and accurate, and the more isolated that knowledge becomes. Then power over making changes takes precedence over productivity, accuracy and openness. On 21 November 2015 at 16:12, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Gerard Meijssen

11:13 a.m.

Hoi, I respect the policy of Wikipedia. However, when multiple Wikipedias differ and there is no sourcing, does this policy hold? When Wikidata has no attributable sources but multiple statements, is it not conceivable that things are easy and obvious: that they are wrong?

When you talk about the FA status of articles, you are considering something totally alien to what is at stake. Typically we do not have credible sources at Wikidata, and typically there is an issue with the data. When Wikidata is as mature as en.wp we will have on average 10 statements for every item. Currently half of our items have at most two statements.

We do find issues in any source by comparing them. It does make sense to make this effort. It is an obvious way of improving quality in all of our projects and even beyond that.

Thanks, GerardM

On 21 November 2015 at 10:26, Gnangarra <gnangarra(a)gmail.com> wrote:

...

Hoi, You conflate two issues. First when facts differ, it should be possible to explain why they differ. Only when there is no explanation particularly when there are no sources, there is an issue. In come real sources. When someone died on 7-5-1759 and another source has a different date, it may be the difference between a Julian and a Gregorian date. When a source makes this plain, one fact has been proven to be incorrect. When the date was 1759, it is obvious that the other date is more precise.. The point is very much that Wikipedia values sources and so does Wikidata. USE THEM and find that data sources may be wrong when they are. In this way we improve quality.

Many data sources have data from the same origin. It does not follow that without original sources they are all right. Quite the reverse. It does however take humans to be bold, to determine where a booboo has been made. Yes, we do decide what is right or wrong, we do this when we research an issue and that is exactly what this is about. It all starts with determining a source.

In the mean time, Wikidata is negligent in stating sources. The worst example is in the "primary sources" tool. It is bad because it is brought to us as the best work flow for adding uncertain data to Wikidata. So the world is not perfect but hey it is a wiki :)

Thanks, GerardM

On 21 November 2015 at 00:32, Gnangarra <gnangarra(a)gmail.com> wrote:


Gnangarra

11:40 a.m.

Agreed, getting information in is in and of itself a good starting point, but ignoring the lessons learnt in other projects while doing so only creates more work for those that follow. Having a less clear policy about sources and allowing unsourced information is only going to put Wikidata behind Wikipedia in quality. That is not going to endear Wikidata information to Wikipedians, who in turn, as they get data, just aren't going to go that extra step to share, no matter how easy the step is to take.

On 21 November 2015 at 19:13, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Andreas Kolbe

23 Nov 23 Nov

9:15 p.m.

On Fri, Nov 20, 2015 at 11:32 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

...

There was an interesting Oxford Internet Institute article recently discussing the potential problems that can result when Wikidata and/or the Knowledge Graph provide the Internet public with a single answer: nuances get lost, and provenance becomes obscured. http://cii.oii.ox.ac.uk/2015/11/05/semantic-cities/ The underlying study, "Semantic Cities: Coded Geopolitics and the Rise of the Semantic Web", by Heather Ford and Mark Graham, is here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2682459 Moreover, I was somewhat surprised to learn the other day that, apparently, over 80 percent of Wikidata statements are either unreferenced or only referenced to a Wikipedia: https://de.wikipedia.org/w/index.php?title=Datei:Citing_as_a_public_service… That seems like a recipe for disaster, given that Wikidata feeds the Google Knowledge Graph and Bing Satori to some extent. Thoughts?

Leila Zia

10:18 p.m.

Hi Andreas, On Mon, Nov 23, 2015 at 1:15 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Moreover, I was somewhat surprised to learn the other day that, apparently, over 80 percent of Wikidata statements are either unreferenced or only referenced to a Wikipedia: https://de.wikipedia.org/w/index.php?title=Datei:Citing_as_a_public_service… That seems like a recipe for disaster, given that Wikidata feeds the Google Knowledge Graph and Bing Satori to some extent. Thoughts?

Here are my thoughts:

1) No, it's not a recipe for disaster. :-) I expand below.

2) People sit at the different parts of the spectrum when it comes to the issues around Wikidata references. What almost all these people have in common is that they know having references is a very valuable thing for Wikidata (or any other knowledge base for that matter).

3) As a researcher, as long as the data is in Wikidata, with or without a reference, I'm already some steps ahead. If there is no reference, I have a starting point to look for a reference for that specific value, and in that process, I may find conflicting data with new references. For a project in a growing stage, these are opportunities, not blockers.

4) I hear a lot of sensitivity about referencing Wikidata claim values to Wikipedia. I hear people's concerns (having loops in referencing mechanisms is not good) but I do not consider the existence of Wikipedia references an issue and I certainly prefer a Wikipedia reference over no reference, especially if the date the information was extracted at is also tracked somewhere in Wikidata. Giving information to the researcher that the data has come from Wikipedia will give him/her a head-start about where to continue the search.

5) I see a need to give the users of open data a chance to use data with more knowledge and control. For example, if you are an app developer, you should be able to figure out relatively easily what data in Wikidata you can fully trust, and what data you may want to skip using in your app. At the moment, some part of the community considers a value with a non-Wikipedia reference approved/monitored by a human as trustworthy (this is no written rule, I'm summarizing my current understanding based on discussions with some of the Wikidata community members, including myself :-). But, among other things, the reference in Wikidata may not be a trustworthy reference. We should surface how much trust one should have in the values in Wikidata to the end-user.

What is amazing is: There are many great things one can do based on the data that is being gathered in Wikidata. We should all work together to improve that data, but we should also acknowledge that our attention is split across many projects (this is definitely the case for me), and as a result, we will be seeing steady and smooth improvements in Wikidata, and not sudden and very fast improvements. We need to stay curious, excited, committed, and patient. :-)

Leila

Disclaimer: These are my personal views about references in Wikidata, and not necessarily the views of my team or the Wikimedia Foundation. :-)
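As a rough illustration of point 5 above, an app developer could bucket statements by the kind of reference they carry. The sketch below is only one possible reading, not an official trust model: it assumes statements shaped like the Wikidata JSON export (a `references` list whose entries hold a `snaks` mapping keyed by property ID) and treats P143 ("imported from Wikimedia project") as the marker of a Wikipedia-only reference. The function name, the three labels, and the sample statements are made up for the example.

```python
# Property IDs that mark a reference as "imported from a Wikimedia project".
# P143 is the Wikidata property for that; extend the set as needed (assumption).
WIKIMEDIA_IMPORT_PROPS = {"P143"}


def reference_level(statement):
    """Classify one statement as 'unreferenced', 'wikipedia-only' or 'external'.

    `statement` is a dict shaped like a statement in the Wikidata JSON export.
    """
    # Collect the snaks mapping of every reference, ignoring empty ones.
    refs = [ref.get("snaks", {}) for ref in statement.get("references", [])]
    refs = [snaks for snaks in refs if snaks]
    if not refs:
        return "unreferenced"
    # A reference without any import property points outside Wikimedia projects.
    if any(WIKIMEDIA_IMPORT_PROPS.isdisjoint(snaks) for snaks in refs):
        return "external"
    return "wikipedia-only"


# Hypothetical, heavily trimmed statements; real ones carry full snak payloads.
statements = [
    {"id": "s1"},
    {"id": "s2", "references": [{"snaks": {"P143": [{}], "P813": [{}]}}]},
    {"id": "s3", "references": [{"snaks": {"P248": [{}]}}]},
    {"id": "s4", "references": [{"snaks": {"P143": [{}]}},
                                {"snaks": {"P854": [{}]}}]},
]

levels = {s["id"]: reference_level(s) for s in statements}
print(levels)
```

An "external" label here only says that a non-Wikipedia reference exists; as noted above, it says nothing about whether that reference is itself trustworthy.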

Gnangarra

11:37 p.m.

Some responses to Leila's comments:

1. It's not a disaster, but it is a serious concern. We know from past experience that it goes to the heart of the project's long-term credibility. Countless hours and funds have gone into redressing Wikipedia's reputation, and still, after 8 years of doing this, we get bagged; we are still answering these questions. Why send Wikidata down that track when we all understand the importance of referencing? Or, from a more theological perspective: "if we can't learn from history, why do we spend so many resources recording history?"

2. Referencing is a very valuable thing for all data; that should be a starting point for the spectrum and Wikidata, rather than a goal or end point. Wikipedias still have unreferenced material 15 years after they started.

3. I'd disagree: if the data isn't referenced then it's of no value; Wikipedias are a better place to look.

4. A Wikipedia reference isn't ideal but it is better than nothing, provided that the reference is to a permanent link rather than just an article; at least then, if the information is changed, there is some ability to recover the original source. In general a circular reference is a bad outcome.

5. People need to be able to trust all data in Wikidata, otherwise they just won't use it, because as Wikidata expands, the same PR firms and interest groups that have been involved in so many of WP's issues will gravitate to the easier-to-manipulate Wikidata.

Let's build something based on the lessons learnt on Wikipedia over the last 15 years rather than duplicate those missteps.

On 24 November 2015 at 06:18, Leila Zia <leila(a)wikimedia.org> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Andreas Kolbe

24 Nov 24 Nov

4:28 a.m.

On Mon, Nov 23, 2015 at 11:37 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

...

5.People need to able to trust all data in WikiData, otherwise they just wont use it because as Wikidata expands the same PR firms, interest groups which have seen so many of WP issues will gravitate to the easier to manipulate WikiData

I think the potential problem here is far worse: people *will use* the data, because their lack of trustworthiness, as amply described in the Wikidata disclaimer[1], is no longer visible when they're displayed as "fact" by dominant search engines. Google is already committed to Wikidata. Wikidata is in part a Google project. This means information placed in Wikidata may in time have the potential to reach an audience of billions – a far greater audience than Wikipedia has. People already blindly copy falsehoods from Wikipedia today, because important caveats (like checking the sourcing to assess the reliability of a Wikipedia article) are widely ignored. As a result, circular references and citogenesis have become a significant problem for Wikipedia. People are far more likely still to copy blindly from Google. It's circular referencing on steroids. The way things are headed, manipulations in Wikidata that enter the Google Knowledge Graph, Bing Satori, etc. could end up having far greater leverage than any Wikipedia manipulation has ever had. In the worst-case scenario – depending on how much search engines will come to rely on Wikidata – an edit war won by anonymous players in an obscure corner of Wikidata might literally redefine truth for the English-speaking internet. Is this really a good thing? Are checks and balances in place to prevent this from happening?

...

Lets build something based on the lessons learnt on Wikipedia over the last 15 years rather than duplicate those missteps

That seems like good advice to me. The online world's information infrastructure shouldn't be built on sand. [1] https://www.wikidata.org/wiki/Wikidata:General_disclaimer – highlights: "Wikidata cannot guarantee the validity of the information found here. [...] No formal peer review[:] Wikidata does not have an executive editor or editorial board that vets content before it is published. Our active community of editors uses tools such as the Special:Recentchanges and Special:Newpages feeds to monitor new and changing content. However, Wikidata is not uniformly peer reviewed; while readers may correct errors or engage in casual peer review, they have no legal duty to do so and thus all information read here is without any implied warranty of fitness for any purpose or use whatsoever. None of the contributors, sponsors, administrators or anyone else connected with Wikidata in any way whatsoever can be responsible for the appearance of any inaccurate or libelous information or for your use of the information contained in or linked from these web pages [...] neither is anyone at Wikidata responsible should someone change, edit, modify or remove any information that you may post on Wikidata or any of its associated projects."

Leila Zia

11:26 p.m.

On Mon, Nov 23, 2015 at 8:28 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Mon, Nov 23, 2015 at 11:37 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

5.People need to able to trust all data in WikiData, otherwise they just wont use it because as Wikidata expands the same PR firms, interest groups which have seen so many of WP issues will gravitate to the easier to manipulate WikiData

It's worth mentioning: Dominant search engines do not rely on one source of information to surface results, they get information from many sources, weigh the responses they get based on the trust on the sources and many other factors, and aggregate to find the best answer to be shown to the user. I just used "chicken pox" as a search query in Google, I see an information box on the right-hand-side of the page about the disease, and when I click on Sources I get this page <https://support.google.com/websearch/answer/2364942?p=medical_conditions&rd=1> ("See where we found the medical information") which shows all the sources Google has used to retrieve information about chicken pox from, nothing in that list starts with wiki. Of course, this is not the case for all search queries, for some of them, Google still uses Wikipedia snippets. Leila

Andreas Kolbe

25 Nov 25 Nov

12:12 a.m.

On Tue, Nov 24, 2015 at 11:26 PM, Leila Zia <leila(a)wikimedia.org> wrote:

...

Have you never seen Google display gross Wikipedia vandalism?[1][2] Cases like that make it very clear that the Wikimedia content in question entered Google directly, without human oversight or cross-checking against other sources. What you describe sounds good, but it didn't happen. If even transient vandalism passes through (the Finnish vandalism was reportedly deleted in Wikipedia within minutes), then so can more subtle and long-lived errors and falsehoods. Similarly, Bing Satori's timeline is simply made up of verbatim Wikipedia sentences containing a numerical year. We know far too little about how search engines import Wikipedia and Wikidata content, and what proportion of content is checked and how.

...

I just used "chicken pox" as a search query in Google, I see an information box on the right-hand-side of the page about the disease, and when I click on Sources I get this page < https://support.google.com/websearch/answer/2364942?p=medical_conditions&am…

("See where we found the medical information") which shows all the sources Google has used to retrieve information about chicken pox from, nothing in that list starts with wiki. Of course, this is not the case for all search queries, for some of them, Google still uses Wikipedia snippets.

Gnangarra

3:57 a.m.

This isn't about the hows or whats of Google; it's about ensuring that what we do is trustworthy.

On 25 November 2015 at 08:12, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Tue, Nov 24, 2015 at 11:26 PM, Leila Zia <leila(a)wikimedia.org> wrote:

It's worth mentioning: Dominant search engines do not rely on one source of information to surface results, they get information from many sources, weigh the responses they get based on the trust on the sources and many other factors, and aggregate to find the best answer to be shown to the user.

I just used "chicken pox" as a search query in Google, I see an information box on the right-hand-side of the page about the disease, and when I click on Sources I get this page < https://support.google.com/websearch/answer/2364942?p=medical_conditions&am… ("See where we found the medical information") which shows all the sources Google has used to retrieve information about chicken pox from, nothing in that list starts with wiki. Of course, this is not the case for all search queries, for some of them, Google still uses Wikipedia snippets.

For medical queries, Google (rightly) prefers other sources, so those queries are not presently affected.

[1] https://www.seroundtable.com/google-world-series-cardinals-blunder-17587.ht…
[2] https://commons.wikimedia.org/wiki/File:Wikipedia_vandalism_in_Google_infob…

-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Gerard Meijssen

10:56 a.m.

Hoi, To belabour the point, we do make errors, we will fail in expectations. What we need is not complaining that the world is not perfect, we need to have an approach that will improve our data and is inclusive. We need to be more of a wiki. Thanks, GerardM On 25 November 2015 at 04:57, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Gerard Meijssen

24 Nov 24 Nov

7:15 a.m.

Hoi, To start off, results from the past are no indication of results in the future. It is the disclaimer insurance companies have to state in all their adverts in the Netherlands. When you continue and make it a "theological" issue, you lose me because I am not of this faith, far from it.

Wikidata is its own project and it is utterly dissimilar from Wikipedia. To start off, Wikidata has been a certified success from the start. The improvement it brought by bringing all interwiki links together is enormous. That alone should be a pointer that Wikipedia think is not realistic.

To continue, people have been importing data into Wikidata from the start. They are the statements you know, and it was possible to import them from Wikipedia because of these interwiki links. So when you call for sources, it is fairly safe to assume that those imports are supported by the quality of the statements of the Wikipedias and, if anything, that is also where they typically fail, because many assumptions at Wikipedia are plain wrong at Wikidata. For instance, a listed building is not the organisation the building is known for. At Wikidata they each need their own item and associated statements.

Wikidata is already a success for other reasons. VIAF no longer links to Wikipedia but to Wikidata. The biggest benefit of this move is for people who are not interested in English. Because of this change VIAF links through Wikidata to all Wikipedias, not only en.wp. Consequently people may find, through VIAF, Wikipedia articles in their own language through their library systems.

So do not forget about Wikipedia and the lessons learned. These lessons are important to Wikipedia. However, they do not necessarily apply to Wikidata, particularly when you approach Wikidata as an opportunity to do things in a different way.

Set theory, a branch of mathematics, is exactly what we need. When we have data at Wikidata of a given quality, e.g. 90%, and we have data at another source with a given quality, e.g. 90%, we can compare the two and find a subset where the two sources do not match. When we curate the differences, it is highly likely that we improve quality at Wikidata or at the other source. With a proper workflow and an iterative approach to multiple sources, we will spend time adding sources and improving quality. This is more productive than religiously adding sources for every statement. It also brings us better information in less time.

I hope this will help people understand that Wikidata is not Wikipedia, and that is a good thing.

Thanks, GerardM

On 24 November 2015 at 00:37, Gnangarra <gnangarra(a)gmail.com> wrote:

...

Hi Andreas,

On Mon, Nov 23, 2015 at 1:15 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Moreover, I was somewhat surprised to learn the other day that, apparently, over 80 percent of Wikidata statements are either unreferenced or only referenced to a Wikipedia:

https://de.wikipedia.org/w/index.php?title=Datei:Citing_as_a_public_service…

That seems like a recipe for disaster, given that Wikidata feeds the Google Knowledge Graph and Bing Satori to some extent. Thoughts?

have a starting point to look for a reference for that specific value, and in that process, I may find conflicting data with new references. For a project [in] a growing stage, these are opportunities, not blockers.

4) I hear a lot of sensitivity about referencing Wikidata claim values to Wikipedia. I hear people's concerns (having loops in referencing mechanisms is not good) but I do not consider the existence of Wikipedia references [an] issue and I certainly prefer a Wikipedia reference over no reference, especially if the date the information was extracted at is also tracked somewhere in Wikidata. Giving information to the researcher that the data has come from Wikipedia will give him/her a head-start about where to continue the search.

5) I see a need to give the users of open data a chance to use data with more knowledge and control. For example, if you are an app developer, you should be able to figure out relatively easily what data in Wikidata you can fully trust, and what data you may want to skip using in your app. At the moment, some part of the community considers a value with a non-Wikipedia reference approved/monitored by a human as trustworthy (this is no written rule, I'm summarizing my current understanding based on discussions with some of the Wikidata community members, including myself :-). But, among other things, the reference in Wikidata may not be a trustworthy reference. We should surface how much trust one should have [in] the values in Wikidata to the end-user.

What is amazing is: There are many great things one can do based on the data that is being gathered in Wikidata. We should all work together to improve that data, but we should also acknowledge that our attention is split across many projects (this is definitely the case for me), and as a result, we will be seeing steady and smooth improvements in Wikidata, and not sudden and very fast improvements. We need to stay curious, excited, committed, and patient. :-)

Leila

Disclaimer: These are my personal views about references in Wikidata, and not necessarily the views of my team or the Wikimedia Foundation. :-)

_______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Andreas Kolbe

27 Nov

11:08 a.m.

Gerard, On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

These benefits are internal to Wikimedia and a completely separate issue from third-party re-use of Wikidata content as a default reference source, which is the issue of concern here.

To continue, people have been importing data into Wikidata from the start.

...

They are the statements you know, and it was possible to import them from Wikipedia because of these interwiki links. So when you call for sources, it is fairly safe to assume that those imports are supported by the quality of the statements of the Wikipedias

The quality of three-quarters of the 280+ Wikipedia language versions is about at the level the English Wikipedia had reached in 2002. Even some of the larger Wikipedias have significant problems. The Kazakh Wikipedia for example is controlled by functionaries of an oppressive regime[1], and the Croatian one is reportedly[2] controlled by fascists rewriting history (unless things have improved markedly in the Croatian Wikipedia since that report, which would be news to me). The Azerbaijani Wikipedia seems to have problems as well. The Wikimedia movement has always had an important principle: that all content should be traceable to a "reliable source". Throughout the first decade of this movement and beyond, Wikimedia content has never been considered a reliable source. For example, you can't use a Wikipedia article as a reference in another Wikipedia article. Another important principle has been the disclaimer: pointing out to people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use. Both of these principles are now being jettisoned. Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance. This is a reflection of the fact that Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision was, I understand, made by Denny, who is both a Google employee and a WMF board member. The benefit to Google is very clear: this free, unattributed content adds value to Google's search engine result pages, and improves Google's revenue (currently running at about $10 million an hour, much of it from ads). But what is the benefit to the end user? The end user gets information of undisclosed provenance, which is presented to them as authoritative, even though it may be compromised. In what sense is that an improvement for society? 
To me, the ongoing information revolution is like the 19th century industrial revolution done over. It created whole new categories of abuse, which it took a century to (partly) eliminate. But first, capitalists had a field day, and the people who were screwed were the common folk. Could we not try to learn from history?

...

and if anything, that is also where they typically fail because many assumptions at Wikipedia are plain wrong at Wikidata. For instance a listed building is not the organisation the building is known for. At Wikidata they each need their own item and associated statements. Wikidata is already a success for other reasons. VIAF no longer links to Wikipedia but to Wikidata. The biggest benefit of this move is for people who are not interested in English. Because of this change VIAF links through Wikidata to all Wikipedias not only en.wp. Consequently people may find through VIAF Wikipedia articles in their own language through their library systems.

At the recent Wikiconference USA, a Wikimedia veteran and professional librarian expressed the view to me that

* circular referencing between VIAF and Wikidata will create a humongous muddle that nobody will be able to sort out again afterwards, because – unlike wiki mishaps in other topic areas – here it's the most authoritative sources that are being corrupted by circular referencing;
* third parties are using Wikimedia content as a *reference standard* when that was never the intention (see above).

I've seen German Wikimedians express concerns that quality assurance standards have dropped alarmingly since the project began, with bot users mass-importing unreliable data.

...

So do not forget about Wikipedia and the lessons learned. These lessons are important to Wikipedia. However, they do not necessarily apply to Wikidata particularly when you approach Wikidata as an opportunity to do things in a different way. Set theory, a branch of mathematics, is exactly what we need. When we have data at Wikidata of a given quality.. eg 90% and we have data at another source with a given quality eg 90%, we can compare the two and find a subset where the two sources do not match. When we curate the differences, it is highly likely that we improve quality at Wikidata or at the other source.

This sounds like "Let's do it quick and dirty and worry about the problems later". I sometimes get the feeling software engineers just love a programming challenge, because that's where they can hone and display their skills. Dirty data is one of those challenges: all the clever things one can do to clean up the data! There is tremendous optimism about what can be done. But why have bad data in the first place, starting with rubbish and then proving that it can be cleaned up a bit using clever software? The effort will make the engineer look good, sure, but there will always be collateral damage as errors propagate before they are fixed. The engineer's eyes are not typically on the content, but on their software. The content their bots and programs manipulate at times seems almost incidental, something for "others" to worry about – "others" who don't necessarily exist in sufficient numbers to ensure quality. In short, my feeling is that the engineering enthusiasm and expertise applied to Wikidata aren't balanced by a similar level of commitment to scholarship in generating the data, and getting them right first time. We've seen where that approach can lead with Wikipedia. Wikipedia hoaxes and falsehoods find their way into the blogosphere, the media, even the academic literature. The stakes with Wikidata are potentially much higher, because I fear errors in Wikidata stand a good chance of being massively propagated by Google's present and future automated information delivery mechanisms, which are completely opaque. Most internet users aren't even aware to what extent the Google Knowledge Graph relies on anonymously compiled, crowdsourced data; they will just assume that if Google says it, it must be true. In addition to honest mistakes, transcription errors, outdated info etc., the whole thing is a propagandist's wet dream. Anonymous accounts! Guaranteed identity protection! Plausible deniability! No legal liability! 
Automated import and dissemination without human oversight! Massive impact on public opinion![3] If information is power, then this provides the best chance of a power grab humanity has seen since the invention of the newspaper. In the media landscape, you at least have right-wing, centrist and left-wing publications each presenting their version of the truth, and you know who's publishing what and what agenda they follow. You can pick and choose, compare and contrast, read between the lines. We won't have that online. Wikimedia-fuelled search engines like Google and Bing dominate the information supply. The right to enjoy a pluralist media landscape, populated by players who are accountable to the public, was hard won in centuries past. Some countries still don't enjoy that luxury today. Are we now blithely giving it away, in the name of progress, and for the greater glory of technocrats? I don't trust the way this is going. I see a distinct possibility that we'll end up with false information in Wikidata (or, rather, the Google Knowledge Graph) being used to "correct" accurate information in other sources, just because the Google/Wikidata content is ubiquitous. If you build circular referencing loops fuelled by spurious data, you don't provide access to knowledge, you destroy it. A lie told often enough etc. To quote Heather Ford and Mark Graham, "We know that the engineers and developers, volunteers and passionate technologists are often trying to do their best in difficult circumstances. But there need to be better attempts by people working on these platforms to explain how decisions are made about what is represented. These may just look like unimportant lines of code in some system somewhere, but they have a very real impact on the identities and futures of people who are often far removed from the conversations happening among engineers." I agree with that. The "what" should be more important than the "how", and at present it doesn't seem to be. 
It's well worth thinking about, and having a debate about what can be done to prevent the worst from happening. In particular, I would like to see the decision to publish Wikidata under a CC0 licence revisited. The public should know where the data it gets comes from; that's a basic issue of transparency.

Andreas

[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-contro…
[3] http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-201…

Liam Wyatt

12:51 p.m.

On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

The Wikimedia movement has always had an important principle: that all content should be traceable to a "reliable source". Throughout the first decade of this movement and beyond, Wikimedia content has never been considered a reliable source. For example, you can't use a Wikipedia article as a reference in another Wikipedia article. Another important principle has been the disclaimer: pointing out to people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use. Both of these principles are now being jettisoned. Wikipedia content is considered a reliable source in Wikidata...

<snip>

I agree that "reliable source" referencing and "crowdsourced content" are indeed principles of our movement. However, I disagree that Wikidata is "jettisoning" them. In fact, quite the contrary!

The purpose of the statement "imported from --> English Wikipedia" in the "reference" field of a Wikidata item's statement is PRECISELY to indicate to the user that this information has not been INDEPENDENTLY verified to a reliable source and that Wikipedia is NOT considered a reliable source. Furthermore, it provides a PROVENANCE of that information to help stop people from circular referencing. That is, clearly stating that the specific fact in Wikidata has come from Wikipedia helps to avoid the structured-data equivalent of "citogenesis": https://xkcd.com/978/

If/When a person can provide a reliable reference for that same fact, they are encouraged to add an actual reference. Note that the Wikidata statements used for facts coming in from Wikipedia use the property "imported from". This is deliberately different from the property "reference URL", which is what you would use when adding an actual reference to a third-party reliable online source.

Furthermore, the fact that many statements in Wikidata are not given a reference (yet) is not necessarily a "problem". For example, this https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a scientific publication with 2891 co-authors!! This is an extreme example, but it demonstrates my point... None of those 2891 statements has a specific reference listed for it, because all of them are self-evidently referenced to the scientific publication itself. The same is true of the other properties applied to this item (volume, publication date, title, page number...). All of these could be "referenced" to the very first property in the Wikidata item, the DOI of the scientific article: http://www.sciencedirect.com/science/article/pii/S0370269312008581

This item is not "less reliable" because it doesn't have the same footnote repeated almost three thousand times, but if you merely look at statistics of "unreferenced wikidata statements" it would APPEAR that it is very poorly cited. So, I think we need a more nuanced view of what "proper referencing" means in the context of Wikidata.

-Liam
wittylama.com
Peace, love & metadata
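To make the distinction between "imported from" and an actual reference concrete, here is a minimal Python sketch of how a referencing statistic might separate the three cases discussed in this message. The statement layout (a dict with a `references` list, each reference mapping property IDs to values) is a simplification assumed for illustration, not the real Wikidata JSON format, and the sample values are placeholders. P143 and P854 are the Wikidata property IDs for "imported from" and "reference URL".

```python
from collections import Counter

IMPORTED_FROM = "P143"  # Wikidata property "imported from" (a Wikimedia project)
REFERENCE_URL = "P854"  # Wikidata property "reference URL"


def classify(statement):
    """Label a statement by the kind of references attached to it."""
    # Collect every property ID used in any reference of this statement.
    used = {prop for ref in statement.get("references", []) for prop in ref}
    if not used:
        return "unreferenced"
    if used == {IMPORTED_FROM}:
        return "imported from Wikipedia only"
    return "has other reference"


# Hypothetical sample statements; values are placeholders, not real data.
sample = [
    {"property": "author", "references": []},
    {"property": "date of birth", "references": [{IMPORTED_FROM: "ru.wikipedia"}]},
    {"property": "title", "references": [{REFERENCE_URL: "https://example.org/source"}]},
]

counts = Counter(classify(s) for s in sample)
print(counts)
```

A tally of this kind would still count the 2891 author statements in the example above as unreferenced, even though the item's DOI covers them, which is why the raw percentage needs careful reading.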

Gnangarra

1:47 p.m.

Disclaimer first: I'm not exactly conversant in the intricacies of Wikidata. If I take the information on the 14th Dalai Lama (https://en.wikipedia.org/wiki/14th_Dalai_Lama), it links to Wikidata at https://www.wikidata.org/wiki/Q17293 and the en article has 2 references that list his date of birth. The Wikidata item has two references for the same piece of information.

Wikidata sources:

1. The first just says imported from Russian language Wikipedia, which links to the Wikidata page on the Russian Wikipedia, not to the source URL, nor does it link to a permanent URL, so as a source it's meaningless. While it may just be the result of who did the data import, linking to Russian language Wikipedia is kind of obscure for a source; I can understand a Tibetan, Mandarin, or Cantonese language source as they would be associated with the region.
2. Integrated Authority File links to https://www.wikidata.org/wiki/Q36578 on Wikidata; it doesn't provide a URL or any other information which enables someone to verify what is said.

Despite two references, the data itself appears to be immediately untraceable to a reliable source. The circular reference of Wikidata to a Wikipedia of any language is OK, but the link should be traceable to a specific article version, which would then make it possible to verify the data even if the current data on Wikipedia is changed after it's imported; that in itself shouldn't be difficult to engineer.

If that was the case, then to me a Wikipedia reference for all data is a reasonable minimum standard to start at. Finding a way to replicate the same data 2891 times in Liam's scenario shouldn't be much of a challenge if WP can replicate templates in 100,000 articles; as a standard we have GLAM making donations of images in quantities of 10,000s. I think someone has already solved this in a meaningful way.

On 27 November 2015 at 20:51, Liam Wyatt <liamwyatt(a)gmail.com> wrote:

...

On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:

people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use. Both of these principles are now being jettisoned. Wikipedia content is considered a reliable source in Wikidata...

-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com
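The suggestion above, that an import should point at a specific article version instead of the live page, can be sketched in a few lines of Python. MediaWiki serves a fixed revision when `index.php` is given a `title` and an `oldid` parameter; the function name and the idea of storing the result alongside an imported statement are assumptions made for this illustration, not an existing Wikidata feature.

```python
from urllib.parse import urlencode


def permalink(site, title, revision_id):
    """Build a MediaWiki URL that always shows one specific revision of a page."""
    query = urlencode({"title": title, "oldid": revision_id})
    return f"https://{site}/w/index.php?{query}"


# Example using a page title and revision ID that appear elsewhere in this thread.
print(permalink("www.wikidata.org", "Q8007", 124603129))
# prints https://www.wikidata.org/w/index.php?title=Q8007&oldid=124603129
```

Because the revision ID never changes, a reader can still check what the article said at import time even after the live article has been edited.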

Andreas Kolbe

3:27 p.m.

On Fri, Nov 27, 2015 at 1:47 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

...

Would it not make more sense to import (and verify!) the reliable source cited in the relevant Wikipedia version, along with the statement?

geni

7:19 p.m.

On 27 November 2015 at 15:27, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Fri, Nov 27, 2015 at 1:47 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

Would it not make more sense to import (and verify!) the reliable source cited in the relevant Wikipedia version, along with the statement?

You hit issues with non machine readable data, paywalls and deadtree walls. And even then it varies by field (for example in chemistry if you can get around those problems you wouldn't bother with wikidata and instead go straight for the Beilstein clone). -- geni

Jane Darnell

2:09 p.m.

Yes I agree. I think most of the discussion here has to do with people conflating the concept of text as in Wikipedia sentences and the concept of data as in Wikidata statements. When a user adds an image from Commons on Wikipedia, the source of the image is generally not added to Wikipedia, and I have never heard anyone complain about that except for image donors who wished that their images *were* attributed when used on Wikipedia. The same is true when Wikipedians add Wikidata statements from an item on Wikipedia. A date statement in Wikidata for a painting may be indirectly referenced in the item in another statement (the collection statement, or a "described at url" statement). This is also true of the way the date field in the Commons artwork template is used. It is just as undesirable to clutter Wikipedia with a reference for such a date from Wikidata as it is to reference the source of the file image when including images, and so there will generally not be a reference for the pulled date in the Wikidata infobox, because the user can always look up the item for more information. Most paintings included on Wikipedia, with or without infoboxes, do not reference the date field specifically - either to the Commons image or to the article. When they do, this is often in cases where the date has been disputed. Our goal is not to reference everything, but to reference the things that need referencing. On Fri, Nov 27, 2015 at 1:51 PM, Liam Wyatt <liamwyatt(a)gmail.com> wrote:

...

On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:


Andreas Kolbe

3:16 p.m.

Liam, I am interested in anything demonstrating that the things I am concerned about are not a problem. Further comments interspersed below. On Fri, Nov 27, 2015 at 12:51 PM, Liam Wyatt <liamwyatt(a)gmail.com> wrote:

...

On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:


How does the presence of that information in Wikidata help if the Google user just gets the info in the Knowledge Graph without any indication that it comes from Wikidata? Because CC0 specifically waives the right to attribution that Wikipedia retains.[1][2] No re-user of Wikidata content is required to say where the data came from, and they typically don't. So, absent this information, don't you think it likely that users will simply propagate information they find in Google and on other reusers' sites? Rather than preventing citogenesis, I think it's citogenesis on steroids, given that Google has far more users than any Wikimedia project. This CC0, no-attribution arrangement may financially benefit Google, because they can dispense with a source link that might lead users away from their own site and their ads, but how does it benefit the public, or indeed benefit Wikimedia? Are we all just working to make Google richer, or are we working for the public? Moreover, according to data on Wikimedia Labs[3], about half of all statements in Wikidata have *no reference whatsoever*. That's *in addition* to the third that are only referenced to a Wikipedia. Yet all of this material is meant to form an input to the Google Knowledge Graph, following Google's abandonment of Freebase in favour of Wikidata.[4][5]

...

Furthermore, the fact that many statements in Wikidata are not given a reference (yet) is not necessarily a "problem". For example - this https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a scientific publication with 2891 co-authors!! This is an extreme example, but it demonstrates my point... None of those 2891 statements has a specific reference listed for it, because all of them are self-evidently referenced to the scientific publication itself. The same is true of the other properties applied to this item (volume, publication date, title, page number...). All of these could be "referenced" to the very first property in the Wikidata item - the DOI of the scientific article: http://www.sciencedirect.com/science/article/pii/S0370269312008581 This item is not "less reliable" because it doesn't have the same footnote repeated almost three thousand times, but if you merely look at statistics of "unreferenced wikidata statements" it would APPEAR that it is very poorly cited. So, I think we need a more nuanced view of what "proper referencing" means in the context of Wikidata.

I take your point, even though I am unsure what value this Wikidata listing adds for the public, given that it merely reproduces details from the publisher's page. Might we be reinventing the wheel? And if there is value added for the public in some way that escapes me, surely it would not be difficult to have the bot add the reference automatically when importing the data from the publisher's page, thereby showing that it is referenced and making it easier to spot when someone subsequently adds the name of his classmate as a joke?

I'll add an extreme example of my own, from the opposite end of the spectrum: for five months in 2014, Wikidata told the world that Franklin D. Roosevelt was also known as "Adolf Hitler".[6] If obvious unsourced vandalism lasts as long as that, I am not sanguine about the likelihood of more subtle distortions being spotted in a timely manner. Note that manipulation of Knowledge Graph content was reportedly a problem with Freebase as well.[4]

[1] https://creativecommons.org/publicdomain/zero/1.0/
[2] https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attributio…
[3] https://tools.wmflabs.org/wikidata-todo/stats.php
[4] https://www.seroundtable.com/google-freebase-wikidata-knowledge-graph-19591…
[5] http://searchengineland.com/google-close-freebase-helped-feed-knowledge-gra…
[6] https://www.wikidata.org/w/index.php?title=Q8007&oldid=124603129 https://www.wikidata.org/w/index.php?title=Q8007&diff=next&oldid=15…

geni

7:14 p.m.

On 27 November 2015 at 15:16, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

The problem is that there aren't really any alternatives to CC0 that do any better (since Wikidata isn't really copyrightable in conventional terms). The Open Data Commons Open Database License would be closest, but it only applies in the EU and leads to messy arguments over what counts as a substantial part.

Gerard Meijssen

6:26 p.m.

Hoi,

When a benefit is "Wikimedia specific" and thereby dismissed, you miss much of what is going on. Exactly because of this link most items are well defined as to what they are about. It is not perfect but it is good. Consequently Wikidata is able to link Wikipedia in any language to sources external to Wikipedia. This is a big improvement over linking external sources to a Wikipedia. The disambiguation of subjects is done at the Wikidata end.

You make Wikidata out to be a "default reference source". Given its current state, that is a bit much. Wikidata does not have the maturity to function as such. The best pointer to this fact is that 50% of all items have two or fewer statements.

When you compare the quality of Wikipedias with what en.wp used to be, you are comparing apples and oranges. The Myanmar Wikipedia is better informed on Myanmar than en.wp etc. When you qualify a Wikipedia as fascist, it does not follow that the data is suspect. Certainly when data in a source that you so easily dismiss is typically the same, there is not much meaning in what you say from a Wikidata point of view.

I am thrilled that sources are so important to the Wikimedia movement and again, I am wondering what you hope to achieve by this pronouncement. Be realistic: what is it that you want to achieve? Is quality important to you, how do you define it and, more importantly, how do you want to achieve it? Have you seen the statistics on sources [1]? Then have a better look and you will find that real sources are mostly absent. Adding sources one statement at a time will not significantly improve quality because that is a numbers game and it is easier to achieve quality in a different way.

When a librarian says that many sources copy each other's data and that this is a problem, the bigger problem is missed. The bigger problem is not where they agree but where they disagree. Arguably they are the statements where quality is more likely an issue. Now ask your librarian what is likely to improve Wikidata more: either find Sources for the statements that differ or find Sources where the statements agree. Wikidata is not authoritative, but when our community starts researching such issues both Wikidata and other sources will rapidly improve their quality. This is not to say that in the end you want both Sources where sources agree and disagree.

Then ask your librarian if there is a problem with missing data. We can import data from sources and consequently be more informative, or we do not import more data and people have to magically combine information that exists in many sources to get a composite view. We could see Wikidata as a place where data is combined and compared with other sources. Do tell your librarian that the process mentioned above should be iterative and it will be easily understood that comparing with just one additional source will improve the focus on likely issues even more.

PS What does your librarian think when she knows that the Dutch National Library is inclined to provide us with software so that books can be ordered at Dutch libraries from Wikidata data (and by inference from Wikipedias)?

When some see Wikidata as a source of reference, they will increasingly be served a better product. At this moment it is not good at all. When German Wikimedians have concerns about quality: WONDERFUL, but what have they done to improve things? Do they apply Wikipedia standards and how does that help?

You wonder why have "bad" data in the first place... Our data IS bad and there is not enough of it for it to be really useful. We can easily add more data and have a more useful result. We can easily compare sources and ask people to concentrate on differences. However you can not tell me to add Sources to the data that I add. I will tell you to do it yourself. I am happy to improve on quality but on my terms, not yours.

You mention the propagation of errors. How would that work? You indicate that there are not enough people to fix all the issues. With bots like Kian, we have probability in adding data. We have people add data where the software is not certain. You doubt technology but you do not know where we are, what is already done.

In short my feeling is that you do not know what you are talking about. There is real scholarship in the approach that I described. My take is in applying set theory. Kian is AI. For all I care yours is FUD.

Your notion of accountability is one of a consumer; it is not the accountability needed for a project that is immature and is not at all at a stage where you should imply that it is good enough and that quality is assured. There are domains in Wikidata that I will not touch because in my opinion it is wrong in its principles. At the same time I know that it can be fixed in time and leave it at that.

I disagree with Heather Ford and Mark Graham. As long as Wikidata does not have the power of a Reasonator, the data is just that. It does not make itself in information and consequently it is awful. When there is one thing the Wikidata engineers do not do, it is considering the use of the data and the workflows to improve the data and the quality.

The data needs to be CC-0 because it is how we ensure that everybody will be happy and willing to participate. As more participation happens and more collaboration occurs we will see Wikidata increase in the amount of data that it holds and at the same time we will see quality improve.

Yes, Wikidata could do more in the way of adding sources to data. As long as the "primary sources tool" does not add the sources it knows, what do you expect from anybody else?

Thanks, GerardM

[1] https://tools.wmflabs.org/wikidata-todo/stats.php?reverse

On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Gerard,

On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

> Hoi, To start off, results from the past are no indications of results in the future. It is the disclaimer insurance companies have to state in all their adverts in the Netherlands. When you continue and make it a "theological" issue, you lose me because I am not of this faith, far from it. Wikidata is its own project and it is utterly dissimilar from Wikipedia. To start off, Wikidata has been a certified success from the start. The improvement it brought by bringing all interwiki links together is enormous. That alone should be a pointer that Wikipedia think is not realistic.
>
> To continue, people have been importing data into Wikidata from the start. They are the statements you know, and it was possible to import them from Wikipedia because of these interwiki links. So when you call for sources, it is fairly safe to assume that those imports are supported by the quality of the statements of the Wikipedias and, if anything, that is also where they typically fail, because many assumptions at Wikipedia are plain wrong at Wikidata. For instance, a listed building is not the organisation the building is known for. At Wikidata they each need their own item and associated statements.
>
> Wikidata is already a success for other reasons. VIAF no longer links to Wikipedia but to Wikidata. The biggest benefit of this move is for people who are not interested in English. Because of this change VIAF links through Wikidata to all Wikipedias, not only en.wp. Consequently people may find, through VIAF, Wikipedia articles in their own language through their library systems.
>
> So do not forget about Wikipedia and the lessons learned. These lessons are important to Wikipedia. However, they do not necessarily apply to Wikidata, particularly when you approach Wikidata as an opportunity to do things in a different way. Set theory, a branch of mathematics, is exactly what we need. When we have data at Wikidata of a given quality, e.g. 90%, and we have data at another source with a given quality, e.g. 90%, we can compare the two and find a subset where the two sources do not match. When we curate the differences, it is highly likely that we improve quality at Wikidata or at the other source.
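The comparison proposed here (two sources of roughly 90% quality each, with attention focused on the subset where they disagree) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any actual Wikidata tool or bot: the item labels, the years, and the function names `compare_sources` and `share_with_exactly_one_error` are all hypothetical, and the arithmetic assumes that errors in the two sources are independent.

```python
# Hypothetical sample data: item label -> recorded year of birth.
# Neither dict reflects real Wikidata or external-source content.
wikidata = {"item-1": 1452, "item-2": 1564, "item-3": 1606, "item-4": 1770}
other_source = {"item-1": 1452, "item-2": 1564, "item-3": 1607, "item-5": 1840}


def compare_sources(a, b):
    """Split two sources into mismatches and items missing from one side."""
    shared = a.keys() & b.keys()      # set intersection of item labels
    mismatches = {k: (a[k], b[k]) for k in shared if a[k] != b[k]}
    only_in_a = a.keys() - b.keys()   # set difference: present in a only
    only_in_b = b.keys() - a.keys()   # set difference: present in b only
    return mismatches, only_in_a, only_in_b


def share_with_exactly_one_error(p, q):
    """Expected share of shared items where exactly one source is wrong,
    given accuracies p and q and (as an assumption) independent errors."""
    return p * (1 - q) + q * (1 - p)


mismatches, only_in_a, only_in_b = compare_sources(wikidata, other_source)
print("needs curation:", mismatches)
print("missing from other source:", sorted(only_in_a))
print("missing from Wikidata:", sorted(only_in_b))
print("expected share to review:", round(share_with_exactly_one_error(0.9, 0.9), 2))
```

Under the independence assumption, two 90% sources disagree on roughly 18% of the items they share, so curators would review about a fifth of the data rather than all of it. Cases where both sources carry the same wrong value stay invisible to this comparison, and the estimate no longer holds when one source copied from the other, which is the circular-referencing concern raised elsewhere in this thread.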

Lila Tretikov

7:14 p.m.

Hoi Gerard, What I hear in the emails from Andreas and Liam is not so much the propagation of errors (which I am sure happens in some percentage of cases), but the fact that the original source is obscured. If the source of an error is obscured, that error is that much harder to find and to correct, as are biases and other problems. In fact, we see this even in Wikipedia articles today (wrong dates of birth sourced from publications that don't do enough fact-checking are something I have come across personally). Traceability to the original source is a powerful and important principle on Wikipedia, but with content re-use it gets lost. Public domain/CC0 in combination with AI lends our content to slicing and dicing and re-arranging by others, making it something entirely new, but also detached from our process of validation and verification. I am curious to hear if people think it is a problem. It definitely worries me. We have been looking very closely at Wikidata and the possibilities it offers. I am curious to understand more about your note on Reasonator: "As long as Wikidata does not have the power of a Reasonator, the data is just that. It does not make itself in information and consequently it is awful. When there is one thing the Wikidata engineers do not do, it is considering the use of the data and the workflows to improve the data and the quality." Am I right in understanding you to say that until the data sees the light of day it will not become high quality? Thanks, Lila On Fri, Nov 27, 2015 at 10:26 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

> Hoi, > When a benefit is "Wikimedia specific" and thereby dismissed, you miss much > of what is going on. Exactly because of this link most items are well > defined as to what they are about. It is not perfect but it is good. > Consequently Wikidata is able to link Wikipedia in any language to sources > external to Wikipedia. This is a big improvement over linking external > sources to a Wikipedia. The disambiguation of subjects is done at the > Wikidata end. > > You make Wikidata to be a "default reference source". Given its current > state, it is a bit much. Wikidata does not have the maturity to function as > such. The best pointer to this fact is that 50% of all items has two or > fewer statements. > > When you compare the quality of Wikipedias with what en.wp used to be you > are comparing apples and oranges. The Myanmar Wikipedia is better informed > on Myanmar than en.wp etc. > > When you qualify a Wikipedia as fascist, it does not follow that the data > is suspect. Certainly when data in a source that you so easily dismiss is > typically the same, there is not much meaning in what you say from a > Wikidata point of view. > > I am thrilled that sources are so important to the Wikimedia movement and > again, I am wondering what you hope to achieve by this pronouncement. Be > realistic what is it that you want to achieve? Is quality important to you > and, how do you define it and more importantly how do you want to achieve > it. Have you seen the statistics on sources [1]? Then have a better look > and you will find that real sources are mostly absent. Adding sources one > statement at a time will not significantly improve quality because that is > a numbers game and it is easier to achieve quality in a different way. > > When a librarian says that many sources copy each others data and that this > is a problem, the bigger problem is missed. The bigger problem is not where > they agree but where they disagree. 
Arguably they are the statements where > quality is more likely an issue. Now ask your librarian what is likely to > improve Wikidata more either find Sources for the statements that differ of > find Sources where the statements agree. Wikidata is not authoritative but > when our community starts researching such issues both Wikidata and other > sources will improve rapidly their quality. This is not to say that in the > end you want both Sources where sources agree and disagree. > > Then ask your librarian if there is a problem with missing data We can > import data from sources and consequently be more informative or we do not > import more data and people have to magically combine information that > exists in many sources to get a composite view. We could see Wikidata as a > place where data is combined and compared with other sources, Do tell your > librarian that the process mentioned above should be iterative and it will > be easily understood that comparing with just one additional source will > improve the focus on likely issues even more. > > PS What does your librarian think when she knows that the Dutch National > Library is inclined to provide us with software so that books can be > ordered at Dutch libraries from Wikidata data (and by inference from > Wikipedias)? > > When some see Wikidata as a source of reference, they will increasingly be > served a better product. At this moment it is not good at all. > > When German Wikimedians have concerns about quality.WONDERFUL but what have > they done to improve things? Do they apply Wikipedia standards and how does > that help? > > You wonder why have "bad" data in the first place... Our data IS bad and > there is not enough of it for it to be really useful. We can easily add > more data and have a more useful result We can easily compare sources and > ask people to concentrate on differences. However you can not tell me to > add Sources to the data that I add. I will tell you to do it yourself. 
I am > happy to improve on quality but on my terms, not yours. > > You mention the propagation of errors.. How would that work. You indicate > that there are not enough people to fix all the issues. With bots like > Kian, we have probability in adding data. We have people add data where the > software is not certain. You doubt technology but you do not know where we > are, what is already done. > > In short my feeling is that you do not know what you are talking about. > There is real scholarship in the approach that I described, My take is in > applying set theory. Kian is AI. For all I care yours is FUD. > > Your notion of accountability is one of a consumer, it is not the > accountability needed for a project that is immature and is not at all at a > stage where you should imply that it is good enough and that quality is > assured. There are domains in Wikidata that I will not touch because in my > opinion it is wrong in its principles. At the same time I know that it can > be fixed in time and leave it at that, > > I disagree with Heather Ford and Mark Graham. As long as Wikidata does not > have the power of a Reasonator, the data is just that. It does not make > itself in information and consequently it is awful. When there is one thing > the Wikidata engineers do not do, it is considering the use of the data and > the workflows to improve the data and the quality. > > The data needs to be CC-0 because it is how we ensure that everybody will > be happy and willing to participate. As more participation happens as more > collaboration occurs we will see Wikidata increase in the amount of data > that it holds and at the same time we will see quality improve. > > Yes, Wikidata could do more in the way of adding sources to data. As long > as the "primary sources tool" does not add the sources it knows, what do > you expect from anybody else. 
> Thanks, > GerardM > > [1] https://tools.wmflabs.org/wikidata-todo/stats.php?reverse > > On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote:

> > > Gerard, > > > > On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

> > > > > Hoi, > > > To start of, results from the past are no indications of results in the > > > future. It is the disclaimer insurance companies have to state in all > > their > > > adverts in the Netherlands. When you continue and make it a > "theological" > > > issue, you lose me because I am not of this faith, far from it. > Wikidata > > is > > > its own project and it is utterly dissimilar from Wikipedia.To start of > > > Wikidata has been a certified success from the start. The improvement > it > > > brought by bringing all interwiki links together is enormous.That alone > > > should be a pointer that Wikipedia think is not realistic. > > > > > > > > > These benefits are internal to Wikimedia and a completely separate issue > > from third-party re-use of Wikidata content as a default reference > source, > > which is the issue of concern here. > > > > > > To continue, people have been importing data into Wikidata from the > start. > > > They are the statements you know and, it was possible to import them > > from > > > Wikipedia because of these interwiki links. So when you call for > sources, > > > it is fairly save to assume that those imports are supported by the > > quality > > > of the statements of the Wikipedias > > > > > > > > The quality of three-quarters of the 280+ Wikipedia language versions is > > about at the level the English Wikipedia had reached in 2002. > > > > Even some of the larger Wikipedias have significant problems. The Kazakh > > Wikipedia for example is controlled by functionaries of an oppressive > > regime[1], and the Croatian one is reportedly[2] controlled by fascists > > rewriting history (unless things have improved markedly in the Croatian > > Wikipedia since that report, which would be news to me). The Azerbaijani > > Wikipedia seems to have problems as well. > > > > The Wikimedia movement has always had an important principle: that all > > content should be traceable to a "reliable source". 
Throughout the first > > decade of this movement and beyond, Wikimedia content has never been > > considered a reliable source. For example, you can't use a Wikipedia > > article as a reference in another Wikipedia article. > > > > Another important principle has been the disclaimer: pointing out to > people > > that the data is anonymously crowdsourced, and that there is no guarantee > > of reliability or fitness for use. > > > > Both of these principles are now being jettisoned. > > > > Wikipedia content is considered a reliable source in Wikidata, and > Wikidata > > content is used as a reliable source by Google, where it appears without > > any indication of its provenance. This is a reflection of the fact that > > Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision was, > I > > understand, made by Denny, who is both a Google employee and a WMF board > > member. > > > > The benefit to Google is very clear: this free, unattributed content adds > > value to Google's search engine result pages, and improves Google's > revenue > > (currently running at about $10 million an hour, much of it from ads). > > > > But what is the benefit to the end user? The end user gets information of > > undisclosed provenance, which is presented to them as authoritative, even > > though it may be compromised. In what sense is that an improvement for > > society? > > > > To me, the ongoing information revolution is like the 19th century > > industrial revolution done over. It created whole new categories of > abuse, > > which it took a century to (partly) eliminate. But first, capitalists > had a > > field day, and the people who were screwed were the common folk. Could we > > not try to learn from history? > > > > > > > > > and if anything, that is also where > > > they typically fail because many assumptions at Wikipedia are plain > wrong > > > at Wikidata. For instance a listed building is not the organisation the > > > building is known for. 
At Wikidata they each need their own item and > > > associated statements. > > > > > > Wikidata is already a success for other reasons. VIAF no longer links > to > > > Wikipedia but to Wikidata. The biggest benefit of this move is for > people > > > who are not interested in English. Because of this change VIAF links > > > through Wikidata to all Wikipedias not only en.wp. Consequently people > > may > > > find through VIAF Wikipedia articles in their own language through > their > > > library systems. > > > > > > > > > At the recent Wikiconference USA, a Wikimedia veteran and professional > > librarian expressed the view to me that > > > > * circular referencing between VIAF and Wikidata will create a humongous > > muddle that nobody will be able to sort out again afterwards, because – > > unlike wiki mishaps in other topic areas – here it's the most > authoritative > > sources that are being corrupted by circular referencing; > > > > * third parties are using Wikimedia content as a *reference standard > *when > > that was never the intention (see above). > > > > I've seen German Wikimedians express concerns that quality assurance > > standards have dropped alarmingly since the project began, with bot users > > mass-importing unreliable data. > > > > > > > > > So do not forget about Wikipedia and the lessons learned. These lessons > > are > > > important to Wikipedia. However, they do not necessarily apply to > > Wikidata > > > particularly when you approach Wikidata as an opportunity to do things > > in a > > > different way. Set theory, a branch of mathematics, is exactly what we > > > need. When we have data at Wikidata of a given quality.. eg 90% and we > > have > > > data at another source with a given quality eg 90%, we can compare the > > two > > > and find a subset where the two sources do not match. When we curate > the > > > differences, it is highly likely that we improve quality at Wikidata or > > at > > > the other source. 
> > > > > > > > This sounds like "Let's do it quick and dirty and worry about the > problems > > later". > > > > I sometimes get the feeling software engineers just love a programming > > challenge, because that's where they can hone and display their skills. > > Dirty data is one of those challenges: all the clever things one can do > to > > clean up the data! There is tremendous optimism about what can be done. > But > > why have bad data in the first place, starting with rubbish and then > > proving that it can be cleaned up a bit using clever software? > > > > The effort will make the engineer look good, sure, but there will always > be > > collateral damage as errors propagate before they are fixed. The > engineer's > > eyes are not typically on the content, but on their software. The content > > their bots and programs manipulate at times seems almost incidental, > > something for "others" to worry about – "others" who don't necessarily > > exist in sufficient numbers to ensure quality. > > > > In short, my feeling is that the engineering enthusiasm and expertise > > applied to Wikidata aren't balanced by a similar level of commitment to > > scholarship in generating the data, and getting them right first time. > > > > We've seen where that approach can lead with Wikipedia. Wikipedia hoaxes > > and falsehoods find their way into the blogosphere, the media, even the > > academic literature. The stakes with Wikidata are potentially much > higher, > > because I fear errors in Wikidata stand a good chance of being massively > > propagated by Google's present and future automated information delivery > > mechanisms, which are completely opaque. Most internet users aren't even > > aware to what extent the Google Knowledge Graph relies on anonymously > > compiled, crowdsourced data; they will just assume that if Google says > it, > > it must be true. 
> > > > In addition to honest mistakes, transcription errors, outdated info etc., > > the whole thing is a propagandist's wet dream. Anonymous accounts! > > Guaranteed identity protection! Plausible deniability! No legal > liability! > > Automated import and dissemination without human oversight! Massive > impact > > on public opinion![3] > > > > If information is power, then this provides the best chance of a power > grab > > humanity has seen since the invention of the newspaper. In the media > > landscape, you at least have right-wing, centrist and left-wing > > publications each presenting their version of the truth, and you know > who's > > publishing what and what agenda they follow. You can pick and choose, > > compare and contrast, read between the lines. We won't have that online. > > Wikimedia-fuelled search engines like Google and Bing dominate the > > information supply. > > > > The right to enjoy a pluralist media landscape, populated by players who > > are accountable to the public, was hard won in centuries past. Some > > countries still don't enjoy that luxury today. Are we now blithely giving > > it away, in the name of progress, and for the greater glory of > technocrats? > > > > I don't trust the way this is going. I see a distinct possibility that > > we'll end up with false information in Wikidata (or, rather, the Google > > Knowledge Graph) being used to "correct" accurate information in other > > sources, just because the Google/Wikidata content is ubiquitous. If you > > build circular referencing loops fuelled by spurious data, you don't > > provide access to knowledge, you destroy it. A lie told often enough etc. > > > > To quote Heather Ford and Mark Graham, "We know that the engineers and > > developers, volunteers and passionate technologists are often trying to > do > > their best in difficult circumstances. 
But there need to be better > attempts > > by people working on these platforms to explain how decisions are made > > about what is represented. These may just look like unimportant lines of > > code in some system somewhere, but they have a very real impact on the > > identities and futures of people who are often far removed from the > > conversations happening among engineers." > > > > I agree with that. The "what" should be more important than the "how", > and > > at present it doesn't seem to be. > > > > It's well worth thinking about, and having a debate about what can be > done > > to prevent the worst from happening. > > > > In particular, I would like to see the decision to publish Wikidata > under a > > CC0 licence revisited. The public should know where the data it gets > comes > > from; that's a basic issue of transparency. > > > > Andreas > > > > [1] > > > https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed > > [2] > > > > > http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-contro… > > [3] > > > > > http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-201… > > _______________________________________________ > > Wikimedia-l mailing list, guidelines at: > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines > > Wikimedia-l(a)lists.wikimedia.org > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe> > > > _______________________________________________ > Wikimedia-l mailing list, guidelines at: > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines > Wikimedia-l(a)lists.wikimedia.org > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe> >

Gerard Meijssen

7:25 p.m.

Hoi, I happen to work on the Dukes of Friuli. Compare the data from Wikidata with the information Reasonator presents for the same item for one of them: https://tools.wmflabs.org/reasonator/?&q=2471519 https://www.wikidata.org/wiki/Q2471519 Wikidata is not informative; you have to work hard to get the information that Reasonator has already provided for over a year. All kinds of additional services, like the QR code and the family tree, can easily be added. The Reasonator info can easily be shown in any language; just add the labels. Thanks, GerardM On 27 November 2015 at 20:14, Lila Tretikov <lila(a)wikimedia.org> wrote:

...

wrote:

Hoi, When a benefit is "Wikimedia specific" and thereby dismissed, you miss

much

of what is going on. Exactly because of this link most items are well defined as to what they are about. It is not perfect but it is good. Consequently Wikidata is able to link Wikipedia in any language to

sources

external to Wikipedia. This is a big improvement over linking external sources to a Wikipedia. The disambiguation of subjects is done at the Wikidata end. You make Wikidata to be a "default reference source". Given its current state, it is a bit much. Wikidata does not have the maturity to function

such. The best pointer to this fact is that 50% of all items has two or fewer statements. When you compare the quality of Wikipedias with what en.wp used to be you are comparing apples and oranges. The Myanmar Wikipedia is better

informed

on Myanmar than en.wp etc. When you qualify a Wikipedia as fascist, it does not follow that the data is suspect. Certainly when data in a source that you so easily dismiss is typically the same, there is not much meaning in what you say from a Wikidata point of view. I am thrilled that sources are so important to the Wikimedia movement and again, I am wondering what you hope to achieve by this pronouncement. Be realistic what is it that you want to achieve? Is quality important to

you

and, how do you define it and more importantly how do you want to achieve it. Have you seen the statistics on sources [1]? Then have a better look and you will find that real sources are mostly absent. Adding sources one statement at a time will not significantly improve quality because that

a numbers game and it is easier to achieve quality in a different way. When a librarian says that many sources copy each others data and that

this

is a problem, the bigger problem is missed. The bigger problem is not

where

they agree but where they disagree. Arguably they are the statements

where

quality is more likely an issue. Now ask your librarian what is likely to improve Wikidata more either find Sources for the statements that differ

find Sources where the statements agree. Wikidata is not authoritative

but

when our community starts researching such issues both Wikidata and other sources will improve rapidly their quality. This is not to say that in

the

end you want both Sources where sources agree and disagree. Then ask your librarian if there is a problem with missing data We can import data from sources and consequently be more informative or we do

not

import more data and people have to magically combine information that exists in many sources to get a composite view. We could see Wikidata as

place where data is combined and compared with other sources, Do tell

your

librarian that the process mentioned above should be iterative and it

will

be easily understood that comparing with just one additional source will improve the focus on likely issues even more. PS What does your librarian think when she knows that the Dutch National Library is inclined to provide us with software so that books can be ordered at Dutch libraries from Wikidata data (and by inference from Wikipedias)? When some see Wikidata as a source of reference, they will increasingly

served a better product. At this moment it is not good at all. When German Wikimedians have concerns about quality.WONDERFUL but what

have

they done to improve things? Do they apply Wikipedia standards and how

does

that help? You wonder why have "bad" data in the first place... Our data IS bad and there is not enough of it for it to be really useful. We can easily add more data and have a more useful result We can easily compare sources and ask people to concentrate on differences. However you can not tell me to add Sources to the data that I add. I will tell you to do it yourself. I

happy to improve on quality but on my terms, not yours. You mention the propagation of errors.. How would that work. You indicate that there are not enough people to fix all the issues. With bots like Kian, we have probability in adding data. We have people add data where

the

software is not certain. You doubt technology but you do not know where

are, what is already done. In short my feeling is that you do not know what you are talking about. There is real scholarship in the approach that I described, My take is in applying set theory. Kian is AI. For all I care yours is FUD. Your notion of accountability is one of a consumer, it is not the accountability needed for a project that is immature and is not at all

at a

stage where you should imply that it is good enough and that quality is assured. There are domains in Wikidata that I will not touch because in

opinion it is wrong in its principles. At the same time I know that it

can

be fixed in time and leave it at that, I disagree with Heather Ford and Mark Graham. As long as Wikidata does

not

have the power of a Reasonator, the data is just that. It does not make itself in information and consequently it is awful. When there is one

thing

the Wikidata engineers do not do, it is considering the use of the data

and

the workflows to improve the data and the quality. The data needs to be CC-0 because it is how we ensure that everybody will be happy and willing to participate. As more participation happens as

more > collaboration occurs we will see Wikidata increase in the amount of data > that it holds and at the same time we will see quality improve. > > Yes, Wikidata could do more in the way of adding sources to data. As long > as the "primary sources tool" does not add the sources it knows, what do > you expect from anybody else. > Thanks, > GerardM > > > [1] https://tools.wmflabs.org/wikidata-todo/stats.php?reverse > > > > On 27 November 2015 at 12:08, Andreas Kolbe <jayen466(a)gmail.com> wrote: > > > Gerard, > > > > On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen < > > gerard.meijssen(a)gmail.com> >

wrote:

> > > > > Hoi, > > > To start of, results from the past are no indications of results in the

future. It is the disclaimer insurance companies have to state in all

their > adverts in the Netherlands. When you continue and make it a

"theological"

> issue, you lose me because I am not of this faith, far from it.

Wikidata > is > > its own project and it is utterly dissimilar from Wikipedia.To start

> Wikidata has been a certified success from the start. The improvement

it > > brought by bringing all interwiki links together is enormous.That

alone

> > should be a pointer that Wikipedia think is not realistic. > > > > > These benefits are internal to Wikimedia and a completely separate

issue

from third-party re-use of Wikidata content as a default reference

source,

which is the issue of concern here. To continue, people have been importing data into Wikidata from the

start.

They are the statements you know and, it was possible to import them

from > Wikipedia because of these interwiki links. So when you call for

sources, > > it is fairly save to assume that those imports are supported by the > quality > > of the statements of the Wikipedias > > > > The quality of three-quarters of the 280+ Wikipedia language versions

> about at the level the English Wikipedia had reached in 2002. > > Even some of the larger Wikipedias have significant problems. The

Kazakh

> Wikipedia for example is controlled by functionaries of an oppressive > regime[1], and the Croatian one is reportedly[2] controlled by fascists > rewriting history (unless things have improved markedly in the Croatian > Wikipedia since that report, which would be news to me). The

Azerbaijani

> Wikipedia seems to have problems as well. > > The Wikimedia movement has always had an important principle: that all > content should be traceable to a "reliable source". Throughout the

first

decade of this movement and beyond, Wikimedia content has never been considered a reliable source. For example, you can't use a Wikipedia article as a reference in another Wikipedia article. Another important principle has been the disclaimer: pointing out to

people > that the data is anonymously crowdsourced, and that there is no

guarantee

of reliability or fitness for use. Both of these principles are now being jettisoned. Wikipedia content is considered a reliable source in Wikidata, and

Wikidata > content is used as a reliable source by Google, where it appears

without

> any indication of its provenance. This is a reflection of the fact that > Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision

was,

I > understand, made by Denny, who is both a Google employee and a WMF

board

> member. > > The benefit to Google is very clear: this free, unattributed content

adds

value to Google's search engine result pages, and improves Google's

revenue > (currently running at about $10 million an hour, much of it from ads). > > But what is the benefit to the end user? The end user gets information

> undisclosed provenance, which is presented to them as authoritative,

even

though it may be compromised. In what sense is that an improvement for society?

To me, the ongoing information revolution is like the 19th century industrial revolution done over. It created whole new categories of abuse, which it took a century to (partly) eliminate. But first, capitalists had a field day, and the people who were screwed were the common folk. Could we not try to learn from history?

> and if anything, that is also where they typically fail because many assumptions at Wikipedia are plain wrong at Wikidata. For instance a listed building is not the organisation the building is known for. At Wikidata they each need their own item and associated statements.
>
> Wikidata is already a success for other reasons. VIAF no longer links to Wikipedia but to Wikidata. The biggest benefit of this move is for people who are not interested in English. Because of this change VIAF links through Wikidata to all Wikipedias not only en.wp. Consequently people may find through VIAF Wikipedia articles in their own language through their library systems.

At the recent Wikiconference USA, a Wikimedia veteran and professional librarian expressed the view to me that

* circular referencing between VIAF and Wikidata will create a humongous muddle that nobody will be able to sort out again afterwards, because – unlike wiki mishaps in other topic areas – here it's the most authoritative sources that are being corrupted by circular referencing;
* third parties are using Wikimedia content as a *reference standard* when that was never the intention (see above).

I've seen German Wikimedians express concerns that quality assurance standards have dropped alarmingly since the project began, with bot users mass-importing unreliable data.

> So do not forget about Wikipedia and the lessons learned. These lessons are important to Wikipedia. However, they do not necessarily apply to Wikidata particularly when you approach Wikidata as an opportunity to do things in a different way. Set theory, a branch of mathematics, is exactly what we need. When we have data at Wikidata of a given quality.. eg 90% and have data at another source with a given quality eg 90%, we can compare the two and find a subset where the two sources do not match. When we curate the differences, it is highly likely that we improve quality at Wikidata or at the other source.

This sounds like "Let's do it quick and dirty and worry about the problems later". I sometimes get the feeling software engineers just love a programming challenge, because that's where they can hone and display their skills. Dirty data is one of those challenges: all the clever things one can do to clean up the data! There is tremendous optimism about what can be done. But why have bad data in the first place, starting with rubbish and then proving that it can be cleaned up a bit using clever software?

The effort will make the engineer look good, sure, but there will always be collateral damage as errors propagate before they are fixed. The engineer's eyes are not typically on the content, but on their software. The content their bots and programs manipulate at times seems almost incidental, something for "others" to worry about – "others" who don't necessarily exist in sufficient numbers to ensure quality.

In short, my feeling is that the engineering enthusiasm and expertise applied to Wikidata aren't balanced by a similar level of commitment to scholarship in generating the data, and getting them right first time.

We've seen where that approach can lead with Wikipedia. Wikipedia hoaxes and falsehoods find their way into the blogosphere, the media, even the academic literature. The stakes with Wikidata are potentially much higher, because I fear errors in Wikidata stand a good chance of being massively propagated by Google's present and future automated information delivery mechanisms, which are completely opaque. Most internet users aren't even aware to what extent the Google Knowledge Graph relies on anonymously compiled, crowdsourced data; they will just assume that if Google says it, it must be true.

In addition to honest mistakes, transcription errors, outdated info etc., the whole thing is a propagandist's wet dream. Anonymous accounts! Guaranteed identity protection! Plausible deniability! No legal liability! Automated import and dissemination without human oversight! Massive impact on public opinion![3]

If information is power, then this provides the best chance of a power grab humanity has seen since the invention of the newspaper. In the media landscape, you at least have right-wing, centrist and left-wing publications each presenting their version of the truth, and you know who's publishing what and what agenda they follow. You can pick and choose, compare and contrast, read between the lines. We won't have that online. Wikimedia-fuelled search engines like Google and Bing dominate the information supply.

The right to enjoy a pluralist media landscape, populated by players who are accountable to the public, was hard won in centuries past. Some countries still don't enjoy that luxury today. Are we now blithely giving it away, in the name of progress, and for the greater glory of technocrats?

I don't trust the way this is going. I see a distinct possibility that we'll end up with false information in Wikidata (or, rather, the Google Knowledge Graph) being used to "correct" accurate information in other sources, just because the Google/Wikidata content is ubiquitous. If you build circular referencing loops fuelled by spurious data, you don't provide access to knowledge, you destroy it. A lie told often enough etc.

To quote Heather Ford and Mark Graham, "We know that the engineers and developers, volunteers and passionate technologists are often trying to do their best in difficult circumstances. But there need to be better attempts by people working on these platforms to explain how decisions are made about what is represented. These may just look like unimportant lines of code in some system somewhere, but they have a very real impact on the identities and futures of people who are often far removed from the conversations happening among engineers."

I agree with that. The "what" should be more important than the "how", and at present it doesn't seem to be.

It's well worth thinking about, and having a debate about what can be done to prevent the worst from happening.

In particular, I would like to see the decision to publish Wikidata under a CC0 licence revisited. The public should know where the data it gets comes from; that's a basic issue of transparency.

Andreas

[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-contro…
[3] http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-201…

Gerard Meijssen

7:38 p.m.

Hoi,
Sources are important. When we do not have data at Wikidata and we add it from anywhere, we have the basis to do some good. At this time we do not really add source information. It is too cumbersome, and as long as the "primary sources tool", an "official" tool, does not do it, why bother?

My point about sources is very much that when one source does not agree, there is a likely quality issue. When five sources agree, there is nothing that marks them as suspect and there is no reason why I would look for a source for that statement anytime soon. When you work on quality, you do not care about sources that agree, you care about those that do not.

When multiple sources copied each other's data, it means that they all provide information. That is superior to waving hands, not being informative, and not including missing data because of a lack of a source.

We have to ask ourselves what our aim is: to share in the sum of all knowledge and occasionally be wrong, or to keep a lot of knowledge from the people and pretend that it is all correct?

Thanks,
GerardM

PS we are wiki based.

On 27 November 2015 at 20:14, Lila Tretikov <lila(a)wikimedia.org> wrote:
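The point in the message above about caring only for the statements where sources disagree can be sketched in a few lines of Python. This is an illustration only, not a description of any existing Wikidata tool: the function name `find_disagreements`, the tuple layout, the source labels and all sample identifiers and values are hypothetical.

```python
from collections import defaultdict


def find_disagreements(claims):
    """Return the statements for which the sources report more than one value.

    `claims` is an iterable of (source, item, prop, value) tuples; this layout
    is an assumption made for the sketch, not a real Wikidata data format.
    """
    # (item, prop) -> value -> set of sources reporting that value
    values_by_statement = defaultdict(lambda: defaultdict(set))
    for source, item, prop, value in claims:
        values_by_statement[(item, prop)][value].add(source)

    # Statements on which all sources agree are dropped; only the
    # conflicting ones are kept as candidates for human curation.
    return {
        statement: {value: sorted(sources) for value, sources in by_value.items()}
        for statement, by_value in values_by_statement.items()
        if len(by_value) > 1
    }


# Made-up sample data: three sources for one item, two for another.
claims = [
    ("wikidata", "Q_example_1", "date of birth", "1901-02-03"),
    ("viaf", "Q_example_1", "date of birth", "1901-02-03"),
    ("dnb", "Q_example_1", "date of birth", "1902-02-03"),
    ("wikidata", "Q_example_2", "date of birth", "1950-06-07"),
    ("viaf", "Q_example_2", "date of birth", "1950-06-07"),
]

suspect = find_disagreements(claims)
for statement, by_value in suspect.items():
    print(statement, by_value)
```

Only the first item is reported, because one of its three sources gives a different date. Note that agreement is not proof of correctness: sources that copied each other will agree even when they are all wrong, which is the concern raised earlier in the thread.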

...

wrote:

Hoi,
When a benefit is "Wikimedia specific" and thereby dismissed, you miss much [...] sources [...] informed [...] you [...] a numbers game and it is easier to achieve quality in a different way.

When a librarian says that many sources copy each others data and that this is a problem, the bigger problem is missed. The bigger problem is not where they agree but where they disagree. Arguably they are the statements where quality is more likely an issue. Now ask your librarian what is likely to improve Wikidata more either find Sources for the statements that differ or find Sources where the statements agree. Wikidata is not authoritative but when our community starts researching such issues both Wikidata and other sources will improve rapidly their quality. This is not to say that in the [...] not import more data and people have to magically combine information that exists in many sources to get a composite view. We could see Wikidata as a place where data is combined and compared with other sources, Do tell your librarian that the process mentioned above should be iterative and it will [...] served a better product. At this moment it is not good at all.

When German Wikimedians have concerns about quality.WONDERFUL but what have they done to improve things? Do they apply Wikipedia standards and how does [...] the [...] software is not certain. You doubt technology but you do not know where [...] at a stage where you should imply that it is good enough and that quality is assured. There are domains in Wikidata that I will not touch because in my opinion it is wrong in its principles. At the same time I know that it can be fixed in time and leave it at that,

I disagree with Heather Ford and Mark Graham. As long as Wikidata does not have the power of a Reasonator, the data is just that. It does not make itself in information and consequently it is awful. When there is one thing the Wikidata engineers do not do, it is considering the use of the data and the workflows to improve the data and the quality.

The data needs to be CC-0 because it is how we ensure that everybody will be happy and willing to participate. As more participation happens as

wrote:

> Hoi,
> To start of, results from the past are no indications of results in the future. It is the disclaimer insurance companies have to state in all their adverts in the Netherlands. When you continue and make it a "theological" issue, you lose me because I am not of this faith, far from it. Wikidata is its own project and it is utterly dissimilar from Wikipedia. To start [...] Wikidata has been a certified success from the start. The improvement it brought by bringing all interwiki links together is enormous. That alone should be a pointer that Wikipedia think is not realistic.

These benefits are internal to Wikimedia and a completely separate issue from third-party re-use of Wikidata content as a default reference source, which is the issue of concern here.

> To continue, people have been importing data into Wikidata from the start. They are the statements you know and, it was possible to import them from Wikipedia because of these interwiki links. So when you call for [...]

[...] about at the level the English Wikipedia had reached in 2002.

Even some of the larger Wikipedias have significant problems. The Kazakh [...] Azerbaijani Wikipedia seems to have problems as well.

The Wikimedia movement has always had an important principle: that all content should be traceable to a "reliable source". Throughout the first [...] people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use.

Both of these principles are now being jettisoned. Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance. This is a reflection of the fact that Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision was, I understand, made by Denny, who is both a Google employee and a WMF board member.

The benefit to Google is very clear: this free, unattributed content adds value to Google's search engine result pages, and improves Google's revenue (currently running at about $10 million an hour, much of it from ads).

But what is the benefit to the end user? The end user gets information of undisclosed provenance, which is presented to them as authoritative, even

...

Gergo Tisza

28 Nov 28 Nov

1:17 a.m.

On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov <lila(a)wikimedia.org> wrote:

...

What I hear in email from Andreas and Liam is not as much the propagation of the error (which I am sure happens with some % of the cases), but the fact that the original source is obscured and therefore it is hard to identify and correct errors, biases, etc. Because if the source of error is obscured, that error is that much harder to find and to correct. In fact, we see this even on Wikipedia articles today (wrong dates of births sourced from publications that don't do enough fact checking is something I came across personally). It is a powerful and important principle on Wikipedia, but with content re-use it gets lost. Public domain/CC0 in combination with AI lands our content for slicing and dicing and re-arranging by others, making it something entirely new, but also detached from our process of validation and verification. I am curious to hear if people think it is a problem. It definitely worries me.

This conversation seems to have morphed into trying to solve some problems that we are speculating Google might have (no one here actually *knows* how the Knowledge Graph works, of course; maybe it's sensitive to manipulation of Wikidata claims, maybe not). That seems like an entirely fruitless line of discourse to me; if the problem exists, it is Google's problem to solve (since they are the ones in a position to tell if it's a real problem or not; not to mention they have two or three magnitudes more resources to throw at it than the Wikimedia movement would). Trying to make our content less free for fear that someone might misuse it is a shamefully wrong frame of mind for an organization that's supposed to be a leader of the open content movement, IMO.

Wil Sinclair

6:05 a.m.

Gergo, do you mind if people continue discussing this? I'm finding it very interesting and fruitful. I hadn't thought through these issues before, and there are likely to be others on this list who haven't either. Best! ,Wil On Fri, Nov 27, 2015 at 5:17 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...

On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov <lila(a)wikimedia.org> wrote:

Gerard Meijssen

6:45 a.m.

Hoi,
There is no problem considering these points. You go in a direction that has little to do with what we are and where we stand.

Wikidata is a wiki. That implies that it does not have to be perfect. It implies that approaches are taken that are arguably wacky, and we will see in time how it pans out. For instance, "Frankfurt" "instance of" "big city": a big city is a city that is over a certain size. The size is debatable, and consequently it is really poor as a concept from a Wikidata point of view. It can be inferred and is therefore even redundant. Does it matter? Not really, because in time "we" will see the light.

Our data is incomplete. Arguably importing data enables us to share more of the sum of all knowledge with our users. A given percentage of all data is incorrect. However, having no data is arguably 100% incorrect and 100% not in line with our goal of serving the sum of all knowledge.

Quality is important, so processes and workflows are exceedingly important to have. We lack in that department so far. But comparing external data sources like VIAF or DNB in an iterative way is obvious when you want to identify those items and statements that are suspect. The data in Wikidata makes it easy because we have spent considerable effort linking external sources first to Wikipedia and now to Wikidata. It is easy to mark items with issues using qualifiers on the external source ID and have a basis for such workflows and quality markers.

When you make a point of external sources trusting Wikidata, these external sources may be consumers or they can be partners. When they are partners, we can provide RSS feeds informing of issues that have been found and they can do their curation on their data. When they are consumers we can still provide such an RSS feed, but we do not know what they do with it; it is their problem more than it is ours.

As I say so often, Wikidata is immature. It is silly to blindly trust Wikidata. It is largely based on Wikipedia, and Wikipedia has constructs of its own that we do not need/want in Wikidata. Big cities is one example. We have items because of interwiki links that are a mix of all kinds, eg a listed building and an organisation. This is conceptually wrong at Wikidata and it needs to be split. This is where many Wikipedians become uncomfortable but hey, Wikidata does not tell them to rewrite their article.

So yes, you can continue with this point, but it has little impact on Wikidata, and when you think it should, do consider what impact it has on Wikidata as a wiki. It is NOT an academic resource or a reference source per se. It is a wiki; it is allowed to be wrong, particularly when it has proper workflows to improve quality. If anything THIS is where we can do with a lot more talk and preferably action. This is where Wikidata is obviously lacking, and when we do have proper workflows in place, we do NOT need the dump that is the "primary sources" as this is the antithesis of a wiki and it prevents us from sharing available knowledge.

Thanks,
GerardM

On 28 November 2015 at 07:05, Wil Sinclair <wllm(a)wllm.com> wrote:
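The iterative comparison against an external source described in the message above might look roughly like this in Python. It is a sketch under stated assumptions: `compare_sources`, the `{identifier: value}` layout and every identifier and value are invented for illustration, and no real VIAF, DNB or Wikidata interface is called.

```python
def compare_sources(local, external):
    """Compare two {identifier: value} mappings using plain set operations."""
    shared = local.keys() & external.keys()
    return {
        # Present in both sources but with different values: the suspect subset.
        "mismatched": {
            key: (local[key], external[key])
            for key in sorted(shared)
            if local[key] != external[key]
        },
        # Identifiers only one side knows about: candidates for import or review.
        "missing_locally": sorted(external.keys() - local.keys()),
        "missing_externally": sorted(local.keys() - external.keys()),
    }


# Made-up records keyed by a shared external identifier.
wikidata = {"id-1": "1901", "id-2": "1950", "id-3": "1975"}
external = {"id-1": "1901", "id-2": "1951", "id-4": "1980"}

first_pass = compare_sources(wikidata, external)
print(first_pass)

# Hypothetical curation step: a person checks the suspect record and fixes
# whichever side was wrong (here we assume the external value was right).
wikidata["id-2"] = "1951"

# Running the comparison again shows what is still open after curation.
second_pass = compare_sources(wikidata, external)
print(second_pass)
```

The mismatch list from each pass is the kind of output that could be turned into quality markers on the items or into an issue feed for a partner; how that would be published is left open here.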

...

On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov <lila(a)wikimedia.org> wrote:

> What I hear in email from Andreas and Liam is not as much the propagation of the error (which I am sure happens with some % of the cases), but the fact that the original source is obscured and therefore it is hard to identify and correct errors, biases, etc. Because if the source of error is obscured, that error is that much harder to find and to correct. In fact, we see this even on Wikipedia articles today (wrong dates of births sourced from publications that don't do enough fact checking is something I came across personally). It is a powerful and important principle on Wikipedia, but with content re-use it gets lost. Public domain/CC0 in combination with AI lands our content for slicing and dicing and re-arranging by others, making it something entirely new, but also detached from our process of validation and verification. I am curious to hear if people think it is a problem. It definitely worries me.

This conversation seems to have morphed into trying to solve some problems that we are speculating Google might have (no one here actually *knows* how the Knowledge Graph works, of course; maybe it's sensitive to manipulation of Wikidata claims, maybe not). That seems like an entirely fruitless line of discourse to me; if the problem exists, it is Google's problem to solve (since they are the ones in a position to tell if it's a real problem or not; not to mention they have two or three magnitudes more resources to throw at it than the Wikimedia movement would). Trying to make our content less free for fear that someone might misuse it is a shamefully wrong frame of mind for an organization that's supposed to be a leader of the open content movement, IMO.

_______________________________________________
Wikimedia-l mailing list, guidelines at:

https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines

Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Andreas Kolbe

9:39 a.m.

On Sat, Nov 28, 2015 at 1:17 AM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...

Trying to make our content less free for fear that someone might misuse it is a shamefully wrong frame of mind for an organization that's supposed to be a leader of the open content movement, IMO.

Do you think there is something "shameful" about Wikipedia using the Creative Commons Attribution-ShareAlike 3.0 Unported License? And if that isn't shameful, why would it be shameful if Wikidata used the same licence? Attribution has a dual benefit: 1. It provides visibility for Wikimedia and the open content movement. 2. The public can see where the data comes from. What is shameful about that?

Gergő Tisza

10:13 a.m.

On Sat, Nov 28, 2015 at 1:39 AM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

There is nothing wrong with BY-SA per se; it's antithetical to the spirit of the free content movement to pick a license for the reason that it would prevent (some types of) reuse, which seemed to be where this conversation was heading. (Just like there is nothing wrong with the GFDL either, but picking it as a Commons image license for the reason that it is technically a free license but onerous enough to prevent reuse in practice would be wrong, IMO.) We have spent enough time to dissuade organizations from publishing content under NC and ND and similar licences because they were afraid of losing control over how it will be used; I'd rather we didn't do that ourselves. ("Shameful" was an unnecessarily confrontational choice of word; I apologize.) There is also the practical matter of facts not being copyrightable in the US, and non-zero CC licenses not being particularly useful for databases (what you want is something like the GPL Affero for databases and CC does not have such a license).

Andreas Kolbe

1:23 p.m.

On Sat, Nov 28, 2015 at 10:13 AM, Gergő Tisza <gtisza(a)gmail.com> wrote:

...

("Shameful" was an unnecessarily confrontational choice of word; I apologize.)

Thanks.

...

There is also the practical matter of facts not being copyrightable in the US, and non-zero CC licenses not being particularly useful for databases (what you want is something like the GPL Affero for databases and CC does not have such a license).

That hasn't stopped DBpedia and other open-content databases (the Paleobiology database for example[1]) from using CC licenses requiring attribution. DBpedia arguably had to, because its database is derived from Wikipedia, which has an attribution required, share-alike license: "DBpedia is derived from Wikipedia and is distributed under the same licensing terms as Wikipedia itself."[2] To the extent that Wikidata draws on Wikipedia, its CC0 license would appear to be a gross violation of Wikipedia's share-alike license requirement. The generation of data always has a social context. Knowing where data come from is a good thing. [1] https://creativecommons.org/weblog/entry/41216 [2] http://wiki.dbpedia.org/terms-imprint

Pete Forsyth

11:02 p.m.

On Sat, Nov 28, 2015 at 5:23 AM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

To the extent that Wikidata draws on Wikipedia, its CC0 license would appear to be a gross violation of Wikipedia's share-alike license requirement.

It's essential to also consider whether the factual information derived from Wikipedia (or any other copyrighted source) is subject to copyright. For instance, a biography might contain facts like "born in year" and "born in place" and "elected to XYZ position". I don't think facts like those are copyrightable in any jurisdiction. Perhaps there are copyrightable elements from Wikipedia that are brought into Wikidata, but I don't know offhand what they might be.

...

The generation of data always has a social context. Knowing where data come from is a good thing.

Knowing where data comes from is a good thing, yes; but "copyright holder" and "intellectual source" are not identical concepts. If the purpose is to preserve the integrity of a line of reasoning, copyright law is probably not a very good tool for that purpose. A related question was recently asked on the web site Quora; here's my answer for why CC0 is generally preferable for data sets. (I may update it with some of the points brought up here.) https://www.quora.com/Should-open-data-be-publised-with-CC0-instead-of-CC-BY -Pete [[User:Peteforsyth]]

Gergo Tisza

29 Nov 29 Nov

12:36 a.m.

On Sat, Nov 28, 2015 at 5:23 AM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

To the extent that Wikidata draws on Wikipedia, its CC0 license would appear to be a gross violation of Wikipedia's share-alike license requirement.

By the same logic, to the extent Wikipedia takes its facts from non-free external source, its free license would be a copyright violation. Luckily for us, that's not how copyright works. Statements of facts can not be copyrighted; large-scale arrangements of facts (ie. a full database) probably can, but CC does not prevent others from using them without attribution, just distributing them (again, it's like the GPL/Affero difference); there are sui generis database rights in some countries but not in the USA where both Wikipedia and most proprietary reusers/competitors are located, so relying on neighbouring rights would not help there but cause legal uncertainty for reusers (e.g. OSM which has lots of legal trouble importing coordinates due to being EU-based).

...

The generation of data always has a social context. Knowing where data come from is a good thing.

You probably won't find any Wikipedian who disagrees; verifiability is one of the fundaments of the project. But something being good and using restrictive licensing to force others to do it are very different things.

Andreas Kolbe

2:10 p.m.

Gergo, On Sun, Nov 29, 2015 at 12:36 AM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...

By the same logic, to the extent Wikipedia takes its facts from non-free external source, its free license would be a copyright violation. Luckily for us, that's not how copyright works.

I'm aware that facts are not copyrightable. By the same logic, Wikidata being offered under a CC BY-SA license, say, would not prevent anyone from extracting facts -- knowledge -- from it, and it would enable Wikidata to import a lot of data it presently cannot, because of licence incompatibilities.

...

Statements of facts can not be copyrighted; large-scale arrangements of facts (ie. a full database) probably can, but CC does not prevent others from using them without attribution, just distributing them (again, it's like the GPL/Affero difference);

Distribution is the issue here – large-scale distribution and viral propagation of data with a well-documented potential for manipulation and error, in a way that makes the provenance of these data a closed book to the end user. Do you accept that this is a potential problem, and if so, how would you guard against it, if not through the licence?

...

there are sui generis database rights in some countries but not in the USA where both Wikipedia and most proprietary reusers/competitors are located, so relying on neighbouring rights would not help there but cause legal uncertainty for reusers (e.g. OSM which has lots of legal trouble importing coordinates due to being EU-based).

It seems noteworthy that Freebase specifically said, with regard to loading structured data, "If a data source is under CC-BY, you can load it into Freebase as long as you provide attribution."[1] Wikidata practice seems to have taken a different path regarding licence compatibility, given its systematic imports from Wikipedia.

Interestingly enough, it's been pointed out to me that Denny said in 2012,[2]

---o0o---

Alexrk2, it is true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will *provide* data that can be reused in the Wikipedias. And a CC0 source can be used by a Share-Alike project, be it either Wikipedia or OSM. But not the other way around. Do we agree on this understanding? --Denny Vrandečić (WMDE) <https://meta.wikimedia.org/wiki/User:Denny_Vrande%C4%8Di%C4%87_(WMDE)> (talk <https://meta.wikimedia.org/wiki/User_talk:Denny_Vrande%C4%8Di%C4%87_(WMDE)>) 12:39, 4 July 2012 (UTC)

---o0o---

The key sentence here is "Wikidata does not plan to extract content out of Wikipedia at all." That doesn't seem to be how things have turned out, because today we have people on Wikidata raising alarms about mass imports from Wikipedia:[3]

---o0o---

Reliable Bot imports from wikipedias?

In a Wikipedia discussion I came by chance across a link to the following discussion:

- Wikidata:Project_chat/Archive/2015/10#STOP_with_bot_import <https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/10#STOP_with_bot_import>

[...]

To provide an outside perspective as Wikipedian (and a potential use[r] of WD in the future). I wholeheartedly agree with Snipre, in fact "bots [ar]e running wild" and the uncontrolled import of data/information from Wikipedias is one of the main reasons for some Wikipedias developing an increasingly hostile attitude towards WD and its usage in Wikipedias.

*If* WD is ever to function as a central data storage for various Wikimedia projects and in particular Wikipedia as well (in analogy to Commons), *then* quality has to take the driver's seat over quantity. A central storage needs a much better data integrity than the projects using it, because one mistake in its data will multiply throughout the projects relying on WD, which may cause all sorts of problems. For crude comparison think of a virus placed on a central server than on a single client. The consequences are much more severe and nobody in their right mind would run the server with even less protection/restrictions than the client.

Another thin[g] is, that if you envision users of other Wikimedia projects such as Wikipedia or even 3rd party external projects to eventually help with data maintenance when they start using WD, then you might find them rather unwilling to do so, if not enough attention is paid to quality, instead they probably just dump WD from their projects.

In general all the advantages of the central data storage depend on the quality (reliability) of data. If that is not given to reasonable high degree, there is no point to have central data storage at all. All the great application become useless if they operate on false data.--Kmhkmh <https://www.wikidata.org/w/index.php?title=User:Kmhkmh&action=edit&redlink=1> (talk <https://www.wikidata.org/wiki/User_talk:Kmhkmh>) 12:00, 19 November 2015 (UTC)

---o0o---

(I was unaware of that post by Kmhkmh when I started contributing to this discussion, but it obviously echoes some of my own concerns.)

I've been told on the German Wikipedia that the Wikidata CC0 licence has long been a controversial issue, subject to recurrent discussion, especially with regard to official population statistics in Europe, whose publishers often require attribution, making their wholesale import in Wikidata's CC0 environment problematic.[4]

In reviewing these discussions, I couldn't help but be reminded of Flickrwashing schemes by some contributors' lines of thought: how -- via which intermediary steps -- can we get the info into our CC0 project without being seen to fall foul of the original publishers' licenses?

As I understand it, the intent is to bully other data publishers into making their data available under CC0 as well. I understand this from an open-content perspective, and I can see how it might benefit Google's and other information platforms' bottom line, but I reiterate -- there are very, very significant downsides to having a central database subject to anonymous manipulation by all comers whose data is automatically propagated by major search engines.

There are many autocratic regimes in the world today who spend a lot of money and effort to achieve this kind of uniform media response in their countries. In my opinion, it creates a significant vulnerability in the global information infrastructure. If, in more troubled times ahead, people are fed the same unattributed lie by all major online outlets, because they are all automatically propagating the content of Wikimedia's CC0 database, then this could potentially alter the course of history, and not in a good way.

I am happy to hear ideas about how to address this that do not involve licensing. We need more transparency about data provenance.

You may argue that Wikidata is still in its early days, and has nowhere near the amount of data, nowhere near the reach and impact today to justify such an effort. Maybe it never will, and I'm worrying for nothing.
But we thought much the same about Wikipedia around the time of the Seigenthaler incident. Before we knew it, Wikipedia had become the world's dominant information resource, with increasing numbers of government officials, judges, journalists and academics happy to accept its word uncritically – in a way that horrifies most Wikipedians, who are well aware of the system's weaknesses. Last month for example the Wikipedian in Residence at NIOSH (National Institute for Occupational Safety and Health) said on Wikidata that he would "cringe" at the thought of using Wikipedia as a source and personally refrained from it:[5] ---o0o--- - As a note, I do semi-automated edits on my work account <https://www.wikidata.org/wiki/User:James_Hare_(NIOSH)>, and I plan on doing some as a volunteer as well. I don't use Wikipedia as a source (as a Wikipedian of 11 years, I cringe at the thought ;), but if any batch edits I do manage to screw something up despite my meticulous planning, please let me know immediately. I will take responsibility for my own messes. Harej <https://www.wikidata.org/wiki/User:Harej> (talk <https://www.wikidata.org/wiki/User_talk:Harej>) 17:38, 27 October 2015 (UTC) ---o0o--- If Wikidata were to acquire the global reach its makers and sponsors hope for, then we would have done well to build a robust system that minimises harm, and cannot become a victim of its own success. I propose that there is work to be done here. Coming back briefly to the legal licensing situation, it seems to be fairly complex even in the US, according to the relevant Wikilegal page on Meta[6], with much depending on the amount of material extracted, as you pointed out above. Things are more complicated still in the EU, given that European law protects databases created by EU citizens or residents (which includes a good number of Wikimedians), with that protection extending to "sweat of the brow" (unprotected in the US). 
EU law even prohibits the "repeated and systematic extraction" of "insubstantial parts of the contents" of a database (where the term "database" is defined broadly enough to include a Wikipedia). There's not much point in my saying more about the legal aspects of licensing; even the advice from the Foundation's legal professionals says it's rarely easy to predict how a court might rule under either EU or US law.[6]

Andreas

[1] http://wiki.freebase.com/wiki/License_compatibility
[2] https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_d…
[3] https://www.wikidata.org/wiki/Wikidata:Project_chat#Reliable_Bot_imports_fr…
[4] https://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00178.html
    https://www.wikidata.org/w/index.php?title=Wikidata:General_disclaimer&…
    http://osdir.com/ml/general/2012-11/msg31088.html
    http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg03088.html
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#Modifyi…
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#Data_re…
    https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Archive…
    http://www.gossamer-threads.com/lists/wiki/foundation/450291#450291
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/05#Populat…
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#Data_ow…
[5] https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&diff=p…
[6] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

Gerard Meijssen

2:55 p.m.

Hoi, It would be a gross violation of trust to bring Wikidata under a different license. When an external source is willing to share its data, it can do so. With explicit agreement we can copy data in from them in this way. Even when this is not possible for whatever reason, we can still contribute because we can compare data and on the basis of differences in existing data curate our data and enable them to share our findings. I am amused by your fear for manipulation. Yes, data can be manipulated but once we see it happen, we can take measures when it affects the data we hold. Provenance of data is at this stage something we at Wikidata wish for. Arguably it does not make sense to make it a priority for all of our data because it would stifle Wikidata and it is utterly against the wiki spirit. The best way to guard against manipulation is to cooperate widely and take any difference in data as serious. It is in the differences where we want to know why the differences and why they exist. Focussing on known issues helps us identify systemic issues and when we do we can expose such manipulation with proof. In this way we are using a SMART methodology. No I would never use the license as a weapon, it is how manipulation is justified. Importing data from Wikipedia is a sensible thing to do. Its data is relatively well known for its quality. It has its issues but its basis is NPOV. When people are alarmed about importing from Wikipedia, it tells us more of what they think of the quality of Wikipedia than of the quality of Wikidata. When people are alarmed because they cannot control it, ask yourself what is their problem and how do their arguments enable the notion of Wikidata as a wiki? When imported data is wrong, there are tools to remove content quite delicately. So identify an issue and it can be dealt with. When you argue that Wikidata cannot be used as a central storage. Fine, do not use it. 
In the meantime, specific sets of data are of higher quality than any Wikipedia. This is a proven fact. The question of whether Wikidata is useful as a central data repository at this time can only be answered with NO when it is about all of Wikidata. When it is about specific subsets of data the answer is clearly yes. It is also obvious that as time goes on more subsets of data will be of a higher quality than any Wikipedia (when thinking in terms of sets of data - there will always be items where a Wikipedia has an edge).

FYI I am in contact with a German university that is likely to use Wikidata internally for its research data. It needs Reasonator type of functionality to make it useful. It wants to share its data with Wikidata and wants two way RSS feeds in order to include new information.

When we set up cooperation with statistical offices, we CAN attribute easily by having bots import data on their behalf using THEIR user id and adding sources to the new data. We can also provide data from their website in applications. It is not the license that means anything; it is what we agree to do. When we have sourced data in this way, you are silly to change it. False attributions are not permitted under any license.

When we are afraid about a Seigenthaler type of event based on Wikidata, rest assured there is plenty wrong in either Wikipedia or Wikidata that makes it possible for it to happen. The most important thing is to deal with it responsibly. Just being afraid will not help us in any way. Yes we need quality and quantity. As long as we make a best effort to improve our data, we will do well.

As to the Wikipedian in residence, that is his opinion. At the same time the article on ebola has been very important. It may not be science but it is certainly encyclopaedic. At the same time this Wikipedian in residence is involved, makes a positive contribution and while he may make mistakes he is part of the solution.
I am happy that you propose that work is to be done. What have you done but more importantly what are you going to do? For me there is "Number of edits: 2,088,923" <https://www.wikidata.org/wiki/Special:Contributions/GerardM> Thanks, GerardM On 29 November 2015 at 15:10, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


Andreas Kolbe

1 Dec 1 Dec

2:30 p.m.

On Sun, Nov 29, 2015 at 2:55 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

So identify an issue and it can be dealt with.

The fact an issue *can* be dealt with does not mean that it *will* be dealt with. For example, in the post that opened this discussion a little over a week ago, you said: "At Wikidata we often find issues with data imported from a Wikipedia. Lists have been produced with these issues on the Wikipedia involved and arguably they do present issues with the quality of Wikipedia or Wikidata for that matter. So far hardly anything resulted from such outreach." These were your own words: "hardly anything resulted from such outreach." Wikimedia is three years into this project. If people produce lists of quality issues, that's great, but if nothing happens as a result, that's not so great. An example of this is available in this very thread. Three days ago I mentioned the issues with the Grasulf II of Friuli entries on Reasonator and Wikidata. I didn't expect that you or anyone else would fix them, and they haven't been, at the time of writing. You certainly could have fixed them -- you have made hundreds of edits on Wikidata since replying to that post of mine -- but you haven't. Adding new data is more satisfying than sourcing and improving an obscure entry. (If you're wondering why I didn't fix the entry myself, see the section "And to answer the obvious question …" in last month's Signpost op-ed.[1]) This problem is replicated across the Wikimedia universe. Wikimedia projects are run by volunteers. They work on what interests them, or whatever they have an investment in. Fixing old errors is not as appealing as importing 2 million items of new data (including tens or hundreds of thousands of erroneous ones), because fixing errors is slow work. It retards the growth of your edit count! You spend one hour researching a date, and all you get for that effort is one lousy edit in your contributions history. There are plenty of tasks allowing you to rack up 500 edits in 5 minutes. People seem to prefer those. 
That is why Wikipedia has the familiar backlogs in areas like copyright infringement or AfC. Even warning templates indicating bias or other problematic content often sit for years without being addressed. There is a systemic mismatch between data creation and data curation. There is a lot of energy for the former, and very little energy for the latter. That is why initiatives like the one started by WMF board member James Heilman and others, to have the English Wikipedia's medical articles peer-reviewed, are so important. They are small steps in the right direction.

...

When we are afraid about a Seigenthaler type of event based on Wikidata, rest assured there is plenty wrong in either Wikipedia or Wikidata that makes it possible for it to happen. The most important thing is to deal with it responsibly. Just being afraid will not help us in any way. Yes we need quality and quantity. As long as we make a best effort to improve our data, we will do well.

That's "eventualism". "Quality is terrible, but eventually it will be great, because ... we're all trying, and it's a wiki!" To me that sounds more like religious faith or magical thinking than empirical science. Things being on a wiki does not guarantee quality; far from it.[2][3][4][5]

...

As to the Wikipedian in residence, that is his opinion. At the same time the article on ebola has been very important. It may not be science but it is certainly encyclopaedic. At the same time this Wikipedian in residence is involved, makes a positive contribution and while he may make mistakes he is part of the solution. I am happy that you propose that work is to be done. What have you done but more importantly what are you going to do? For me there is "Number of edits: 2,088,923" <https://www.wikidata.org/wiki/Special:Contributions/GerardM>

I will do what I can to encourage Wikimedia Foundation board members and management to review the situation, in consultation with outside academics like those at the Oxford Internet Institute who are concerned about present developments, and to consider whether more stringent sourcing policies are required for Wikidata in order to assure the quality and traceability of data in the Wikidata corpus. The public is the most important stakeholder in this, and should be informed and involved. If there are quality issues, the Wikimedia Foundation should be completely transparent about them in its public communications, neither minimising nor exaggerating the issues. Known problems and potential issues should be publicised as widely as possible in order to minimise the harm to society resulting from uncritical reuse of faulty data. I have started to reach out to scholars and journalists, inviting them to review this thread as well as related materials, and form their own conclusions. I may write an op-ed about it in the Signpost, because I believe it's an important issue that deserves wider attention and debate. As far as my own contributions are concerned, I am more inclined to boycott Wikidata. Apart from all the issues discussed over the past few days, there is another aspect to my reluctance to contribute to Wikidata. The Knowledge Graph is a major new Google feature. It adds value to Google's search engine results pages. It stops people from clicking through to other sources, including Wikipedia. The recent downturn in Wikipedia pageviews has been widely linked to the Knowledge Graph. By ensuring that more people visit Google's ad-filled pages, and stay on them rather than clicking through to other sites, the Knowledge Graph is at least partly responsible for recent increases in Google's revenue, which currently stands at around $200 million a day.[6] (Income after expenses is about a third of that, i.e. $65 million.) 
The development of Wikidata was co-funded by Google, which I understand donated 325,000 Euros (about $345,000) to that effort.[8] A little bit of arithmetic shows that, with Google's profits running at $65 million a day, it takes Google less than 8 minutes to earn that amount of money. Given how much Google stands to benefit from this development, it seems a paltry investment. This set me thinking. If we assume that Wikipedia's and Wikidata's contribution to Google's annual revenue via the Knowledge Graph is just 1/365 – the revenue of one day per year – the monetary value of these projects to Google is still astronomical. There have been around 2.5 billion edits to Wikimedia projects to date.[7] If Google chose to give one day's revenue each year to Wikimedia volunteers, as a thank-you, this would average out at about 200,000,000 / 2,500,000,000 = 8 cents per edit. Someone like Koavf, who's made 1.5 million edits[9], would stand to receive around $120,000 a year. Even my paltry 50,000 edits would net me about $4,000 a year. That's the value of free content. And that's just Google. Other major players like Facebook and Bing profit, too. Wikidata seems custom-made to benefit Google and Microsoft, at the expense of Wikipedia and other sites. Given my other commitments to Wikimedia projects, the limited number of hours in a day, and all the other concerns mentioned in this thread, I feel little inclined at present to further expand my volunteering in order to work for these multi-billion dollar corporations for free. 
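The back-of-envelope figures above can be re-derived with a short script. This is only a sketch: the constants are the rough estimates quoted in this message (not audited figures), and the function names are hypothetical, introduced purely for illustration.

```python
# Rough estimates quoted in the message; none of these are audited figures.
DAILY_REVENUE_USD = 200_000_000   # Google's revenue per day, as estimated above
DAILY_PROFIT_USD = 65_000_000     # roughly a third of daily revenue
DONATION_USD = 345_000            # 325,000 euros, converted as in the message
TOTAL_EDITS = 2_500_000_000       # edits to Wikimedia projects to date

MINUTES_PER_DAY = 24 * 60


def minutes_to_earn(amount_usd, daily_profit_usd=DAILY_PROFIT_USD):
    """Minutes of profit needed to cover amount_usd (hypothetical helper)."""
    profit_per_minute = daily_profit_usd / MINUTES_PER_DAY
    return amount_usd / profit_per_minute


def payout_per_edit(daily_revenue_usd=DAILY_REVENUE_USD, total_edits=TOTAL_EDITS):
    """Dollars per edit if one day's revenue were shared across all edits."""
    return daily_revenue_usd / total_edits


def annual_share(edit_count):
    """Yearly amount an editor with edit_count edits would receive."""
    return edit_count * payout_per_edit()


if __name__ == "__main__":
    print(f"Minutes to earn the donation: {minutes_to_earn(DONATION_USD):.1f}")
    print(f"Payout per edit: ${payout_per_edit():.2f}")
    print(f"1.5 million edits: ${annual_share(1_500_000):,.0f} a year")
    print(f"50,000 edits: ${annual_share(50_000):,.0f} a year")
```

Changing any of the constants shows how sensitive the per-edit figure is to the assumed share of revenue and to the total edit count.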
[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-bus…
[3] http://www.salon.com/2013/05/17/revenge_ego_and_the_corruption_of_wikipedia/
[4] https://www.washingtonpost.com/news/the-intersect/wp/2015/04/15/the-great-w…
[5] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-contro…
[6] https://investor.google.com/earnings/2015/Q2_google_earnings.html
[7] https://tools.wmflabs.org/wmcounter/
[8] https://www.wikimedia.de/wiki/Pressemitteilungen/PM_3_12_Wikidata_EN
[9] https://www.washingtonpost.com/news/the-intersect/wp/2015/07/22/you-dont-kn…

Gerard Meijssen

4:16 p.m.

Hoi, <grin> I do work on quality issues. I blog about them. I work towards implementing solutions. </grin> I have fixed quite a few errors in Wikidata and I do not rack up as many edits as I could because of it. In the mean time with your "I do not want to be involved attitude" you are the proverbial sailor who stays on shore. It is your option to get your hands dirty or not. However, a friend of mine mentioned this attitude and compared it to the people who said that Wikipedia would never work. That is fine so I will just move on away from many of your arguments. I do not care about profit. I have over 2 million edits on Wikidata alone and I have a few others on other projects as well. They may, as is implicit in the license, make a profit. The point is that as more data is freed, it will free more data. With more free data we can inform more people. We can share more of the sum of all available knowledge. I wonder, there are many ways in which quality can be improved and all you do is refer to others. Why should I bother with your arguments when they are not yours and when you do not show how to make a difference? My arguments are plausible and I actively work towards getting them implemented. I do not need to convince people to do my work. The only thing I want to do is ask people for their support so that we get sooner to the stage where we will share in the sum of all available knowledge, something we do not really do at this stage. Thanks, GerardM On 1 December 2015 at 15:30, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


Andreas Kolbe

2 Dec 2 Dec

12:39 a.m.

On Tue, Dec 1, 2015 at 4:16 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

In the mean time with your "I do not want to be involved attitude" you are the proverbial sailor who stays on shore.

Well, me and 99.9999 percent of the global population. Not everyone has to contribute to Wikidata. :)

...

My arguments are plausible and I actively work towards getting them implemented. I do not need to convince people to do my work. The only thing I want to do is ask people for their support so that we get sooner to the stage where we will share in the sum of all available knowledge, something we do not really do at this stage.

Thanks for the spirited debate, and good luck to you, Gerard. May your efforts be fruitful. Andreas

Andreas Kolbe

28 Nov 28 Nov

6:23 p.m.

Gerard,

On Fri, Nov 27, 2015, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

When you compare the quality of Wikipedias with what en.wp used to be you are comparing apples and oranges. The Myanmar Wikipedia is better informed on Myanmar than en.wp etc.

Is it? The entire Burmese Wikipedia contains a mere 31,646 content pages at the time of writing, covering (or trying to cover) all countries of the world, and all aspects of human knowledge.[1] The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713 pages within its purview.[2] I dare say that's more articles on Myanmar than the Burmese Wikipedia contains. As an indication, the English Wikipedia's article on Myanmar is more than twice as long as the one in the Burmese Wikipedia. Moreover, according to Freedom House[3], the internet in Myanmar is not free: "The government detained and charged internet users for online activities [...] Government officials pressured social media users not to distribute or share content that offends the military, or disturbs the functions of government."

...

When you qualify a Wikipedia as fascist, it does not follow that the data is suspect. Certainly when data in a source that you so easily dismiss is typically the same, there is not much meaning in what you say from a Wikidata point of view.

Data are always generated within a social context, and data generated by political extremists or people living under oppressive regimes are suspect whenever they have political implications. (Looking at the descriptions of Burmese politics, my feeling is the Burmese Wikipedia is not under significant government control, but largely written by ex-pats. However, the situation is quite different in some other Wikipedias serving countries labouring under similar regimes.)

...

PS What does your librarian think when she knows

It was a he, but I'll leave him to join in himself if he chooses to. I happen to work on Dukes of Friuli. Compare the data from Wikidata and the

...

information by Reasonator based on the same item for one of them. https://tools.wmflabs.org/reasonator/?&q=2471519 https://www.wikidata.org/wiki/Q2471519

Let's look at this example. Reasonator says of Grasulf II of Friuli, "He died in 653". There is no source. Wikidata says he died in 653, and the indicated source is the Italian Wikipedia.

However, when you look at the (very brief) Italian Wikipedia article[4], you will find that the year 653 is given with a question mark. The English Wikipedia, in contrast, states, in its similarly brief article[5],

"Nothing more is known about Grasulf and the date of his death is uncertain."

Do you now see the problem about nuance? Reasonator and Wikidata confidently proclaim as uncontested fact something that in fact is rather uncertain.

The sole source cited by both the English and the Italian Wikipedia is the Historia Langobardorum, available in Wikisource.[6] My Latin is a bit rusty, but while the Historia mentions that Ago succeeded Grasulf upon the latter's death, it says nothing specific about when that was. The Historia's time indications are in general very vague, usually limited to the phrase "Circa haec tempora", meaning "about this time". So it is in this case.

For reference, the Google Knowledge Graph states equally confidently that Grasulf II of Friuli died in 651 AD. This may be based on the English Wikipedia's unsourced claim (in the template at the bottom of the English Wikipedia article) that his reign ended c. 651, or on some other source like Freebase.

The other Wikipedias that have articles on Grasulf II provide the following death dates:

Catalan: 651
Galician: 653
Lithuanian: 653
Polish: 651
Romanian: Unknown
Russian: 653
Ukrainian: 651

As for published sources, I can offer Ersch's Allgemeine Encyclopädie (1849), which states on page 209 that Grasulf II died in 651.[7] The extreme vagueness of the available dates is pointed out by Thomas Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the end of Grasulf's reign at 645, "as a mere random guess", and adds that "De Rubeis, following Sigonius", puts the accession of Ago in 661.[8]

There may well be better and more recent sources beyond my reach, but having these published dates in Wikidata, with the source references, would actually make some sense. Unsourced data, not so much.

Answers are comfortable, but they are not knowledge when they are unverifiable and/or wrong.

[1] https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
[2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessm…
[3] https://freedomhouse.org/report/freedom-net/2015/myanmar
[4] https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid…
[5] https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=6…
[6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
[7] https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasul…
[8] https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%…

Ed Erhart

7:17 p.m.

On the very specific point of knowledge and how it's not always possible to boil it down to a single quantifiable value, I couldn't agree more. Thank you, Andreas, for the detailed anecdote displaying that problem, and I'll be happy to provide more if needed.

Does Wikidata have a way of marking data entries as estimates, or at least dates as circa (not just unknown)?

--Ed

Rob

8:05 p.m.

That male librarian here. I think we need to encourage people to add more and conflicting data to Wikidata, and to cite their sources when they do so. Currently it's not particularly easy to cite your sources on Wikidata. So the end result is that it encourages people to view whatever single uncited bit of data appears there as the one true fact.

geni

29 Nov

12:30 p.m.

On 28 November 2015 at 19:17, Ed Erhart <the.ed17(a)gmail.com> wrote:

...

Yes https://www.wikidata.org/wiki/Property:P1317 however a quick comparison between the English Wikipedia and Wikidata suggests it isn't used very much.

Of course there are a bunch of other issues. It gives dates for Egyptian Pharaohs without saying what chronology it is using. It keeps claiming dates are Gregorian without showing any conversion has actually taken place (Wikipedians tend to be pretty poor when it comes to such conversions since they require a fair bit of background knowledge. For example, depending on the year and writer, the year in England can start on the 1st of January, 25th March or the first day of Advent).

Wikidata doesn't do very well on carbon dating either. If we look at Ötzi https://www.wikidata.org/wiki/Q171291 we again get dates with no indication of the calibration used. Really this would be better handled using the uncalibrated C14 numbers (4550 ± 27 BP http://digitalcommons.library.arizona.edu/objectviewer?o=http%3A%2F%2Fradio… ) and then adding enough information for the correct calibration curve to be selected (Northern hemisphere land based, which at the moment probably means INTCAL13).

-- geni
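To illustrate why a date cannot simply be relabelled as Gregorian, here is a minimal Python sketch of a Julian-to-Gregorian conversion that goes through the Julian Day Number. The function names are illustrative only and are not part of any Wikidata tooling. Years use astronomical numbering, and the year-start helper assumes the 25 March convention mentioned above; the other year starts would need their own rules.

```python
def julian_to_jdn(year, month, day):
    """Julian Day Number of a date given in the (proleptic) Julian calendar."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """(year, month, day) in the proleptic Gregorian calendar for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # completed 400-year cycles
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461         # completed years within the cycle
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to the equivalent Gregorian date."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


def lady_day_year_to_january_year(written_year, month, day):
    """Year number under 1 January reckoning for a date whose written
    year started on 25 March (assumed convention, see lead-in)."""
    if (month, day) < (3, 25):
        return written_year + 1
    return written_year


if __name__ == "__main__":
    # The last Julian day before the 1582 reform and its Gregorian equivalent.
    print(julian_to_gregorian(1582, 10, 4))
    # A date written as 10 February 1700 under a 25 March year start.
    print(lady_day_year_to_january_year(1700, 2, 10))
```

The two steps are kept separate on purpose: the year-start correction depends on the writer and place, while the calendar conversion is pure arithmetic, so a stored date would need to record which of the two has already been applied.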

Gerard Meijssen

12:37 a.m.

Hoi,
It was from the Myanmar Wikipedia that a lot of data was imported to Wikidata. Data that did not exist elsewhere. I do not care really what "Freedom House" says. I do not know them; I do know that the data is relevant and useful. It was even the subject of a blogpost.

You may ignore data that is not from a source that you like. This indiscriminate POV is not a NPOV.

As to Grasulf, you failed to get the point. It was NOT about the data itself but about the presentation. I worked on this item because a duplicate was created with even less data.

While I happily agree that Sources are good, I will not ask people to start adding Sources at this point in time; it will not improve quality significantly. It makes more sense once we are at a stage where multiple sources disagree on values for statements. Adding sources is significantly more meaningful and useful once we start curating data. Statistically, most errors will be found where sources disagree.

When people add conflicting data, it is indeed really relevant to add Sources. My practice for adding data is that I will only add data that fulfils some minimal criteria. Typically I am not interested in adding data that already exists. I will remove less precise data in favour of more precise data.

The biggest issue with data is that we do not have enough of it, and the second most relevant issue is that we need processes to compare sources with Wikidata and have a workflow to curate differences.

Thanks,
GerardM

Gnangarra

1:05 a.m.

...

> While I happily agree that Sources are good, I will not ask people to start adding Sources at this point of time it will not improve quality significantly. It makes more sense once we are at a stage where multiple sources disagree on values for statements. Adding sources is significantly more meaningful and useful once we start curating data.


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Gerard Meijssen

9:42 a.m.

Hoi,
Wikidata is a wiki, and you seem to always forget that.

The corruption of data .. how? Each statement is its own data item; how do you corrupt that? As I say so often, when you get a collection that is 80% correct you have an error rate of 20%. When you do not include that data you have an error rate of 100%. When you have another source that is 90% correct that has similar data and you have an overlap of 50%, you can be smart and at the start or later compare the data and curate. When you only import at the start what is the same, you probably get something like 84% correct data imported. You can gamify the rest, but however you slice it, what you do not have and could have is 100% wrong.

Wikidata is NOT Wikipedia. It is much easier to curate data and consequently your argument is FUD. The big thing we have not learned is cooperation. We do not cooperate. We do not, as standard, have RSS feeds for the changes to the items that belong to a specific source. We are happy to get data but we do not reach out and give back. For me the fact that VIAF uses Wikidata as a link is an opportunity to do better. The German DNB cooperation is the kind of project that we should emulate.

When you talk about quality, you talk in an insular fashion. We have to do it, our community. At Wikidata our community can include other organisations with rich collections of data with high quality. We can share, compare, curate. Even with our current low quality, we have subsets of data that shine. Subsets of data that are of at least the same quality as Wikipedia. However, this quality is often marred by a lack of quantity, quantity we can have when collaboration is what we do.

You are afraid of our reputation. Reputation has many aspects. Jane023 presented at the Dutch Wikimedia conference. She uses a tool that is easier on her: because it is a Wikidata-based list, no Wikipedians bother her. A similar list is now used for its quality on the Welsh Wikipedia. The data is of such quality that Google actually uses it, as she reported.

When I see the religious application of Wikipedia sentiments, I find that we do not even care for the life of one of our own. Bassel has been executed or is likely to be executed soon, and some think our neutrality is so important. FOR WHAT? So that we may not even protect our own? Is it right to protest against TTIP (and we should) and not protest for a Wikipedian that embodies our values?

Wikipedia think is not applicable at this stage for Wikidata. Its quality is arguably piss poor but better in places. Many items are corrupt because they follow the structure of Wikipedia articles. A structure Wikipedians insist on because they wrote that article and "Wikidata is only a service project".

I do agree that we need more quality. My approach has set theory on its side, it embodies the wiki approach, and yours is one where Pallas Athena is to rise from the brain of Zeus in full armour. You may have noticed that my arguments are easy to follow and conform to something that is measurable. Yours is private; there is no possibility to verify the accuracy of your argument. I call bullshit on your argument, not because you do not make a fine argument but because it is an argument that prevents us from improving Wikidata.

My hope is that we can work constructively on our quality and have a measurable effect.

Thanks,
GerardM

On 29 November 2015 at 02:05, Gnangarra <gnangarra(a)gmail.com> wrote:

> > While I happily agree that Sources are good, I will not ask people to start adding Sources at this point of time it will not improve quality significantly. It makes more sense once we are at a stage where multiple sources disagree on values for statements. Adding sources is significantly more meaningful and useful once we start curating data.
>
> The problem will be that by the time Wikidata starts to curate data it will have corrupted that data with its own data; secondly, past experience with wikis is that fixing data after it has been entered is actually harder and more time-consuming, along with the fact that the damage to reputation will have a lasting impact, and fixing that consumes millions of dollars in donor money. As said earlier, there are lessons in the development of Wikipedia that should be heeded in an attempt to avoid those same pitfalls.
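The comparison of two imperfect sources discussed in this exchange can be made concrete with a small Python sketch. It rests on assumptions that are not stated in the messages: the two sources err independently, and `p_same_wrong` (a hypothetical parameter) is the chance that two wrong values happen to coincide. The 84% figure in the message is not derived there, so the sketch does not try to reproduce it; it only shows how agreement and disagreement between sources relate to errors under these assumptions.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    p_agree: float                    # chance the two sources give the same value
    p_correct_given_agree: float      # chance an agreed value is correct
    share_of_a_errors_flagged: float  # share of A's errors that show up as disagreements


def compare_sources(acc_a, acc_b, p_same_wrong=1.0):
    """Compare two sources with accuracies acc_a and acc_b.

    Assumes independent errors and 0 < acc_a < 1. p_same_wrong is the
    assumed probability that two wrong values are identical (1.0 is the
    worst case, where wrong values always coincide).
    """
    both_right = acc_a * acc_b
    both_wrong = (1 - acc_a) * (1 - acc_b)
    agree_wrong = both_wrong * p_same_wrong
    p_agree = both_right + agree_wrong
    # A is wrong and the sources differ: B is right, or B is wrong in a different way.
    a_wrong_disagree = (1 - acc_a) * acc_b + both_wrong * (1 - p_same_wrong)
    return Comparison(
        p_agree=p_agree,
        p_correct_given_agree=both_right / p_agree,
        share_of_a_errors_flagged=a_wrong_disagree / (1 - acc_a),
    )


if __name__ == "__main__":
    result = compare_sources(0.80, 0.90)
    print(f"sources agree on {result.p_agree:.0%} of overlapping items")
    print(f"agreed values are correct {result.p_correct_given_agree:.1%} of the time")
    print(f"{result.share_of_a_errors_flagged:.0%} of source A's errors appear as disagreements")
```

Under these assumptions most errors of the weaker source surface where the two sources disagree, which is the set a curation workflow would look at first. If errors are correlated, for example because both sources copied from the same place, the numbers would look worse.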


Jane Darnell

9:55 a.m.

Gerard,
Thanks for highlighting my work! I already posted slides on Commons, but I want to flesh them out with links to actual edits so people can better understand some of these quality improvement workflows. The tools I use for lists are written mostly by the Wikidata "god" Magnus Manske and the tools I use on Commons are self-built kludges with the assistance of Commonist Vera de Kok.

Here is an example of a quality improvement I did this morning for a file on Commons that was originally uploaded by an English Wikipedian who uploaded it with the default uploader for use in an English Wikipedia list. The improvements are coming from both the original edits of the uploader on Wikipedia as well as the associated Wikidata list:
https://commons.wikimedia.org/w/index.php?title=File:Rembrandt_Man_with_a_F…

Jane
> > However, > > > > the situation is quite different in some other Wikipedias serving > > > countries > > > > labouring under similar regimes.) > > > > > > > > > > > > > > > > > PS What does your librarian think when she knows > > > > > > > > > > > > > > > > It was a he, but I'll leave him to join in himself if he chooses to. > > > > > > > > > > > > I happen to work on Dukes of Friuli. Compare the data from Wikidata > and > > > the > > > > > information by Reasonator based on the same item for one of them. > > > > > > > > > > https://tools.wmflabs.org/reasonator/?&q=2471519 > > > > > https://www.wikidata.org/wiki/Q2471519 > > > > > > > > > > > > > > > > > Let's look at this example. Reasonator says of Grasulf II of Friulim, > > "He > > > > died in 653". There is no source. Wikidata says he died in 653, and > the > > > > indicated source is the Italian Wikipedia. > > > > > > > > However, when you look at the (very brief) Italian Wikipedia > > article[4], > > > > you will find that the year 653 is given with a question mark. The > > > English > > > > Wikipedia, in contrast, states, in its similarly brief article[5], > > > > > > > > "Nothing more is known about Grasulf and the date of his death is > > > > uncertain." > > > > > > > > Do you now see the problem about nuance? Reasonator and Wikidata > > > > confidently proclaim as uncontested fact something that in fact is > > rather > > > > uncertain. > > > > > > > > The sole source cited by both the English and the Italian Wikipedia > is > > > the > > > > Historia Langobardorum, available in Wikisource.[6] My Latin is a bit > > > > rusty, but while the Historia mentions that Ago succeeded Grasulf > upon > > > the > > > > latter's death, it says nothing specific about when that was. The > > > > Historia's time indications are in general very vague, usually > limited > > to > > > > the phrase "Circa haec tempora", meaning "about this time". So it is > in > > > > this case. 
> > > > > > > > For reference, the Google Knowledge Graph states equally confidently > > that > > > > Grasulf II of Friuli died in 651AD. This may be based on the English > > > > Wikipedia's unsourced claim (in the template at the bottom of the > > English > > > > Wikipedia article) that his reign ended c. 651, or on some other > source > > > > like Freebase. > > > > > > > > The other Wikipedias that have articles on Grasulf II provide the > > > following > > > > death dates > > > > > > > > Catalan: 651 > > > > Galician: 653 > > > > Lithuanian: 653 > > > > Polish: 651 > > > > Romanian: Unknown > > > > Russian: 653 > > > > Ukrainian: 651 > > > > > > > > As for published sources, I can offer Ersch's Allgemeine Encyclopädie > > > > (1849), which states on page 209 that Grasulf II died in 651.[7] > > > > > > > > The extreme vagueness of the available dates is pointed out by Thomas > > > > Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts > the > > > end > > > > of Grasulf's reign at 645, "as a mere random guess", and adds that > "De > > > > Rubeis, following Sigonius", puts the accession of Ago in 661.[8] > > > > > > > > There may well be better and more recent sources beyond my reach, but > > > > having these published dates in Wikidata, with the source references, > > > would > > > > actually make some sense. Unsourced data, not so much. > > > > > > > > Answers are comfortable, but they are not knowledge when they are > > > > unverifiable and/or wrong. 
> > > > > > > > > > > > [1] > > > https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles > > > > [2] > > > > > > > > > > > > > > https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessm… > > > > > > > > [3] https://freedomhouse.org/report/freedom-net/2015/myanmar > > > > [4] > > > > > > > > > > > > > > https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid… > > > > [5] > > > > > > > > > > > > > > https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=6… > > > > [6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV > > > > [7] > > > > > > > > > > > > > > https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasul… > > > > [8] > > > > > > > > > > > > > > https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%… > > > > _______________________________________________ > > > > Wikimedia-l mailing list, guidelines at: > > > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines > > > > Wikimedia-l(a)lists.wikimedia.org > > > > Unsubscribe: > https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > > > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe> > > > > > > > _______________________________________________ > > > Wikimedia-l mailing list, guidelines at: > > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines > > > Wikimedia-l(a)lists.wikimedia.org > > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe> > > > > > > > > > > > -- > > GN. 
> > President Wikimedia Australia > > WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra > > Photo Gallery: http://gnangarra.redbubble.com > > _______________________________________________ > > Wikimedia-l mailing list, guidelines at: > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines > > Wikimedia-l(a)lists.wikimedia.org > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe> > > > _______________________________________________ > Wikimedia-l mailing list, guidelines at: > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines > Wikimedia-l(a)lists.wikimedia.org > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Lilburne

10:33 a.m.

On 29/11/2015 09:42, Gerard Meijssen wrote:

...

Hoi, Wikidata is a wiki and, you seem to always forget that. The corruption of data .. how? Each statement is its own data item how do you corrupt that? As I say so often, when you get a collection that is 80% correct you have an error rate of 20%.

Surely this isn't some exam paper where you get an 80% passing mark. What you have is a basket of eggs ... 20% of which are poisonous.

Gerard Meijssen

10:38 a.m.

Hoi, More FUD. Poisonous how? Thanks, GerardM On 29 November 2015 at 11:33, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:

...


Lilburne

6:11 p.m.

Simply, if I have a litre of sewage and add to it 100ml of pure water, I still have sewage. Conversely, if I have a litre of pure water and pour 100ml of sewage into it, then what do I have? What if 2 out of 10 bank statements are erroneous; is that OK because 8 are accurate? What if 2 out of every 10 gas stations delivered gasoline from the diesel pump? On 29/11/2015 10:38, Gerard Meijssen wrote:

...


Gerard Meijssen

6:31 p.m.

Hoi, In the Netherlands water is an essential ingredient to our country. It is a friend and it is an enemy. Where I live we are 3 meters below sea level. The Rhine streams down after including all the effluent from Germany and Switzerland. We do swim in the Rhine; it is clean enough for the WWF to free sturgeons so that we may have them come and breed in the future. We use its water and process it to drinking water. Yes, shit happens and we can deal with that. Your attempt at rhetorical questions is suspect because you do not see that quality / purity in water is, as with Wikidata, a process. It is not in doing it once. It is done by doing it again and again. That is why the water is clean, and iteratively we will process what we ingest and make it the quality that we will not get in any other way. Thanks, GerardM On 29 November 2015 at 19:11, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:


Andreas Kolbe

1 Dec

11:27 a.m.

Article by Mark Graham in Slate, Nov. 30, 2015:

Why Does Google Say Jerusalem Is the Capital of Israel? It has to do with the fact that the Web is now optimized for machines, not people.

http://www.slate.com/articles/technology/future_tense/2015/11/why_does_goog…

Excerpt:

[...] because of the ease of separating content from containers, the provenance of data is often obscured. Contexts are stripped away, and sources vanish into Google’s black box. For instance, most of the information in Google’s infoboxes on cities doesn’t tell us where the data is sourced from.

Second, because of the stripping away of context, it can be challenging to represent important nuance. In the case of Jerusalem, the issue is less that particular viewpoints about the city’s status as a capital are true or false, but rather that there can be multiple truths, all of which are hard to fold into a single database entry.

Finally, it’s difficult for users to challenge or contest representations that they deem to be unfair. Wikidata is, and Freebase used to be, built on user-generated content, but those users tend to be a highly specialized group - it’s not easy for lay users to participate in those platforms. And those platforms often aren’t the place in which their data is ultimately displayed, making it hard for some users to find them. Furthermore, because Google’s Knowledge Base is so opaque about where it pulls its information from, it is often unclear if those sites are even the origins of data in the first place.

Jerusalem is just one example among many in which knowledge bases are increasingly distancing (and in some case cutting off) debate about contested knowledges of places.

[followed by more examples]

My point is not that any of these positions are right or wrong. It is instead that the move to linked data and the semantic Web means that many decisions about how places are represented are increasingly being made by people and processes far from, and invisible to, people living under the digital shadows of those very representations. Contestations are centralized and turned into single data points that make it difficult for local citizens to have a significant voice in the co-construction of their own cities.

[...] Linked data and the machine-readable Web have important implications for representation, voice, and ultimately power in cities, and we need to ensure that we aren't seduced into codifying, categorizing, and structuring in cases when ambiguity, not certainty, reigns.

Gerard Meijssen

12:15 p.m.

Hoi, This thread is called "quality". There are ways to include multiple truisms. Wikidata is the data project of the Wikimedia Foundation, it is a wiki, so when you have issues, deal with it. I prefer to quote what John Ruskin had to say: "Quality is never an accident. It is always the result of intelligent effort". I am more concerned with the fact that the Linguapax Prize does not have all of its winners. I am more concerned that half of the items of Wikidata have fewer than three statements. These are issues that deal with the quality of Wikidata. As Magnus has started to produce reports on issues between Mix'n Match and Wikidata, he invites people to improve our quality. It is one way in which the quality of our current data improves measurably. When I blog about the Nansen Refugee award I report on the type of issues I find in Wikipedia. It is easy to find fault. The point however is not that Wikipedia is bad nor that Wikidata is good. The point is that in order to achieve quality there is a lot of work to do. Thanks, GerardM On 1 December 2015 at 12:27, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Yaroslav M. Blanter

7 Dec

8:29 p.m.

On 2015-12-01 12:27, Andreas Kolbe wrote:

...


The story with Jerusalem is very simple. I created the Wikidata item. The English description was "city in Israel". Then POV pushers came. Some of them wanted "city in Palestine", and others wanted "capital of Israel". Then one user, who later was elected to the board of Wikimedia Israel, canvassed a number of users in Hebrew Wikipedia. When there were too many POV pushers, I just unwatched the page, and it became "capital of Israel". Later on, someone managed to change it to something neutral. That's it. There is nothing automatic here. Cheers Yaroslav

Andreas Kolbe

8:53 p.m.

Hi Yaroslav, Thanks for the background. The "POV pushing" you describe is of course what Graham and Ford are examining in their paper. For what it's worth, the Wikidata item for Jerusalem[1] still contains the statement "capital of Israel" today. As I understand it, the Knowledge Graph uses a number of sources to "guess" whether something is factual or not. Whether Wikidata is one of them, and what weight it has in this process, is something I suspect no one outside Google knows. The op-ed I mentioned writing last week is now out as part of the current Signpost issue.[2] Andreas [1] https://www.wikidata.org/wiki/Q1218 [2] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed On Mon, Dec 7, 2015 at 8:29 PM, Yaroslav M. Blanter <putevod(a)mccme.ru> wrote:

...

Andrea Zanni

8:58 p.m.

On Mon, Dec 7, 2015 at 9:53 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Craig Franklin

11:52 p.m.

Such issues are always going to crop up when you're attempting to describe the world using Aristotelian propositions. In a source like Wikipedia, we can provide some nuance, explain both sides of the issue, the history of both claims, and let the reader decide. In a database, we are limited to saying that Jerusalem either is or is not the capital of Israel. To be fair, this is not a weakness that is implementation-specific to Wikidata; it is always going to happen when you try to describe the world in this way. It's not something that can be fixed by adding sources, or by bolting fancy new technical gadgets onto the side of the database. Cheers, Craig On 8 December 2015 at 06:58, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

On Mon, Dec 7, 2015 at 9:53 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Hi Yaroslav, Thanks for the background. The "POV pushing" you describe is of course what Graham and Ford are examining in their paper. For what it's worth, the Wikidata item for Jerusalem[1] still contains the statement "capital of Israel" today.

Really, I do not understand the difference between this kind of problem and Wikipedia's edit wars or conflicts. Wikidata represents knowledge in a structured, collaborative way: both features define it, and it seems the op-ed just doesn't like them (either one or both). Aubrey

Gnangarra

8 Dec

12:15 a.m.

Craig is right: this can't be fixed within the database, because the database is applying one truth where there is no one truth for everyone. This will always be the single biggest flaw of Wikidata: no matter how data is presented, it can never be the absolute truth unless it's measurable through some mathematical or scientific process that can be replicated by everyone and translated into any language. Wikipedia's answer is to present all considerations in an equal manner and not interpret the facts. Wikidata defines what is fact, what is truth, what is right; that's a big task, and something the community has never tackled before. Should we even try? Has the damage already been done, or should we narrow the range of recorded data? Could we flag alternatives? Could we give a measure of acceptance for each fact? Are there alternative means? Quality itself has many different measures and many different ways of being measured, all of which are the truth for the question being asked. Are we even asking the questions we need to in the way we need to? On 8 December 2015 at 07:52, Craig Franklin <cfranklin(a)halonetwork.net> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Lydia Pintscher

10 Dec

8:27 a.m.

On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra <gnangarra(a)gmail.com> wrote:

...

That is actually not correct. We have built Wikidata from the very beginning with some core beliefs. One of them is that Wikidata isn't supposed to have the one truth but instead is able to represent various different points of view and link to sources claiming these. Look for example at the country statements for Jerusalem: https://www.wikidata.org/wiki/Q1218 Now I am the first to say that this will not be able to capture the full complexity of the world around us. But that's not what it is meant to do. However please be aware that we have built more than just a dumb database with Wikidata and have gone to great lengths to make it possible to capture knowledge diversity. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
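
To make the point about several coexisting statements concrete, here is a minimal sketch, not an official client. It assumes an item has already been loaded as a Python dictionary shaped like Wikidata's JSON export (a `claims` mapping from property id to a list of statements, each with a `rank`, a `mainsnak` and optional `references`), and that P17 is the "country" property. The sample data is made up for illustration and is NOT the real content of Q1218; `Q_PLACEHOLDER` is a hypothetical id.

```python
from typing import Any, Dict, List


def item_values(entity: Dict[str, Any], prop: str) -> List[Dict[str, Any]]:
    """List every non-deprecated item value recorded for one property."""
    rows = []
    for statement in entity.get("claims", {}).get(prop, []):
        if statement.get("rank") == "deprecated":
            continue  # deprecated statements stay in the data but are skipped here
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "no value" / "unknown value" snaks carry no datavalue
        value = snak["datavalue"]["value"]
        if not isinstance(value, dict) or "id" not in value:
            continue  # this sketch only handles item-valued statements
        rows.append({
            "value": value["id"],
            "rank": statement.get("rank", "normal"),
            "reference_count": len(statement.get("references", [])),
        })
    return rows


# Made-up sample: two competing "country" values plus one deprecated value.
sample_item = {
    "id": "Q1218",
    "claims": {
        "P17": [
            {
                "rank": "normal",
                "mainsnak": {"snaktype": "value",
                             "datavalue": {"value": {"id": "Q801"}}},
                "references": [{"snaks": {}}],
            },
            {
                "rank": "normal",
                "mainsnak": {"snaktype": "value",
                             "datavalue": {"value": {"id": "Q_PLACEHOLDER"}}},
                "references": [],
            },
            {
                "rank": "deprecated",
                "mainsnak": {"snaktype": "value",
                             "datavalue": {"value": {"id": "Q_OLD"}}},
            },
        ]
    },
}

for row in item_values(sample_item, "P17"):
    print(row)
```

A consumer that wants "the" answer still has to choose among the returned rows; the sketch only shows that the data model does not force a single value, and that the reference count is available to anyone who wants to weigh the claims.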

Jane Darnell

8:44 a.m.

Amen to that! This discussion about Jerusalem reminds me of the discussion we had about the nationality of Anne Frank. For those interested, there have been some heated debates about whether Mobile should use the text in Wikidata "label descriptions" or rather some basic presentation of the P31 property. Most descriptions are still blank anyway. Personally I think texts such as "capital of Israel" or "holocaust victim" are both better than blank, but many disagree with me. Both of these represent associated items that have a lot of eyes on them, but what about our more obscure items? Lots of these may be improved by the people who originally created a Wikipedia page for them. As a Wikipedia editor who has created over 2000 Wikipedia pages, I feel somewhat dismayed at the idea that I need to walk through this long list and add statements to their Wikidata items as the responsible party who introduced them to the Wikiverse in the first place. But if I had a gadget that would tell me which of my created Wikipedia articles had 0-3 statements, I would probably update those. On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

...

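
The gadget wished for above (find the items with 0-3 statements) could be sketched roughly as follows. This is only an illustration under assumptions: the input is assumed to be a mapping from item id to an entity dictionary with a `claims` field, which is how I understand the `entities` part of a `wbgetentities` API response to look. The item and property ids in the sample are hypothetical placeholders (apart from P31, which is mentioned in the thread), and looking up which items belong to "my" articles is left out.

```python
from typing import Any, Dict, List


def statement_count(entity: Dict[str, Any]) -> int:
    """Count all statements on an item, across every property."""
    return sum(len(statements) for statements in entity.get("claims", {}).values())


def sparse_items(entities: Dict[str, Dict[str, Any]], max_statements: int = 3) -> List[str]:
    """Return the ids of items that have at most `max_statements` statements."""
    return sorted(
        qid for qid, entity in entities.items()
        if statement_count(entity) <= max_statements
    )


# Hypothetical sample: placeholder ids, statements reduced to empty dicts.
sample_entities = {
    "Q_EXAMPLE_1": {"claims": {}},
    "Q_EXAMPLE_2": {"claims": {"P31": [{}], "P_A": [{}]}},
    "Q_EXAMPLE_3": {"claims": {"P31": [{}], "P_A": [{}], "P_B": [{}], "P_C": [{}, {}]}},
}

print(sparse_items(sample_entities))
```

The threshold of three follows the "0-3 statements" wording above; it is a parameter so that a different cut-off can be tried.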

Gnangarra

9:14 a.m.

I agree getting bogged down on one item of data isn't helpful, but the data does need to show it's disputed, and the data item on Israel <https://www.wikidata.org/wiki/Q801> should at least have Tel Aviv listed as its metonym.

> this cant be fixed within the database because the data base is applying one truth where there is no one truth for everyone. This will always be the single biggest flaw of Wikidata no matter how data is presented it can never be the absolute truth

The Jerusalem/Israel example, where the data doesn't indicate it's disputed, means that it will be propagated as an absolute truth... Then again, this is shifting away from the original concern over quality: the ability to verify the information isn't clear, and, combined with the CC0 license and the already established practice on other sources, with Wikidata being easily manipulated for falsehoods, it's going to have an impact. On 10 December 2015 at 16:44, Jane Darnell <jane023(a)gmail.com> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Jane Darnell

9:28 a.m.

Just as this discussion shifts, so does Wikidata quality. Both, hopefully, in a more constructive direction, which was Lydia's original point. On Thu, Dec 10, 2015 at 10:14 AM, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Gerard Meijssen

9:31 a.m.

Hoi, The other side of being easily manipulated is that it is easy to rectify. The Signpost is FUD in so many ways and incorrect as well. Yes, you may have a concern about falsehoods. However, this is not going to be helped much by insisting that everything is to be sourced. It is also not the only way to consider quality and arguably it is the least helpful way of improving the quality at Wikidata. Typically what has been established on other sources is acceptable as valid for now. When we compare and find differences, it is of relevance to find sources and even document the differences. When it is a falsehood we should flag it as such. Sources can be wrong or considered to be wrong. The case for the CC-0 license is so in line with what the WMF stands for. Our aim is to share in the sum of all knowledge and it is the most obvious way to do it. When Wikidata is found to document falsehoods or established truths that are problematic, we gain a quality where people come to Wikidata to learn what they need to learn. So where some see a problem, there is opportunity. Thanks, GerardM On 10 December 2015 at 10:14, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Gerard Meijssen

10:27 a.m.

Hoi, The other side of the coin of being easily manipulated is that it is easy to rectify. The Signpost is FUD in so many ways and incorrect as well. Yes, you may have a concern about falsehoods. However, this is not going to be helped much by insisting that everything is to be sourced. It is also not the only way to consider quality and arguably it is the least helpful way of improving the quality at Wikidata. Typically what has been established in other sources is acceptable as valid for now. When we compare and find differences, it is of relevance to find sources and even document the differences. When something is a falsehood, we should flag it as such. Sources can be wrong or considered to be wrong. The point however is that by concentrating on differences first we make the most effective use of people who like these kinds of puzzles. The case for the CC-0 license is so in line with what the WMF stands for. Our aim is to share in the sum of all knowledge and it is the most obvious way to do it. When Wikidata is found to document falsehoods or established truths that are problematic, we gain a quality where people come to Wikidata to learn what they need to learn. When you say it has an impact, OK. Let it have an impact, but let's consider arguments, and that is exactly what the author of this article did not do. It is the one reason why what he wrote is FUD. So do consider quality and recognise that we have made enormous strides forward. When this recognition sinks in, when people understand how quality actually works, the kind of quality that makes a difference improving Wikidata, we can easily go on doing what we do. We may be bold and should be bold, we may make mistakes and we do learn as we go along. Thanks, GerardM On 10 December 2015 at 10:14, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Andreas Kolbe

12:17 p.m.

On Thu, Dec 10, 2015 at 10:27 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com

...

wrote:

...

The case for the CC-0 license is so in line with what the WMF stands for. Our aim is to share in the sum of all knowledge and it is the most obvious way to do it. When Wikidata is found to document falsehoods or established truths that are problematic, we gain a quality where people come to Wikidata to learn what they need to learn.

Gerard Meijssen

2:40 p.m.

Hoi, What other people say is their choice. The law is simple. Facts cannot be copyrighted and consequently the preference / the opinion of Denny is simply that. Typically statistics organisations are more than happy to share their data. They do so in the Netherlands and it is only for a lack of organisation on our end that it has not happened yet. When I copy data from Wikipedia, it is unstructured in every sense. As a follow up I often spend time to improve upon it further. I do not care for your opinion. So far I have only seen your FUD. You present preferences of people like Denny as a ground for compliance; it is not, and there is not much positive in what I have seen from you so far. What is your contribution, what is it that you hope to achieve? You point to organisations like statistics organisations as if they are the ones not interested in collaboration. They are ever so happy to collaborate and we are happy to acknowledge them as the source of information when they do. By seeking collaboration, by seeking to bring data together and achieve more, we are able to make a difference. This is not done by publicly claiming like you do that you are not involved and do not want to know. It is done by being involved, knowing what quality means and how we can achieve it, and walking the walk and talking the talk. Thanks, GerardM On 10 December 2015 at 13:17, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Thu, Dec 10, 2015 at 10:27 AM, Gerard Meijssen < gerard.meijssen(a)gmail.com

wrote:

The case for the CC-0 license is so in line with what the WMF stands for. Our aim is to share in the sum of all knowledge and it is the most

obvious

way to do it. When Wikidata is found to document falsehoods or

established

truths that are problematic, we gain a quality where people come to Wikidata to learn what they need to learn.

According to Denny, Wikidata, under its CC0 licence, must not import data from Share-Alike sources. He reconfirmed this yesterday when I asked him whether he still stood by that. In practice though we have Wikidata importing massive amounts of data from Wikipedia, which was a Share-Alike source last time I looked. Isn't Wikidata then infringing Wikipedia contributors' rights? Why is it okay to import data from the CC BY-SA Wikipedia, but not from European CC BY-SA population statistics? There are inchoate and uncomfortable parallels to licence laundering here, which I would hope is not something the WMF stands for. Could someone please explain? _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Denny Vrandečić

12 Dec 12 Dec

12:05 a.m.

On Thu, Dec 10, 2015 at 4:18 AM Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Andreas, what I said was that Wikidata must not import data from a data source licensed under Share-Alike. The important thing that differentiates what I said from what you think I said is "import data from a data source". Wikipedia is not a data source, but text. Extracting facts or data from a text is a very different thing from taking data from one place and putting it in another place. There was no database that contains the content of Wikipedia and that can be queried. Indeed, that is the whole reason why Wikidata was started in the first place. In fact, extracting facts or data from one text and then writing a Wikipedia article is what Wikipedians do all the time, and the license of the original text we read has no effect on the license of the output text. So, there is no such thing as an import of data from Wikipedia, because Wikipedia is not a database. I have repeatedly pointed you to http://simia.net/wiki/Free_data and you yourself have repeatedly pointed to https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights so I would assume that you would have by now read these and developed an understanding of these issues. I am not a lawyer, and my understanding of these issues is also lacking, but I wanted at least to point out that you are misquoting me. Please, would you mind correcting your misquoting of me in the places where you did so, or at least pointing to this email for further context?

Andreas Kolbe

6:01 a.m.

Denny, I quoted your statement verbatim and in full in the op-ed. Moreover, your statement had a context. Alexrk2 had said,[1] ---o0o--- Read the above.. at least under European Union law databases are protected by copyright. CC0 won't be compatible with other projects like OpenStreetMap *or Wikipedia*. This means a CC0-WikiData won't be allowed to *import content from Wikipedia*, OpenStreetMap or any other share-alike data source. The worst case IMO would be if WikiData *extracts content out of Wikipedia and release it as CC0*. Under EU law this would be illegal. As a contributor in DE Wikipedia I would feel like being expropriated somehow. This is not acceptable! --Alexrk2 (talk) 15:32, 16 June 2012 (UTC) ---o0o--- Note Alexrk2's three (3) specific references to Wikipedia. Alexrk2 referred to imports of content from Wikipedia, and how it would make her or him feel expropriated if WikiData extracted content out of Wikipedia and released it under CC0. You replied, ---o0o--- Alexrk2, it is true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. *Wikidata does not plan to extract content out of Wikipedia at all*. Wikidata will *provide *data that can be reused in the Wikipedias. And a CC0 source can be used by a Share-Alike project, be it either Wikipedia or OSM. But not the other way around. Do we agree on this understanding? --Denny Vrandečić (WMDE) (talk) 12:39, 4 July 2012 (UTC) ---o0o--- Alexrk2 specifically mentioned Wikipedia. So did you in your reply, assuring Alexrk2 that Wikidata did not in fact plan to extract content out of Wikipedia at all. Does this lend itself to the interpretation that you were talking only about databases, and not about Wikipedia? Alexrk2 then replied to you, ---o0o--- @Denny Vrandečić: I agree. But I thought, the aim (or *one* aim) of WikiData would be to *draw all the data out of Wikipedia (infoboxes and such things)*. 
---o0o--- You did not respond to that post, or participate further in that section. And these bot imports of Wikipedia infobox contents etc. have happened and are ongoing. They have been mentioned in many discussions. There are millions of statements in Wikidata that are cited to Wikipedia. Just a few days ago, Jheald said on Project Chat,[2] ---o0o--- But my own view is that we should very definitely be trying, as urgently as possible, to *capture as much as possible of the huge amount of data in infoboxes, templates, categorisations, etc on Wikipedia that is not yet in Wikidata* -- and that (at least in most subject areas) calls to restrict to only data from independent external sources are utterly utterly misguided, and typically bear no relation to either what is desirable, what is available, or what is still needed in order to utilise such sources effectively. Jheald (talk) 23:49, 8 December 2015 (UTC) ---o0o--- It's not plausible to my understanding to argue that Wikipedia's templates, infoboxes etc. are not "data sources" when contributors speak of capturing "the huge amount of data" contained in them. Much of the existing content of Wikidata consists of data extracted from Wikipedias. If you feel I have misquoted you anywhere on-wiki, please point me to the corresponding place (here or via my talk page in that project), and I will do whatever is necessary. [1] https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_d… [2] https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&diff=2… On Sat, Dec 12, 2015 at 12:05 AM, Denny Vrandečić <vrandecic(a)gmail.com> wrote:

...

On Thu, Dec 10, 2015 at 4:18 AM Andreas Kolbe <jayen466(a)gmail.com> wrote:

from

Wikipedia, which was a Share-Alike source last time I looked. Isn't Wikidata then infringing Wikipedia contributors' rights? Why is it okay to import data from the CC BY-SA Wikipedia, but not from European CC BY-SA population statistics?

Gerard Meijssen

8:22 a.m.

Andreas, Why is it that Denny is to answer on your terms, and why is it that you have not addressed any of the points I made on quality? Moreover, you deny his argument because YOU are not willing to acknowledge his point, thereby making him out to be a liar. You have not acknowledged that Wikidata is a wiki and you do not appreciate its implications. You are told that your notion of quality has the least operational value in Wikidata. You have been told repeatedly why and how considering these other definitions of quality contributes to improved quality and participation, and it is as if this is of total irrelevance. This all means nothing to you because you do not care, you are intentionally not involved. You are like a pharisee in the temple. I have heard it said several times now that your attitude is the same as the ones mocking Wikipedia when it was young. Given that you stand for Wikipedia Signpost, you degrade the appreciation of the English Wikipedia considerably because you seem to be arguing the antithesis of the wiki concept. Get a life. Thanks, GerardM On 12 December 2015 at 07:01, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Thu, Dec 10, 2015 at 4:18 AM Andreas Kolbe <jayen466(a)gmail.com>

wrote:


Lydia Pintscher

9:18 p.m.

On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:

...

Jane Darnell

13 Dec 13 Dec

9:15 a.m.

Thanks for that essay, Lydia! You said it well, and I especially agree with what you wrote about trust and believing in ourselves. I had to laugh at some of the comments, because if you substitute "Wikipedia" for "Wikidata" those comments could have been written 3 years ago before Wikidata came on the scene. On Sat, Dec 12, 2015 at 10:18 PM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

...

On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:

I've taken the time and written a longer piece about data quality and knowledge diversity on Wikidata for the current edition of the Signpost: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

Andreas Kolbe

3:57 p.m.

Jane, The issue is that you can't cite one Wikipedia article as a source in another. If, as some envisage, you were to fill Wikipedia's infoboxes with Wikidata content that's unsourced, or sourced only to a Wikipedia, you'd be doing exactly that, and violating WP:V in the process: "Do not use articles from Wikipedia as sources. Also, do not use *websites that mirror Wikipedia content or publications that rely on material from Wikipedia as sources*." (WP:CIRCULAR) That includes Wikidata. As long as Wikidata doesn't provide external sourcing, it's unusable in Wikipedia. Andreas On Sun, Dec 13, 2015 at 9:15 AM, Jane Darnell <jane023(a)gmail.com> wrote:

...

On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:

I've taken the time and written a longer piece about data quality and knowledge diversity on Wikidata for the current edition of the Signpost:

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed

Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

geni

5:32 p.m.

On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]]. -- geni

Andreas Kolbe

5:40 p.m.

On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:

...

On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]].

Well, of course, if there are reliable sources cited in the body of the article that back up the statements made in the lead. You still need to cite a reliable source though; that's Wikipedia 101.

Andrea Zanni

6:10 p.m.

I really feel we are drowning in a glass of water. The issue of "data quality" or "reliability" that Andreas raises is well known: what I don't understand is whether the "scale" of it is much bigger on Wikidata than Wikipedia, and if this different scale makes it much more important. The scale of the issue is maybe something worth discussing, and not the issue itself? Is the fact that Wikidata is centralised different from statements on Wikipedia? I don't know, but to me this is a more neutral and interesting question. I often say that the Wikimedia world made quality a "Heisenbergian" feature: you always have to check if it's there. The point is: it's always been like this. We always had to check for quality, even when we used Britannica or authority controls or whatever "reliable" sources we wanted. Wikipedia, and now Wikidata, is made for everyone to contribute, it's open and honest in being open, vulnerable, prone to errors. But we are transparent, we say that in advance, we can claim any statement to the smallest detail. Of course it's difficult, but we can do it. Wikidata, as Lydia said, can actually have conflicting statements in every item: we "just" have to put them there, as we did to Wikipedia. If Google uses our data and they are wrong, that's bad for them. If they correct the errors and do not give us the corrections, that's bad for us and unethical of them. The point is: there is no license (as far as I know) that can force them to contribute to Wikidata. That is, IMHO, the problem with "over-the-top" actors: they can harness collective intelligence and "not give back." Even with CC-BY-SA, they could store (as they are probably already doing) all the data in their knowledge vault, which is secret as it is an incredible asset for them. I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or CC0, but it's not there.
So, as we are working via GLAMs with Wikipedia for getting reliable sources and content, we are working with them also for good statements and data. Putting good data in Wikidata makes it better, and I don't understand what the problem is here (I understand, again, the issue of putting in too much data and still having a small community). For example: if we are importing different reliable databases, and the institutions behind them find it useful and helpful to have an aggregator of identifiers and authority controls, what is the issue? There is value in aggregating data, because you can spot errors and inconsistencies. It's not easy, of course, to find a good workflow, but, again, that is *another* problem. So, in conclusion: I find many issues in Wikidata, but not on the mission/vision, just in the complexity of the project, the size of the dataset, the size of the community. Can we talk about those? Aubrey On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:

On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]].

Well, of course, if there are reliable sources cited in the body of the article that back up the statements made in the lead. You still need to cite a reliable source though; that's Wikipedia 101.

Gerard Meijssen

7:37 p.m.

Hoi, Thank you for another approach. When Wikidata imports data from Wikipedia, it essentially stands on the shoulders of giants. Yes, there are sources in Wikipedia and it does not prevent occasional issues. Yes, we import a lot of data from Wikipedia and this makes life at Wikidata easy and what we do obvious. It all started with improving quality at Wikidata by making interwiki links manageable and we are still often involved in fixing wikilinks in Wikidata because the assumptions to link some articles are "funny". When you look at Wikipedia, a lot of the fixtures are essentially about data. A category or a list can be replicated in many ways by querying Wikidata. The inverse is that Wikidata can be populated from Wikipedia. Consequently when we say that we know about men and women in so many Wikipedias it is because of this that we can and do. When Wikipedia is correct, Wikidata is. When Wikipedias do not agree, you will find this expressed in Wikidata. When people build tools and bots, and they have done so for a long time, it is EXACTLY based on the assumption that Wikipedia is essentially correct, and it is why the quality and quantity of Wikidata is already this good. When you want to consider Wikidata and its complexity, it is important to look at the statistics. The statistics by Magnus are the most relevant because they help explain many of the issues of Wikidata. One important point. No Wikipedia can claim Wikidata as it is a composite. Wikipedia policies do not apply. When people insist that all the data in Wikidata has to be 100% correct, forget it. Wikipedia is not 100% correct either and that is what we build upon. It has never been this way and it is impossible to do this any time soon. What we can do is build upon existing qualities, compare and curate. It is for instance fairly easy to improve on Wikipedia based upon the information that is already there but shown to be problematic.
It is easy when we collaborate as it will improve the quality of what we offer. One problem is that we are SO bad at collaboration. Wikipedians work on one article at a time and when I work on awards there are easily 60 persons involved and I trust Wikipedia to be right. The kind of issues I encounter I blog about regularly. I am not involved in single items or they have to be of relevance to me like Bassel, the only Wikipedian sentenced to death. So I did add new items that exist as red links in the award he received and I did ask Magnus to help me with a list for the award he received. I added the website I used on the award and that is as far as I go. When you want to talk about the issues, what is it that you want to achieve? So far there has been little interest in Wikidata. When you want to learn about issues, research the issues. Find methods to calculate the error rate, find methods to compare Wikidata with the Wikipedias and with other sources in a meaningful way. But do approach it like Magnus does. His contributions help us make a positive difference. When you find numbers for now that you cannot replicate with the next dump and the next, they are essentially without much value because they do not enable us to improve on what we have. They do not help us engage our minds to make a difference. I ask Amir regularly to run a bot based on the statistics produced by Magnus; we are not at the stage where we have such tasks automated... Andrea, Wikidata is a wiki. It is young and it has already proven itself for several applications. What can be done improves as our data improves. We have a lack of data on many subjects because it is where Wikipedia is lacking. How will we approach for instance the fact that we have fewer than 1000 Syrians and one of them is an emperor of the Roman empire and another is Bassel? Let us be bold and allow us to be a wiki. 
Let us work towards the quality that is possible to achieve and do not burden us with the assumptions of some Wikipedias. When you are serious, get involved. Thanks, GerardM On 13 December 2015 at 19:10, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote: > On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com>

wrote:

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]].
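
Gerard's suggestion in the message above, to find methods to compare Wikidata with the Wikipedias and other sources and to calculate an error rate, could be sketched roughly as follows. This is only a minimal Python illustration, not a tool anyone in the thread describes: the function names `collect_values`, `find_differences` and `disagreement_rate`, the source labels, the item IDs and the values are all invented for the example, and real input would come from dumps or queries rather than hard-coded dictionaries. It measures disagreement between sources, which is at best a proxy for error.

```python
from collections import defaultdict


def collect_values(sources):
    """Group values by (item, property) across all sources.

    `sources` maps a source name to a dict of {(item, property): value}.
    """
    by_claim = defaultdict(dict)
    for name, claims in sources.items():
        for key, value in claims.items():
            by_claim[key][name] = value
    return by_claim


def find_differences(sources):
    """Return {(item, property): {source: value}} where sources disagree."""
    return {
        key: values
        for key, values in collect_values(sources).items()
        if len(set(values.values())) > 1
    }


def disagreement_rate(sources):
    """Share of claims present in two or more sources whose values differ."""
    comparable = [v for v in collect_values(sources).values() if len(v) >= 2]
    if not comparable:
        return 0.0
    differing = sum(1 for v in comparable if len(set(v.values())) > 1)
    return differing / len(comparable)


# Hypothetical sample data: the item IDs and years are made up.
sample = {
    "wikidata": {("Q_A", "birth_year"): 1606, ("Q_B", "birth_year"): 1632},
    "enwiki": {("Q_A", "birth_year"): 1606, ("Q_B", "birth_year"): 1633},
    "nlwiki": {("Q_B", "birth_year"): 1632},
}

# List each claim that needs a human to look for sources.
for (item, prop), values in sorted(find_differences(sample).items()):
    print(item, prop, values)
print(f"disagreement rate: {disagreement_rate(sample):.2f}")
```

Because the calculation depends only on its inputs, running it against one dump and then the next would give numbers that can be replicated and compared over time, which is the property Gerard asks for.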

Jane Darnell

8:35 p.m.

Andrea, I totally agree on the mission/vision thing, but am not sure what you mean exactly by scale - do you mean that Wikidata shouldn't try to be so granular that it has a statement to cover each factoid in any Wikipedia article, or do you mean we need to talk about what constitutes notability in order not to grow Wikidata exponentially to the point the servers crash? Jane On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote: > On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com>

wrote:

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]].

Andrea Zanni

16 Dec 16 Dec

11:12 a.m.

On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell <jane023(a)gmail.com> wrote:

...

Hi Jane, I explained myself poorly (sometimes English is too difficult :-) What I mean is that the error *could* be of another scale, another order of magnitude. The propagation of the error is multiplied, it's not just a single error on a wikipage: it's an error propagated in many wikipages, and then Google, etc. A single point of failure. Of course, the opposite is also true: it's a single point of openness, correction, information. I was just wondering if this different scale is a factor in making Wikipedia and Wikidata different enough to accept/reject Andreas's arguments. Andrea

...

On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:


Jane Darnell

11:28 a.m.

OK I see now what you mean, and that is an interesting point. I think in this context you need to see the objections to the "Bonnie and Clyde" problem. Now that we have exploded the concepts of Wikipedia into items, our interlinking (which is what Wikidata was built for) is a bit less tightly knit than it was. Some would argue that it's a good thing because we have fewer unresolvable interwiki links and others would argue it's a bad thing because they have less opportunity to redirect readers to material on other projects. Most recently this has come up in the discussions around structured data for Commons, but early adopters noticed it immediately in the interlanguage links. The only way forward (or backward, depending on your point of view) is to explode the Wikipedias in a similar way. So for example I like to work on 17th-century paintings and sometimes they are interesting because of their subjects, and sometimes they are interesting because of their provenance, but rarely both, so Wikipedia articles generally deal with both. On Wikidata we will often have items for both (the portrait and the portrayed; or a landscape and the objects depicted in that landscape) and the interwikis link accordingly, which means some interwikis disappear because one language Wikipedia article is talking about the person while another language Wikipedia article is talking about the painting, and so forth. I guess for Wikisource it's similar with "Wikisource editions of biographies of people" vs. items about actual people.

On Wed, Dec 16, 2015 at 12:12 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell <jane023(a)gmail.com> wrote:

> Andrea, I totally agree on the mission/vision thing, but am not sure what you mean exactly by scale - do you mean that Wikidata shouldn't try to be so granular that it has a statement to cover each factoid in any Wikipedia article, or do you mean we need to talk about what constitutes notability in order not to grow Wikidata exponentially to the point the servers crash?
>
> Jane

On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

> On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:
> > On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:
> > > Jane,
> > > The issue is that you can't cite one Wikipedia article as a source in another.
> > However you can within the same article per [[WP:LEAD]].
> Well, of course, if there are reliable sources cited in the body of the article that back up the statements made in the lead. You still need to cite a reliable source though; that's Wikipedia 101.

_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Gerard Meijssen

11:31 a.m.

Hoi, The one thing where Wikidata shines is in connecting sources through identifiers. It connects all Wikipedias through the interwiki links and improving these has been an ongoing process of the last three years. Every week more external identifiers are added and it is in the mix-n-match tool by Magnus that many of these connections are made. As more sources are added, the opportunity grows to compare and curate. The law on copyright holds that you cannot use complete databases but it allows you to compare and curate. When values match, there is no obvious issue. When they do not, it is a matter of signalling the difference and evaluating the opposing values. What we should NOT do is accept any value as 100% correct. Sources are known to be wrong, but where everybody agrees, we can at least concentrate on where there is a disagreement, where an investment in time makes the most difference. In this way we make a positive difference for our own content and, by signalling differences, at the other end as well. The problem with Andreas' argument is that it does not provide any way forward. It may be a problem, and then what? By concentrating on what we do best, sharing in the sum of all available knowledge, we enable parties to compare their content with all the other parties that have content. We publish where we find a difference and it is then for us and others to do the best we can.

Thanks, GerardM

On 16 December 2015 at 12:12, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...


Andreas Kolbe

17 Dec 17 Dec

2:17 p.m.

On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...


Exactly: a single point of failure. A system where a single point of failure can have such consequences, potentially corrupting knowledge forever, is a bad system. It's not robust.

In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an example of error propagation (which happened entirely without Wikidata's and the Knowledge Graph's help). It took the New Yorker quite a bit of research to piece together and confirm what happened, research which I understand would not have happened if the originator of the hoax had not been willing to talk about his prank.

It was the same with the fake Maurice Jarre quotes in Wikipedia[2] that made their way into mainstream press obituaries a few years ago. If the hoaxer had not come forward, no one would have been the wiser. The fake quotes would have remained a permanent part of the historical record.

More recent cases include the widely repeated (including by Associated Press, for God's sake, to this day) claim that Joe Streater was involved in the Boston College basketball point shaving scandal[3] and the Amelia Bedelia hoax.[4]

If even things people insert as a joke propagate around the globe as a result of this vulnerability, then there is a clear and present potential for purposeful manipulation. We've seen enough cases of that, too.[5]

This is not the sort of system the Wikimedia community should be helping to build. The very values at the heart of the Wikimedia movement are about transparency, accountability, multiple points of view, pluralism, democracy, opposing dominance and control by vested interests, and so forth.

What is the way forward? Wikidata should, as a matter of urgency, rescind its decision to make its content available under the CC0 licence. Global propagation without attribution is a terrible idea.

Quite apart from that, in my opinion Wikidata's CC0 licensing also infringes Wikipedia contributors' rights as enshrined in Wikipedia's CC BY-SA licence, a point Lydia Pintscher did not even contest on the Signpost talk page. As I understand her response,[6] she restricts herself to asserting that the responsibility for any potential licence infringement lies with Wikidata contributors rather than with her and Wikimedia Deutschland. That's passing the buck.

If Wikidata is not prepared to follow CC BY-SA, the way DBpedia does[7], the next step should be a DMCA takedown notice for material mass-imported from Wikipedia.

And of course, Wikidata needs to step up its efforts to cite verifiable sources.

[1] http://www.newyorker.com/tech/elements/how-a-raccoon-became-an-aardvark
[2] http://www.theguardian.com/commentisfree/2009/may/04/journalism-obituaries-…
[3] http://awfulannouncing.com/2014/guilt-wikipedia-joe-streater-became-falsely… Associated Press: http://bigstory.ap.org/article/list-worst-scandals-college-sports
[4] http://www.dailydot.com/lol/amelia-bedelia-wikipedia-hoax/
[5] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-bus… and http://www.dailydot.com/lifestyle/wikipedia-plastic-surgery-otto-placik-lab… and many others
[6] https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Wikipedia_Signpos…
[7] http://wiki.dbpedia.org/terms-imprint

...

Of course, the opposite is also true: it's a single point of openness, correction, information. I was just wondering if this different scale is a factor in making Wikipedia and Wikidata different enough to accept/reject Andreas' arguments.

Andrea


Gerard Meijssen

2:33 p.m.

Hoi, Andreas, the law is an arse. However, the law has it that you cannot license facts. When in distributed processes data is retrieved from Wikipedia, it is the authors who may contest their rights. There is no such thing as collective rights for Wikipedia, all Wikipedias. You may not like this and that is fine. DBpedia has its license the way it is NOT because they care about the license but because they are not interested in a row with Wikipedians on the subject. They are quite happy to share their data with Wikidata and to make data retrieved in their processes available under CC0.

Thanks, GerardM

On 17 December 2015 at 15:17, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

> I really feel we are drowning in a glass of water. The issue of "data quality" or "reliability" that Andreas raises is well known: what I don't understand is whether the "scale" of it is much bigger on Wikidata than Wikipedia, and if this different scale makes it much more important. The scale of the issue is maybe something worth discussing, and not the issue itself? Is the fact that Wikidata is centralised different from statements on Wikipedia? I don't know, but to me this is a more neutral and interesting question.
>
> I often say that the Wikimedia world made quality a "Heisenbergian" feature: you always have to check if it's there. The point is: it's always been like this. We always had to check for quality, even when we used Britannica or authority controls or whatever "reliable" sources we wanted. Wikipedia, and now Wikidata, is made for everyone to contribute, it's open and honest in being open, vulnerable, prone to errors. But we are transparent, we say that in advance, we can claim any statement to the smallest detail. Of course it's difficult, but we can do it. Wikidata, as Lydia said, can actually have conflicting statements in every item: we "just" have to put them there, as we did to Wikipedia.
>
> If Google uses our data and they are wrong, that's bad for them. If they correct the errors and do not give us the corrections, that's bad for us and unethical of them. The point is: there is no license (as far as I know) that can force them to contribute to Wikidata. That is, IMHO, the problem with "over-the-top" actors: they can harness collective intelligence and "not give back." Even with CC-BY-SA, they could store (as they are probably already doing) all the data in their knowledge vault, which is secret as it is an incredible asset for them.
>
> I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or CC0, but it's not there.
>
> So, as we are working via GLAMs with Wikipedia for getting reliable sources and content, we are working with them also for good statements and data. Putting good data in Wikidata makes it better, and I don't understand what the problem is here (I understand, again, the issue of putting too much data and still having a small community). For example: if we are importing different reliable databases, and the institutions behind them find it useful and helpful to have an aggregator of identifiers and authority controls, what is the issue? There is value in aggregating data, because you can spot errors and inconsistencies. It's not easy, of course, to find a good workflow, but, again, that is *another* problem.
>
> So, in conclusion: I find many issues in Wikidata, but not on the mission/vision, just in the complexity of the project, the size of the dataset, the size of the community. Can we talk about those?
>
> Aubrey

Andreas Kolbe

18 Dec 18 Dec

8:05 a.m.

Gerard,

Of course you can't license or copyright facts, but as the WMF legal team's page on this topic[1] outlines, there are database and compilation rights that exist independently of copyright. IANAL, but as I read that page, if you simply go ahead and copy all the infobox, template etc. content from a Wikipedia, this "would likely be a violation" even under US law (not to mention EU law).

I don't know why Wikipedia was set up with a CC BY-SA licence rather than a CC0 licence, and the attribution required under CC BY-SA is unduly cumbersome, but attribution has always seemed to me like a useful concept. The fact that people like VDM Publishing who sell Wikipedia articles as books are required to say that their material comes from Wikipedia is useful, for example.

Naturally it fosters re-use if you make Wikidata CC0, but that's precisely the point: you end up with a level of "market dominance" that just ain't healthy.

[1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

On Thu, Dec 17, 2015 at 2:33 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

> On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell <jane023(a)gmail.com> wrote:
> > Andrea, I totally agree on the mission/vision thing, but am not sure what you mean exactly by scale - do you mean that Wikidata shouldn't try to be so granular that it has a statement to cover each factoid in any Wikipedia article, or do you mean we need to talk about what constitutes notability in order not to grow Wikidata exponentially to the point the servers crash?
> > Jane
>
> Hi Jane, I explained myself poorly (sometimes English is too difficult :-)
>
> What I mean is that the scale of the error *could* be of another scale, another order of magnitude. The propagation of the error is multiplied, it's not just a single error on a wikipage: it's an error propagated in many wikipages, and then Google, etc. A single point of failure.


Peter Southwood

8:24 a.m.

Wikipedia is not about infoboxes; they are (and are intended to be) a small to very small part of the article in most cases. Similarly, Wikipedias are not databases, so, also without being a lawyer, I think your interpretation is wrong.

Cheers, Peter

-----Original Message-----
From: Wikimedia-l [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Andreas Kolbe
Sent: Friday, 18 December 2015 10:06 AM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

...


[2] http://www.theguardian.com/commentisfree/2009/may/04/journalism-obituaries-shane-fitzgerald
[3] http://awfulannouncing.com/2014/guilt-wikipedia-joe-streater-became-falsely-attached-boston-college-point-shaving-scandal.html Associated Press: http://bigstory.ap.org/article/list-worst-scandals-college-sports
[4] http://www.dailydot.com/lol/amelia-bedelia-wikipedia-hoax/
[5] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-business-school-316133.html and http://www.dailydot.com/lifestyle/wikipedia-plastic-surgery-otto-placik-labiaplasty/ and many others
[6] https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Wikipedia_Signpost/2015-12-09/Op-ed&diff=695228403&oldid=695228022
[7] http://wiki.dbpedia.org/terms-imprint

arguments.

Andrea > On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni <

zanni.andrea84(a)gmail.com>

wrote: > I really feel we are drowning in a glass of water. > The issue of "data quality" or "reliability" that Andreas > raises is

well > > known: > > what I don't understand if the "scale" of it is much bigger on

Wikidata

> > than Wikipedia, > > and if this different scale makes it much more important. The > > scale

of > > the > > > issue is maybe something worth discussing, and not the issue

itself?

> the > > fact that Wikidata is centralised different from statements on > Wikipedia? I > > don't know, but to me this is a more neutral and interesting

question. > > > > > > I often say that the Wikimedia world made quality an

"heisemberghian"

> > feature: you always have to check if it's there. > > The point is: it's been always like this. > > We always had to check for quality, even when we used > > Britannica or authority controls or whatever "reliable" sources we wanted.

Wikipedia,

> and > > now Wikidata, is made for everyone to contribute, it's open > > and

honest

in > > being open, vulnerable, prone to errors. But we are > > transparent, we

say > > > that in advance, we can claim any statement to the smallest

detail.

Of > > > course it's difficult, but we can do it. Wikidata, as Lydia > > > said,

can

> > > actually have conflicting statements in every item: we "just" > > > have

> put > > > them there, as we did to Wikipedia. > > > > > > If Google uses our data and they are wrong, that's bad for > > > them. If > they > > > correct the errors and do not give us the corrections, that's > > > bad

for

us > > and not ethical from them. The point is: there is no license > > (for

what

I > > know) that can force them to contribute to Wikidata. That is, > > IMHO,

the

> > problem with "over-the-top" actors: they can harness > > collective > intelligent > > and "not give back." Even with CC-BY-SA, they could store (as > > they

are > > > probably already doing) all the data in their knowledge vault,

which

is > > > secret as it is an incredible asset for them. > > > > > > I'd be happy to insert a new clause of "forced transparency" > > > in > CC-BY-SA > > or > > > CC0, but it's not there. > > > > > > So, as we are working via GLAMs with Wikipedia for getting

reliable

> > sources and content, we are working with them also for good

statements

> and > > data. Putting good data in Wikidata makes it better, and I > > don't > understand > > what is the problem here (I understand, again, the issue of > > putting

too > > > much data and still having a small community). > > > For example: if we are importing different reliable databases,

andthe

> institutions behind them find it useful and helpful to have an

aggregator

> of identifiers and authority controls, what is the issue? > There is

value > in > > aggregating data, because you can spot errors and inconsistencies.

It's

> not > > easy, of course, to find a good workflow, but, again, that is

*another*

> > problem. > > > > So, in conclusion: I find many issues in Wikidata, but not on > > the mission/vision, just in the complexity of the project, the > > size of

the

> > dataset, the size of the community. > > > > Can we talk about those? > > > > Aubrey > > > > > > > > On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe > > <jayen466(a)gmail.com

> > wrote: > > > > > > > On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote: > > > > > > > > > On 13 December 2015 at 15:57, Andreas Kolbe <

jayen466(a)gmail.com>

> > > wrote: > > > > > > > > > > > Jane, > > > > > > > > > > > > The issue is that you can't cite one Wikipedia article > > > > > > as a > source > > in > > > > > > another. > > > > > > > > > > > > > > > > > > > > > However you can within the same article per [[WP:LEAD]]. > > > > > > > > > > > > > > > > > Well, of course, if there are reliable sources cited in the > > > > body

the > > > article that back up the statements made in the lead. You > > > still

need

https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > > > <mailto:wikimedia-l-request@lists.wikimedia.org

?subject=unsubscribe>

https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, > > > <mailto:wikimedia-l-request@lists.wikimedia.org

?subject=unsubscribe>

https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscr ibe>

Jane Darnell

8:34 a.m.

The only infoboxes I have touched on Wikipedia in relation to Wikidata are the ones I created with data from Wikidata with PrepBio and not the other way around. As far as I know there is no tool available to import Wikidata statements from Wikipedia infoboxes. This is why it took so long to get rid of the persondata infoboxes, because the data was not formatted in a way that was easily importable into Wikidata. Eventually the persondata was deleted because the birth/death data was updated in Wikidata, albeit in a different way.

Unfortunately we lost all of the alternate spellings that could have been added to the aliases on Wikidata, but I was delighted that Maarten Dammers was able to upload aliases for artists into Wikidata last week from ULAN, which means we now have way more aliases per artist available for searching than we ever had on Wikipedia.

On Fri, Dec 18, 2015 at 9:24 AM, Peter Southwood <peter.southwood(a)telkomsa.net> wrote:

...


_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Gerard Meijssen

10:21 a.m.

Hoi,
The CC-0 license was set up for the express purpose that everybody can use our data without any impediment. Our objective is to share in the sum of all knowledge and we are more effective in that way. We do not care about market dominance; we care about doing our utmost to have the best data available. At that, I could not care less about theoretical what-ifs; I am interested in making a difference in our content because that is where we make a difference.
Thanks,
GerardM

On 18 December 2015 at 09:05, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Hoi,
Andreas, the law is an arse. However, the law has it that you cannot license facts. When in distributed processes data is retrieved from Wikipedia, it is the authors who may contest their rights. There is no such thing as collective rights for Wikipedia, all Wikipedias. You may not like this and that is fine.

DBpedia has its license in the current way NOT because they care about the license but because they are not interested in a row with Wikipedians on the subject. They are quite happy to share their data with Wikidata and make data retrieved in their processes available under CC-0.
Thanks,
GerardM

On 17 December 2015 at 15:17, Andreas Kolbe <jayen466(a)gmail.com> wrote:

> On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

> > On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell <jane023(a)gmail.com> wrote:

> > > Andrea,
> > > I totally agree on the mission/vision thing, but am not sure what you mean exactly by scale - do you mean that Wikidata shouldn't try to be so granular that it has a statement to cover each factoid in any Wikipedia article, or do you mean we need to talk about what constitutes notability in order not to grow Wikidata exponentially to the point the servers crash?
> > > Jane

> > Hi Jane, I explained myself poorly (sometimes English is too difficult :-)

> > What I mean is that the scale of the error *could* be of another scale, another order of magnitude. The propagation of the error is multiplied, it's not just a single error on a wikipage: it's an error propagated in many wikipages, and then Google, etc. A single point of failure.

> Exactly: a single point of failure. A system where a single point of failure can have such consequences, potentially corrupting knowledge forever, is a bad system. It's not robust.

> In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an example of error propagation (which happened entirely without Wikidata's and the Knowledge Graph's help). It took the New Yorker quite a bit of research to piece together and confirm what happened, research which I understand would not have happened if the originator of the hoax had not been willing to talk about his prank.

> It was the same with the fake Maurice Jarre quotes in Wikipedia[2] that made their way into mainstream press obituaries a few years ago. If the hoaxer had not come forward, no one would have been the wiser. The fake quotes would have remained a permanent part of the historical record.

> More recent cases include the widely repeated (including by Associated Press, for God's sake, to this day) claim that Joe Streater was involved in the Boston College basketball point shaving scandal[3] and the Amelia Bedelia hoax.[4]

> If even things people insert as a joke propagate around the globe as a result of this vulnerability, then there is a clear and present potential for purposeful manipulation. We've seen enough cases of that, too.[5]

> This is not the sort of system the Wikimedia community should be helping to build. The very values at the heart of the Wikimedia movement are about transparency, accountability, multiple points of view, pluralism, democracy, opposing dominance and control by vested interests, and so forth.

> What is the way forward?

> Wikidata should, as a matter of urgency, rescind its decision to make its content available under the CC0 licence. Global propagation without attribution is a terrible idea.

> Quite apart from that, in my opinion Wikidata's CC0 licensing also infringes Wikipedia contributors' rights as enshrined in Wikipedia's CC BY-SA licence, a point Lydia Pintscher did not even contest on the Signpost talk page. As I understand her response,[6] she restricts herself to asserting that the responsibility for any potential licence infringement lies with Wikidata contributors rather than with her and Wikimedia Deutschland. That's passing the buck.

> If Wikidata is not prepared to follow CC BY-SA, the way DBpedia does[7], the next step should be a DMCA takedown notice for material mass-imported from Wikipedia.

> And of course, Wikidata needs to step up its efforts to cite verifiable sources.

> [1] http://www.newyorker.com/tech/elements/how-a-raccoon-became-an-aardvark
> [2] http://www.theguardian.com/commentisfree/2009/may/04/journalism-obituaries-shane-fitzgerald
> [3] http://awfulannouncing.com/2014/guilt-wikipedia-joe-streater-became-falsely-attached-boston-college-point-shaving-scandal.html
> Associated Press: http://bigstory.ap.org/article/list-worst-scandals-college-sports
> [4] http://www.dailydot.com/lol/amelia-bedelia-wikipedia-hoax/
> [5] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-business-school-316133.html and http://www.dailydot.com/lifestyle/wikipedia-plastic-surgery-otto-placik-labiaplasty/ and many others
> [6] https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Wikipedia_Signpost/2015-12-09/Op-ed&diff=695228403&oldid=695228022
> [7] http://wiki.dbpedia.org/terms-imprint

> > Of course, the opposite is also true: it's a single point of openness, correction, information. I was just wondering if this different scale is a factor in making Wikipedia and Wikidata different enough to accept/reject Andreas' arguments.

> > Andrea

> > > On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

> > > > I really feel we are drowning in a glass of water. The issue of "data quality" or "reliability" that Andreas raises is well known: what I don't understand is if the "scale" of it is much bigger on Wikidata than Wikipedia, and if this different scale makes it much more important. The scale of the issue is maybe something worth discussing, and not the issue itself? Is the fact that Wikidata is centralised different from statements on Wikipedia? I don't know, but to me this is a more neutral and interesting question.

> > > > I often say that the Wikimedia world made quality a "Heisenbergian" feature: you always have to check if it's there. The point is: it's always been like this. We always had to check for quality, even when we used Britannica or authority controls or whatever "reliable" sources we wanted. Wikipedia, and now Wikidata, is made for everyone to contribute, it's open and honest in being open, vulnerable, prone to errors. But we are transparent, we say that in advance, we can claim any statement to the smallest detail. Of course it's difficult, but we can do it. Wikidata, as Lydia said, can actually have conflicting statements in every item: we "just" have to put them there, as we did to Wikipedia.

> > > > If Google uses our data and they are wrong, that's bad for them. If they correct the errors and do not give us the corrections, that's bad for us and not ethical from them. The point is: there is no license (for what I know) that can force them to contribute to Wikidata. That is, IMHO, the problem with "over-the-top" actors: they can harness collective intelligence and "not give back." Even with CC-BY-SA, they could store (as they are probably already doing) all the data in their knowledge vault, which is secret as it is an incredible asset for them.

> > > > I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or CC0, but it's not there.

> > > > So, as we are working via GLAMs with Wikipedia for getting reliable sources and content, we are working with them also for good statements and data. Putting good data in Wikidata makes it better, and I don't understand what is the problem here (I understand, again, the issue of putting too much data and still having a small community). For example: if we are importing different reliable databases, and the institutions behind them find it useful and helpful to have an aggregator of identifiers and authority controls, what is the issue? There is value in aggregating data, because you can spot errors and inconsistencies. It's not easy, of course, to find a good workflow, but, again, that is *another* problem.

> > > > So, in conclusion: I find many issues in Wikidata, but not on the mission/vision, just in the complexity of the project, the size of the dataset, the size of the community.

> > > > Can we talk about those?

> > > > Aubrey

> > > > On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

> > > > > On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:

> > > > > > On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

> > > > > > > Jane,
> > > > > > > The issue is that you can't cite one Wikipedia article as a source in another.

> > > > > > However you can within the same article per [[WP:LEAD]].

> > > > > Well, of course, if there are reliable sources cited in the body of the article that back up the statements made in the lead. You still need to cite a reliable source though; that's Wikipedia 101.

Andreas Kolbe

11:04 a.m.

On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood <peter.southwood(a)telkomsa.net> wrote:

...

If you look at the Meta document I linked, you'll find that the definition of a database provided there is quite broad:

---o0o---

From a legal perspective, a database is any organized collection of materials — hard copy or electronic — that permits a user to search for and access individual pieces of information contained within the materials. No database software, as a programmer would understand it, is necessary. In the US, for example, Black’s Law Dictionary defines a database as a "compilation of information arranged in a systematic way and offering a means of finding specific elements it contains, often today by electronic means."[1] Databases may be protected by US copyright law as "compilations." In the EU, databases are protected by the Database Directive, which defines a database as "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means."

---o0o---

You could argue that the sum of Wikipedia's harvestable infoboxes, templates etc. constitutes a database, according to those definitions.

There is also the argument about the benefit of attribution, as opposed to having data appear out of nowhere in a way that is completely opaque to end users.

On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

> We do not care about market dominance, we care about doing our utmost to have the best data available.

Are these not just well-worn platitudes? If you cared so much about quality, you or someone else would have fixed the Grasulf II of Friuli entry by now.

On 18 December 2015 at 09:05, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


Peter Southwood

11:19 a.m.

Depending on how broadly you want to stretch it, that covers an encyclopaedia or even a public library. Not particularly helpful. Also, there is the matter of how much is taken from it in the form of data; there is likely to be much more data available in the articles than is or will ever be used by Wikidata.

You could equally, possibly more convincingly, argue that the sum of Wikipedia's infoboxes, templates etc. does not constitute a database, particularly since that was not the intention, and they have not been applied consistently and/or systematically to the whole project.
Cheers,
P

-----Original Message-----
From: Wikimedia-l [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Andreas Kolbe
Sent: Friday, 18 December 2015 1:05 PM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood <peter.southwood(a)telkomsa.net> wrote:


Gerard Meijssen

12:05 p.m.

Hoi,
I have made changes to Grasulf II and I believe it is better because of it. If you find fault, you can do what I often do: make a difference. Yes, I do edit Wikipedia occasionally based on the info that I find.
Thanks,
GerardM

On 18 December 2015 at 12:04, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood < peter.southwood(a)telkomsa.net> wrote:

your

interpretation is wrong.

If you look at the Meta document I linked, you'll find that the definition of a database provided there is quite broad: ---o0o--- From a legal perspective, a database is any organized collection of materials — hard copy or electronic — that permits a user to search for and access individual pieces of information contained within the materials. No database software, as a programmer would understand it, is necessary. In the US, for example, Black’s Law Dictionary defines a database as a "compilation of information arranged in a systematic way and offering a means of finding specific elements it contains, often today by electronic means."[1] Databases may be protected by US copyright law as "compilations." In the EU, databases are protected by the Database Directive, which defines a database as "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means." ---o0o--- You could argue that the sum of Wikipedia's harvestable infoboxes, templates etc. constitutes a database, according to those definitions. There is also the argument about the benefit of attribution, as opposed to having data appear out of nowhere in a way that is completely opaque to end users. On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen < gerard.meijssen(a)gmail.com

wrote:

Hoi, The CC-0 license was set up with the express reason that everybody can

use

our data without any impediment. Our objective is to share in the sum of all knowledge and we are more effective in that way.

We do not care about market dominance, we care about doing our utmost to have the best data available.

Are these not just well-worn platitudes? If you cared so much about quality, you or someone else would have fixed the Grasulf II of Friuli entry by now.

On 18 December 2015 at 09:05, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Gerard, Of course you can't license or copyright facts, but as the WMF legal team's page on this topic[1] outlines, there are database and compilation rights that exist independently of copyright. IANAL, but as I read that page, if you simply go ahead and copy all the infobox, template etc. content from Wikipedia, this "would likely be a violation" even under US law (not to mention EU law). I don't know why Wikipedia was set up with a CC BY-SA licence rather than a CC0 licence, and the attribution required under CC BY-SA is unduly cumbersome, but attribution has always seemed to me like a useful concept. The fact that people like VDM Publishing who sell Wikipedia articles as books are required to say that their material comes from Wikipedia is useful, for example. Naturally it fosters re-use if you make Wikidata CC0, but that's precisely the point: you end up with a level of "market dominance" that just ain't healthy. [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

Andrea Zanni

11:28 a.m.

On Thu, Dec 17, 2015 at 3:17 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

A single point of failure.

Exactly: a single point of failure. A system where a single point of failure can have such consequences, potentially corrupting knowledge forever, is a bad system. It's not robust.

Andreas, you apparently did not read the following sentence: "Of course, the opposite is also true: it's a single point of openness, correction, information." At last, I agree with Gerard: you seem not to accept people's arguments and continue to reiterate yours again and again. The problem, to me, is that you don't like Wikis: you don't like that they are open, prone to errors and vulnerable. Yet this is our greatest weakness and strength at the same time. The Wikimedia movement has believed in this for at least the last 15 years; it is one of our pillars. So, if you don't like it, maybe the Wikimedia movement is not suitable for you; maybe you'd prefer working on Citizendium or something. There's no shame in it, and I really believe it: it's just a matter of choice. I personally choose to believe in openness as a way to leverage good will from people, willingness to share knowledge. I believe Wikidata is going in the same direction, and I have not found evidence yet that the "power and centralisation" of data make the openness a problem of a different magnitude from Wikipedia. I'm happy to discuss this point specifically, as I think we can have a reasonable and constructive debate on this. But if you reiterate examples on Wikipedia, you lose me. We have already made a choice: we believe that the trade-off between openness and control is worth it.

...

Are these not just well-worn platitudes? If you cared so much about quality, you or someone else would have fixed the Grasulf II of Friuli entry by now.

You are included in the set of "someone else": you found all the errors, and you could have corrected them. You decided it was best to write a very long mail instead of correcting them. It's your right, but it's not the Wikimedia way. The Wikimedia way is wonderfully explained in three magical words: so fix it [1]. Aubrey [1] https://en.wikipedia.org/wiki/Template:Sofixit

Andreas Kolbe

3:06 p.m.

On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

Andreas, you apparently did not read the following sentence: "Of course, the opposite is also true: it's a single point of openness, correction, information. "

Andrea, I understand and appreciate your point, but I would like you to consider that what you say may be less true of Wikidata than it is for other Wikimedia wikis, for several reasons: Wikipedia, Wiktionary etc. are functionally open and correctable because people by and large view their content on Wikipedia, Wiktionary etc. itself (or in places where the provenance is clearly indicated, thanks to CC BY-SA). The place where you read it is the same place where you can edit it. There is an "Edit" tab, and it really *is* easy to change the content. (It is certainly easy to correct a typo, which is how many of us started.) With Wikidata, this is different. Wikidata, as a semantic wiki, is designed to be read by machines. These machines don't edit, they *propagate*. Wikidata is not a site that end users--human beings--will browse and consult the way people consult Wikipedia, Wiktionary, Commons, etc. Wikidata is, or will be, of interest mostly to re-users--search engines and other intermediaries who will use its machine-readable data as an input to build and design their own content. And when they use Wikidata as an input, they don't have to acknowledge the source. Allowing unattributed re-use may *seem* more open. But I contend that in practice it makes Wikidata *less* open as a wiki: because when people don't know where the information comes from, they are also unable to contribute at source. The underlying Wikimedia project effectively becomes invisible to them, a closed book. That is not good for a crowdsourced project from multiple points of view. Firstly, it impedes recruitment. Far fewer consumers of Wikidata information will become Wikidata editors, because they will typically find Wikidata content on other sites where Wikidata is not even mentioned. Secondly, it reduces transparency. Data provenance is important, as Mark Graham and Heather Ford have pointed out. Thirdly, it fails to encourage appropriate vigilance in the consumer. 
(The error propagation problems I've described in this thread all involved unattributed re-use of Wikimedia content.) There are other reasons why Wikidata is less open, besides CC0 and the lack of attribution. Wikidata is the least user-friendly Wikimedia wiki. The hurdle that newbies--even experienced Wikimedians--have to overcome to contribute is an order of magnitude higher than it is for other Wikimedia projects. For a start, there is no Edit tab at the top of the page. When you go to Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not to be found anywhere on the page. It does not look like a page you can edit (and indeed, members of the public can't edit it). It took me a while to figure out that the item is protected (just like the Jerusalem item). In other Wikimedia wikis that do have an "Edit" tab, that tab changes to "View source" if the page is protected, giving a visual indication of the page's status that people--Wikimedia insiders at least--can recognise. Unprotected Wikidata items do have "edit" and "add" links, but they are less prominent. (The "add" link for adding new properties is hidden away at the very bottom of the page.) And when you do click "edit" or "add", it is not obvious what you are supposed to do, the way it is in text-based wikis. The learning curve involved in actually editing a Wikidata item is far steeper than it is in other Wikimedia wikis. There is no Wikidata equivalent of the "correcting a typo" edit in Wikipedia. You need to go away and learn the syntax before you can do anything at all in Wikidata. For all of these reasons I believe the systemic balance between information delivery (output) and ease of contribution (input) is substantially different for Wikidata than it is for any other Wikimedia wiki.

...

So, if you don't like it, maybe the Wikimedia movement is not suitable for you; maybe you'd prefer working on Citizendium or something. There's no shame in it, and I really believe it: it's just a matter of choice.

I have been contributing to Wikimedia projects for ten years now. I consider it an important movement to be involved in, exactly per your arguments about openness and public involvement above. If openness is a strength, then it follows that Wikimedia as a movement is stronger for debate and dissent. On a more personal level, I find the idea of free knowledge inspiring. At every Wikimedia event I have attended, that excitement and the joy of creation are in the air and communicate themselves. I relate to it, and share in it. There are many Wikimedia content creators whose work I admire and respect, and who have become friends. But I don't share the quasi-religious zeal that seems to suffuse some of the public discourse in the Wikimedia movement around free knowledge. In fact I find it subtly troubling. In actual practice, I see substantial downsides as well as upsides to the work the Wikimedia community is doing. But to be honest, whenever I meet other Wikimedians, they seem to see plenty of downsides too. :) Keeping sight of the downsides is important if you want to provide a better service to the public.

...

I personally choose to believe in openness as a way to leverage good will from people, willingness to share knowledge. I believe Wikidata is going in the same direction, and I have not found evidence yet that the "power and centralisation" of data make the openness a problem of a different magnitude from Wikipedia. I'm happy to discuss this point specifically, as I think we can have a reasonable and constructive debate on this.

Gerard Meijssen

6:20 p.m.

Hoi, Andreas, you have a point. The point you make that Wikidata is only considered for re-use is compelling. I edit very much but I do NOT use Wikidata to understand what data is there. It is a mess and not fit for humans. This, however, does not have to be the case. Magnus created the "Reasonator" and it provides me with an environment that helps me understand what data is available. It makes information out of data and it is actionable in many ways. It is not really hard to make a native Reasonator and it will be usable in any language as it is. It will make a big difference because it does negate the negative arguments that you make. It is imho the biggest hurdle for Wikidata and it is totally unnecessary for the Wikidata team to persist in their lack of a usable user interface. It is a matter of priority. For your information, a German university is considering the use of Wikidata, and for them a Reasonator-like interface that allows them to edit as well is what is missing for them to go ahead with Wikidata at this time. They would use Wikidata for science, and for them the ability to link from Wikidata to any and all other resources is a relevant consideration. They are interested in sharing their data. They do not mind that it becomes available under CC-0; what they look for is a best practice where their data becomes available with a reference. We all agree that this IS a best practice. They are just as interested to learn where Wikidata disagrees, because to them it is a matter of quality to get things exactly right. Thanks, GerardM On 18 December 2015 at 16:06, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

Andreas, you apparently did not read the following sentence: "Of course, the opposite is also true: it's a single point of openness, correction, information. "

So, if you don't like it, maybe the Wikimedia movement is not suitable for you; maybe you'd prefer working on Citizendium or something. There's no shame in it, and I really believe it: it's just a matter of choice.

I personally choose to believe in openness as a way to leverage good will from people, willingness to share knowledge. I believe Wikidata is going in the same direction, and I have not found evidence yet that the "power and centralisation" of data make the openness a problem of a different magnitude from Wikipedia. I'm happy to discuss this point specifically, as I think we can have a reasonable and constructive debate on this.

In part, this will depend on how and by whom the content will be re-used, and how aware end users will be where the data comes from. I think right now, it is too early to say. Matters are not helped by the fact that without attribution, it will be very hard for us--or indeed anyone else--to track down who is and who is not using Wikidata content. Search engines in particular are very secretive about their sources of information. Regards, Andreas [1] https://www.wikidata.org/wiki/Q76

Andreas Kolbe

19 Dec 19 Dec

6:06 p.m.

On Fri, Dec 18, 2015 at 6:20 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

Magnus created the "Reasonator" and it provides me with an environment that helps me understand what data is available. It makes information out of data and, it is actionable in many ways.

This may be taking us off-topic a little bit, but Magnus does great work. I've never understood why the WMF doesn't implement more of his ideas.

Lydia Pintscher

20 Dec 20 Dec

11:25 a.m.

On Fri, Dec 18, 2015 at 4:06 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

Andreas, you apparently did not read the following sentence: "Of course, the opposite is also true: it's a single point of openness, correction, information. "

You are used to the edit tab being there. Someone recently said on Twitter this is the most displayed invisible link on the internet. All a matter of perspective and what we are used to ;-)

...

With Wikidata, this is different. Wikidata, as a semantic wiki, is designed to be read by machines. These machines don't edit, they *propagate*. Wikidata is not a site that end users--human beings--will browse and consult the way people consult Wikipedia, Wiktionary, Commons, etc.

Machines (with people behind them) _do_ edit Wikidata. Wikidata is designed to be read and written by both humans and machines. And it is used that way.

...

Wikidata is, or will be, of interest mostly to re-users--search engines and other intermediaries who will use its machine-readable data as an input to build and design their own content. And when they use Wikidata as an input, they don't have to acknowledge the source. Allowing unattributed re-use may *seem* more open. But I contend that in practice it makes Wikidata *less* open as a wiki: because when people don't know where the information comes from, they are also unable to contribute at source. The underlying Wikimedia project effectively becomes invisible to them, a closed book. That is not good for a crowdsourced project from multiple points of view. Firstly, it impedes recruitment. Far fewer consumers of Wikidata information will become Wikidata editors, because they will typically find Wikidata content on other sites where Wikidata is not even mentioned.

That is why I am working with re-users of Wikidata's data on this. They can link to Wikidata. They can build ways to let their users edit in-place. inventaire and Histropedia are two projects that show the start of this. As I wrote in my Signpost piece it needs work and education that is ongoing.

...

Secondly, it reduces transparency. Data provenance is important, as Mark Graham and Heather Ford have pointed out. Thirdly, it fails to encourage appropriate vigilance in the consumer. (The error propagation problems I've described in this thread all involved unattributed re-use of Wikimedia content.) There are other reasons why Wikidata is less open, besides CC0 and the lack of attribution. Wikidata is the least user-friendly Wikimedia wiki. The hurdle that newbies--even experienced Wikimedians--have to overcome to contribute is an order of magnitude higher than it is for other Wikimedia projects.

Granted, Wikidata isn't the most user-friendly at this point, which is why we are working on improvements in that area. Some of them have gone live just the other week. More will go live in January.

...

For a start, there is no Edit tab at the top of the page. When you go to Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not to be found anywhere on the page. It does not look like a page you can edit (and indeed, members of the public can't edit it).

Now please go to any other page that is not protected. It has edit links plastered all over it. Editing there is much much more obvious than on Wikipedia. I really encourage you to actually go and edit on Wikidata for longer than 2 minutes.

...

It took me a while to figure out that the item is protected (just like the Jerusalem item).

We have a lock icon in the top right corner to indicate protected items like this.

...

In other Wikimedia wikis that do have an "Edit" tab, that tab changes to "View source" if the page is protected, giving a visual indication of the page's status that people--Wikimedia insiders at least--can recognise. Unprotected Wikidata items do have "edit" and "add" links, but they are less prominent. (The "add" link for adding new properties is hidden away at the very bottom of the page.) And when you do click "edit" or "add", it is not obvious what you are supposed to do, the way it is in text-based wikis.

It is not a text-based wiki. So yes some things work differently. That doesn't necessarily mean they are worse. I dispute your claim that the edit links on Wikidata are less prominent than on Wikipedia.

...

The learning curve involved in actually editing a Wikidata item is far steeper than it is in other Wikimedia wikis. There is no Wikidata equivalent of the "correcting a typo" edit in Wikipedia. You need to go away and learn the syntax before you can do anything at all in Wikidata.

There is the equivalent of fixing a typo. All edits on Wikidata are much more similar to a typo fix on Wikipedia than not. Fixing the year in the date of birth of a person, for example, is pretty quick and I'd argue easy. And since it is editing in place, it is arguably easier than finding the date in a Wikipedia article's infobox template. Please go and actually try it out without prejudice. I am the first to admit that we still have a long way to go when it comes to usability on Wikidata, but the things you bring up are not it, Andreas. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the district court (Amtsgericht) Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the tax office for corporations (Finanzamt für Körperschaften I Berlin), tax number 27/681/51985.

Andrea Zanni

12:32 p.m.

I second all of Lydia's answers. Also, I do think that there is a huge difference between usability/UX issues and core, fundamental, systemic issues. I personally think, Andreas, that you are describing usability issues, which are solvable (not easy, and not trivial, but at least they can be fixed). Regarding the CC0 vs CC-BY-SA problem, I don't think a single switch between licenses would solve all the attribution problems: it hasn't solved propagation of errors in the past with Wikipedia, and I don't really get how it could solve propagation of errors for Wikidata (we do know, though, that it would bring a hell of a lot of issues for Wikidata itself). Aubrey On Sun, Dec 20, 2015 at 12:25 PM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

...


Andreas Kolbe

1:18 p.m.

Lydia, I can only relate my impressions to you. The first two items I looked at (Jerusalem and Obama) happened to be protected, so on my first visit I was completely non-plussed as to how to edit anything on Wikidata. I never noticed the lock icon (whereas I would have noticed, say, a coloured box at the top of the page informing me that the item is locked). If I had been just a random user, I would not have been back. Once I got over that one, I found the order in which statements are listed completely confusing. I would have expected them to follow some logical order, but it seems they are permanently *listed in the order in which they were added to Wikidata*. So someone's date of birth can be the last statement on a Wikidata page, or the first. Compare for example the location of the date of birth for Angela Merkel in https://www.wikidata.org/wiki/Q567 to the location of Barack Obama's date of birth in https://www.wikidata.org/wiki/Q76 I tried to figure out a way to change the order, but couldn't find one. Again, profoundly demotivating. Machines may not be bothered by this, because they can instantly find what they are looking for, but people are. It might help to establish a default order for statements that makes logical sense to a human being, and that people can become used to. As for actual editing, a few weeks ago, figuring out how to add an IBM subsidiary to the IBM item, with a reference, must have taken me something like half an hour. I read Wikidata:Introduction, learned about properties, and then checked Help:Editing, which contained *nothing* about adding properties. The word is not even mentioned. After clicking "add" in the *existing* subsidiaries statement for IBM item, I saw a question mark icon with a "help text" that reads, ---o0o--- Enter a value corresponding to the property named "subsidiaries". 
If the property has no designated value or the actual value is not known, you may choose an alternative to specifying a custom value by clicking the icon next to the value input box. ---o0o--- I didn't find this text helpful at all. It could have simply said, "Enter the name of the subsidiary in the text box, and then add a reference." At any rate, this is what I did. After I clicked "add reference", I got a new field that came with a "property" drop down menu pre-populated with "sex or gender", "date of birth", "given name", "occupation", "country of citizenship", "GND identifier" and "image", none of which are remotely relevant to entering a reference. The single property that would be most useful to list in that drop down menu when people have said they want to add a reference is "reference URL". But it's not included. If newbies don't know this property exists, how are they supposed to discover it? Somehow I got there, but it was not enjoyable. These are indeed all user interface issues, and quite separate from the other aspects we have been talking about. But they contribute to making this wiki less attractive as a site that ordinary people might want to contribute to manually, on a casual basis. Yes, if you are sufficiently motivated, you can figure things out. But as things stand, I didn't find it inviting. On Sun, Dec 20, 2015 at 11:25 AM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

...

Use a licence that requires re-users to mention "Wikidata" on their sites, ideally with a link to the Wikidata disclaimer, and you won't have to do any education at all, and at the same time you'll have done a great thing for transparency of data provenance on the internet. Moreover, you will have ensured that hundreds of millions of Internet users are told where they can find Wikidata and edit it. Surely, if you actually *want* to have human beings visiting and editing your wiki, that's in your interest? Andreas

Lydia Pintscher

1:38 p.m.

On Sun, Dec 20, 2015 at 2:18 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Lydia, I can only relate my impressions to you. The first two items I looked at

Now we're getting somewhere ;-)

...

(Jerusalem and Obama) happened to be protected, so on my first visit I was completely non-plussed as to how to edit anything on Wikidata. I never noticed the lock icon (whereas I would have noticed, say, a coloured box at the top of the page informing me that the item is locked). If I had been just a random user, I would not have been back.

Ok. I think we can make the icon more colorful, for example, to draw more attention to it. Mind you, the icon is in line with what you see on Wikipedia as well. That is why we have it.

...

Once I got over that one, I found the order in which statements are listed completely confusing. I would have expected them to follow some logical order, but it seems they are permanently *listed in the order in which they were added to Wikidata*. So someone's date of birth can be the last statement on a Wikidata page, or the first. Compare for example the location of the date of birth for Angela Merkel in https://www.wikidata.org/wiki/Q567 to the location of Barack Obama's date of birth in https://www.wikidata.org/wiki/Q76 I tried to figure out a way to change the order, but couldn't find one. Again, profoundly demotivating. Machines may not be bothered by this, because they can instantly find what they are looking for, but people are. It might help to establish a default order for statements that makes logical sense to a human being, and that people can become used to.

Yes that is indeed one of the problems we have identified for quite some time already. It is high on the list for 2016. I hope we get to it in Q1.

...

As for actual editing, a few weeks ago, figuring out how to add an IBM subsidiary to the IBM item, with a reference, must have taken me something like half an hour. I read Wikidata:Introduction, learned about properties, and then checked Help:Editing, which contained *nothing* about adding properties. The word is not even mentioned.

Ok so on-wiki documentation is not good enough. Point taken. It has been written by editors who are familiar with Wikidata. Giving feedback on the talk pages for those help pages would be valuable.

...

After clicking "add" in the *existing* subsidiaries statement for IBM item, I saw a question mark icon with a "help text" that reads, ---o0o--- Enter a value corresponding to the property named "subsidiaries". If the property has no designated value or the actual value is not known, you may choose an alternative to specifying a custom value by clicking the icon next to the value input box. ---o0o--- I didn't find this text helpful at all. It could have simply said, "Enter the name of the subsidiary in the text box, and then add a reference."

It is not as easy as that unfortunately. Potentially no item exists for that subsidiary and then you need to create one. Also the explanation for no-value and some-value in the text is important (though we need to improve the UI for them). But point taken we can improve this message.

...

At any rate, this is what I did. After I clicked "add reference", I got a new field that came with a "property" drop down menu pre-populated with "sex or gender", "date of birth", "given name", "occupation", "country of citizenship", "GND identifier" and "image", none of which are remotely relevant to entering a reference.

Those should not have shown up for references and I am not aware of issues with that. Which statement was this specifically? The suggestions are not always perfect but at least the distinction between properties in the main part of the statement and its references should work very well.

...

The single property that would be most useful to list in that drop down menu when people have said they want to add a reference is "reference URL". But it's not included. If newbies don't know this property exists, how are they supposed to discover it? Somehow I got there, but it was not enjoyable.

As above this should have shown up.

...

These are indeed all user interface issues, and quite separate from the other aspects we have been talking about. But they contribute to making this wiki less attractive as a site that ordinary people might want to contribute to manually, on a casual basis. Yes, if you are sufficiently motivated, you can figure things out. But as things stand, I didn't find it inviting.

Sure. As I said we still have quite some work to do and feedback such as the above is what will help us make it better.

...

On Sun, Dec 20, 2015 at 11:25 AM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

I think we have to agree to disagree on the licensing part and what is best for Wikidata there. Yes I do want people to come to Wikidata but I do not want the license to be our forceful stick to achieve this. We have to work to build a project that people want to come to and contribute to. And we can do it as the number of editors for example shows. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

Gnangarra

1:50 p.m.

There is a compromise license: CC BY without the SA.

On 20 December 2015 at 21:38, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:

...

On Sun, Dec 20, 2015 at 2:18 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Lydia, I can only relate my impressions to you.

Now we're getting somewhere ;-)

The first two items I looked at (Jerusalem and Obama) happened to be protected, so on my first visit I was completely nonplussed as to how to edit anything on Wikidata. I never noticed the lock icon (whereas I would have noticed, say, a coloured box at the top of the page informing me that the item is locked). If I had been just a random user, I would not have been back.

Ok. I think we can make the icon more colorful, for example, to draw more attention to it. Mind you, the icon is in line with what you see on Wikipedia as well. That is why we have it.

Once I got over that one, I found the order in which statements are listed completely confusing. I would have expected them to follow some logical order, but it seems they are permanently *listed in the order in which they were added to Wikidata*. So someone's date of birth can be the last statement on a Wikidata page, or the first. Compare for example the location of the date of birth for Angela Merkel https://www.wikidata.org/wiki/Q567 to the location of Barack Obama's date of birth in https://www.wikidata.org/wiki/Q76 I tried to figure out a way to change the order, but couldn't find one. Again, profoundly demotivating. Machines may not be bothered by this, because they can instantly find what they are looking for, but people are. It might help to establish a default order for statements that makes logical sense to a human being, and that people can become used to.

Yes that is indeed one of the problems we have identified for quite some time already. It is high on the list for 2016. I hope we get to it in Q1.

As for actual editing, a few weeks ago, figuring out how to add an IBM subsidiary to the IBM item, with a reference, must have taken me something like half an hour. I read Wikidata:Introduction, learned about properties, and then checked Help:Editing, which contained *nothing* about adding properties. The word is not even mentioned.

Ok so on-wiki documentation is not good enough. Point taken. It has been written by editors who are familiar with Wikidata. Giving feedback on the talk pages for those help pages would be valuable.

After clicking "add" in the *existing* subsidiaries statement for IBM item, I saw a question mark icon with a "help text" that reads, ---o0o--- Enter a value corresponding to the property named "subsidiaries". If the property has no designated value or the actual value is not known, you may choose an alternative to specifying a custom value by clicking the icon next to the value input box. ---o0o--- I didn't find this text helpful at all. It could have simply said, "Enter the name of the subsidiary in the text box, and then add a reference."

The single property that would be most useful to list in that drop down menu when people have said they want to add a reference is "reference URL". But it's not included. If newbies don't know this property exists, how are they supposed to discover it? Somehow I got there, but it was not enjoyable.

As above this should have shown up.

Sure. As I said we still have quite some work to do and feedback such as the above is what will help us make it better.

On Sun, Dec 20, 2015 at 11:25 AM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

Use a licence that requires re-users to mention "Wikidata" on their sites, ideally with a link to the Wikidata disclaimer, and you won't have to do any education at all, and at the same time you'll have done a great thing for transparency of data provenance on the internet. Moreover, you will have ensured that hundreds of millions of Internet users are told where they can find Wikidata and edit it. Surely, if you actually *want* to have human beings visiting and editing your wiki, that's in your interest?

-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Andreas Kolbe

3:59 p.m.

On Sun, Dec 20, 2015 at 1:38 PM, Lydia Pintscher < lydia.pintscher(a)wikimedia.de> wrote:

...

Just try it, Lydia. Click "add" in subsidiaries in https://www.wikidata.org/wiki/Q37156 -- enter a company name, and then click "add reference". When I do that, the text field contains a greyed-out "property", and the drop-down shows the unhelpful items I mentioned above. And it would be good if the help text actually *asked* people to cite a reference.

...

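Use a licence that requires re-users to mention "Wikidata" on their sites, ideally with a link to the Wikidata disclaimer, and you won't have to do any education at all, and at the same time you'll have done a great thing for transparency of data provenance on the internet. Moreover, you will have ensured that hundreds of millions of Internet users are told where they can find Wikidata and edit it. Surely, if you actually *want* to have human beings visiting and editing your wiki, that's in your interest?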

Can you tell me just whose interests it serves if re-users do not have to indicate that the data they're showing their users come from Wikidata? Max Klein mused that the big search engines might be paying for Wikidata "to remove a blemish on their perceived omniscience", because they can present Wikidata content as though they had compiled it themselves.[1] That is at least a plausible line of thought; but whom else does it serve? It does not serve the end user, because they are left in the dark about the provenance of the data. Moreover, they may not understand that these are crowdsourced data, to which certain caveats always apply. It does not serve Wikidata's interests, because many consumers of Wikidata content who might otherwise come to edit the wiki, correct errors, refine information and so on, will lack the bridge that would take them there. We are a non-profit. The public good, the benefit to society, should be our only concern. So, who in society benefits, other than (arguably) the big commercial search engines? Please explain. Andreas [1] http://hblog.org/

geni

11:29 p.m.

On 20 December 2015 at 15:59, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Anyone who doesn't want to spend way too much of their time worrying about copyright law.

...

So, who in society benefits, other than (arguably) the big commercial search engines?

The big search engine companies could make their in house version with little effort. An open version means that they have to put up with everyone else having access to it. -- geni

Andreas Kolbe

21 Dec

3:25 p.m.

On Sun, Dec 20, 2015 at 11:29 PM, geni <geniice(a)gmail.com> wrote:

...

On 20 December 2015 at 15:59, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Can you tell me just whose interests it serves if re-users do not have to indicate that the data they're showing their users come from Wikidata? Max Klein mused that the big search engines might be paying for Wikidata "to remove a blemish on their perceived omniscience", because they can present Wikidata content as though they had compiled it themselves.[1] That is at least a plausible line of thought; but whom else does it serve?

Anyone who doesn't want to spend way too much of their time worrying about copyright law.

Gerard Meijssen

3:29 p.m.

Hoi, You have no clue how this information is to be presented. Every bit of data may have its own source. Theoretically you have a point; however, the average Joe will not care, will not seek this information and will be utterly bewildered by the ton of gobbledygook that adds no value to him and only makes the data less informative. Thanks, GerardM On 21 December 2015 at 16:25, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Sun, Dec 20, 2015 at 11:29 PM, geni <geniice(a)gmail.com> wrote:

On 20 December 2015 at 15:59, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Can you tell me just whose interests it serves if re-users do not have to indicate that the data they're showing their users come from Wikidata? Max Klein mused that the big search engines might be paying for Wikidata "to remove a blemish on their perceived omniscience", because they can present Wikidata content as though they had compiled it themselves.[1] That is at least a plausible line of thought; but whom else does it serve?

Anyone who doesn't want to spend way too much of their time worrying about copyright law.

Re-users are very, very unlikely indeed to spend "way too much of their time worrying" about, say, having to add the words "Source: Wikidata. (Disclaimer.)" to their websites -- hyperlinked to wikidata.org and the Wikidata disclaimer. It's a one-minute job. And that's really all you need to keep the public informed, and provide them with an instant link to the wiki the data comes from -- so they can view it there, understand the history of how it was created, and make an input. Wikis are about openness and participation, not about presenting the public with faits accomplis. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines New messages to: Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

geni

22 Dec

8:10 a.m.

On 21 December 2015 at 15:25, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

You've broken, say, a CC-BY-SA license in at least two ways there. -- geni

Andreas Kolbe

12:27 p.m.

On Tue, Dec 22, 2015 at 8:10 AM, geni <geniice(a)gmail.com> wrote:

...

On 21 December 2015 at 15:25, Andreas Kolbe <jayen466(a)gmail.com> wrote:

You've broken, say, a CC-BY-SA license in at least two ways there.

I was unaware that you were in favour of CC BY-SA for Wikidata now. It's surely not beyond human skill to devise a licence for Wikidata that requires re-users to include the three words above on their website, while placing no other duties or restrictions on them.

geni

5:37 p.m.

On 22 December 2015 at 12:27, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

I was unaware that you were in favour of CC BY-SA for Wikidata now.

I'm not, but you failed to specify a license, and CC-BY-SA is one you might be vaguely familiar with.

...

It's surely not beyond human skill to devise a licence for Wikidata that requires re-users to include the three words above on their website, while placing no other duties or restrictions on them.

You appear to be suggesting a homebrew license so we are already above the one minute mark. Worse still by talking about websites you are suffering from the classic problem of failing to consider all use cases. For example books, calendars or indeed any form of data transmission that isn't the web. -- geni

Pete Forsyth

6:15 p.m.

On Tue, Dec 22, 2015 at 9:37 AM, geni <geniice(a)gmail.com> wrote:

...

On 22 December 2015 at 12:27, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

It's surely not beyond human skill to devise a licence for Wikidata that requires re-users to include the three words above on their website, while placing no other duties or restrictions on them.

You appear to be suggesting a homebrew license

+1 Requiring that reusers credit the *web site* would be new in the Wikimedia world, and I don't see the advantage. Certainly, serious reusers who wish to establish credibility should be transparent about the source of their data; but it's not our proper role to compel them to do so. Attribution requirements in CC licenses are about crediting the *copyright holders*. Andreas, I realize this has been much discussed in this thread, but I don't think I've seen this angle addressed directly: In order for any copyright license to apply, somebody has to hold the copyright. Who do you imagine has a legitimate claim to copyright over the emergent database that grows as multiple individuals and automated processes add individual, non-copyrightable claims/statements/facts? -Pete [[User:Peteforsyth]]

Andreas Kolbe

26 Dec

9:37 a.m.

On Tue, Dec 22, 2015 at 6:15 PM, Pete Forsyth <peteforsyth(a)gmail.com> wrote:

...

On Tue, Dec 22, 2015 at 9:37 AM, geni <geniice(a)gmail.com> wrote:

On 22 December 2015 at 12:27, Andreas Kolbe <jayen466(a)gmail.com> wrote:

It's surely not beyond human skill to devise a licence for Wikidata that requires re-users to include the three words above on their website, while placing no other duties or restrictions on them.

You appear to be suggesting a homebrew license

Pete, As I understand it, people here have raised the objection that in order to follow the letter of CC BY-SA, re-users would have to list all contributors, the way some of the Wikipedia-based books do for example. I think we all agree that this would be completely impractical for something like a Knowledge Graph box, and not in the end user's interest. What would make sense is the sort of attribution Bing uses today to credit Freebase and Wikipedia. Anyone wishing to argue that CC BY-SA requires all re-users to list all contributors has to realise that if that were true, Google, Bing and others infringe Wikipedia's CC BY-SA licence billions of times a year. As I've said before, I'm pretty sure that if you were to take them to court for not listing all the contributors who participated in creating the snippets and timelines they display in their SERPs' Knowledge Graph/Snapshot boxes, you would not prevail. As I understand it, the CC BY-SA licence only requires attribution that is "reasonable to the medium or means You are utilizing". I think a court would agree that given the inherent space limitations, Google and Bing are being "reasonable" by providing a link to the Wikipedia article they're excerpting, and providing no more attribution than that. Do you disagree? Is anyone arguing that Google are in fact breaking CC BY-SA by restricting their attribution to a link to Wikipedia? Because if not, we can lay that one to rest. Now, if that works for Wikipedia, why can't we have the same for Wikidata?

...

Requiring that reusers credit the *web site* would be new in the Wikimedia world, and I don't see the advantage.

The advantage is transparency about data provenance, as well as creating a path to Wikidata where users can contest, correct and refine the information. This is a benefit to the end user, and in line with Foundation values like transparency and user engagement. Do you disagree?

...

Certainly, serious reusers who wish to establish credibility should be transparent about the source of their data;

I have never seen Google credit Freebase (Bing does, probably because Freebase is a Google property), and I think neither Google nor Bing will credit Wikidata either.

...

but it's not our proper role to compel them to do so.

Could you explain why in your view it is not our proper role to do so?

...

Attribution requirements in CC licenses are about crediting the *copyright holders*. Andreas, I realize this has been much discussed in this thread, but I don't think I've seen this angle addressed directly: In order for any copyright license to apply, somebody has to hold the copyright. Who do you imagine has a legitimate claim to copyright over the emergent database that grows as multiple individuals and automated processes add individual, non-copyrightable claims/statements/facts?

See https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definit… ---o0o---

...

From a legal perspective, a database is any organized collection of materials — hard copy or electronic — that permits a user to search for and access individual pieces of information contained within the materials. No database software, as a programmer would understand it, is necessary. In the US, for example, Black’s Law Dictionary defines a database as a "compilation of information arranged in a systematic way and offering a means of finding specific elements it contains, often today by electronic means."[1] <https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#cite_note-1> *Databases may be protected by US copyright law as "compilations."* In the EU, databases are protected by the Database Directive <https://en.wikipedia.org/wiki/Database_Directive>, which defines a database as "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means." ---o0o---

So according to that page, created by Wikimedia legal staff, databases may be protected even by US copyright law as "compilations". In the EU (is Wikidata currently based in the EU, given that it's a Wikimedia Deutschland project?) the protections are still more stringent. As I understand it, the community as a whole holds the copyright, but you'd have to check with Foundation legal staff or some other lawyer to be sure. Best, Andreas

Pete Forsyth

5:46 p.m.

Andreas, Helpful questions and observations, thank you. My replies inline: On Sat, Dec 26, 2015 at 1:37 AM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Pete, <snip> Is anyone arguing that Google are in fact breaking CC BY-SA by restricting their attribution to a link to Wikipedia? Because if not, we can lay that one to rest.

No, I think we agree there.

Now, if that works for Wikipedia, why can't we have the same for Wikidata?

I'd say the better question, is "what legal or moral right would we call upon to *insist* on having the same for Wikidata?" If we had a clear answer to that one, it would really move forward; but I don't think we do, or if we do, it's not yet clear to me.

...

Requiring that reusers credit the *web site* would be new in the Wikimedia world, and I don't see the advantage. (-Pete)

No, and I should have been clearer -- I do see the general advantage in a site providing information about the source of information (of course). What I don't see is the advantage of requiring them to do so in a certain way.

...

Certainly, serious reusers who wish to establish credibility should be transparent about the source of their data; (-Pete)

I have never seen Google credit Freebase (Bing does, probably because Freebase is a Google property), and I think neither Google nor Bing will credit Wikidata either.

I don't think Google or Bing aspires to having the highest standard of credibility. If they are useful, their business interests have been served, and I would hope that no student or academic would be able to cite the Google Knowledge Graph in a formal paper, any more than they could cite Wikipedia. (caveat emptor)

...

but it's not our proper role to compel them to do so. (-Pete)

Could you explain why in your view it is not our proper role to do so?

I believe in the agency of multiple people and entities in curating knowledge. Individuals, and individual information projects, should have the ability to make their own judgment about how much, and what kind, of citation is required for their purposes. I don't believe that information curation can be perfected by anticipating all needs in policy and legal documents. If our users have a moral or legal right that needs to be defended, we should do so. But I don't see one in this case (perhaps a clear hypothetical example could help?)

...

Attribution requirements in CC licenses are about crediting the *copyright holders*. Andreas, I realize this has been much discussed in this thread, but I don't think I've seen this angle addressed directly: In order for any copyright license to apply, somebody has to hold the copyright. Who do you imagine has a legitimate claim to copyright over the emergent database that grows as multiple individuals and automated processes add individual, non-copyrightable claims/statements/facts? (-Pete)

See https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definit… <snip quotes>

So according to that page, created by Wikimedia legal staff, databases may be protected even by US copyright law as "compilations". In the EU (is Wikidata currently based in the EU, given that it's a Wikimedia Deutschland project?) the protections are still more stringent. As I understand it, the community as a whole holds the copyright, but you'd have to check with Foundation legal staff or some other lawyer to be sure.

Helpful link, thank you. My eye is drawn to the word "may." If databases MAY be protected, what conditions need to pertain in order for that to happen? I'd be very interested in hearing from a legal expert about that. My best guess is that a "database" like an edited compilation of papers about biology, or a compilation of Christmas songs, would be protected by copyright -- the people or organizations who curated the collection would hold the copyright to the collection, while the individual authors/artists would hold the copyright to the individual papers or songs. But the phone book would not carry copyright, because there was no editorial or creative judgment in assembling the list. "The Wikimedia community as a whole" is certainly not a legal entity, and I'm skeptical that it's an entity at all. How can something that is not a legal entity hold a copyright? Whose rights do you wish to protect? Pete [[User:Peteforsyth]]

Andreas Kolbe

28 Dec

11:47 a.m.

Pete, Thanks. Comments interspersed below. On Sat, Dec 26, 2015 at 5:46 PM, Pete Forsyth <peteforsyth(a)gmail.com> wrote:

...

I'd say the better question, is "what legal or moral right would we call upon to *insist* on having the same for Wikidata?" If we had a clear answer to that one, it would really move forward; but I don't think we do, or if we do, it's not yet clear to me.

The same as in the case of Wikipedia. Is Wikidata different because it aspires to listing machine-readable facts only, rather than written expositions? Not to my mind, because facts are frequently debatable, and their presentation and sourcing involves choice and expertise. Moreover, speaking somewhat less seriously for a moment, Wikidata doesn't actually just contain non-copyrightable facts. As we've seen, it contains some of the same hoaxes and errors Wikipedia contains, which are by definition creative. It's an entertaining fact that dictionary publishers would in the past (perhaps they still do it now) include a small number of hoax entries -- made-up words -- in their dictionaries, so they would be able to demonstrate that another dictionary publisher had simply copied their work. The Wikidata project is (involuntarily of course) doing the same.

...

Personally, I wouldn't insist on it being done in a certain way. I only feel, very strongly, that having no information at all about the source of information is very much undesirable, for the reasons previously mentioned (data provenance, providing a bridge to potential users, etc.).

...

The problem with free information is that it displaces non-free information, much like a cheaper product displaces a more expensive one. We've seen this with Wikipedia replacing professionally published encyclopedias. Free information tends to become pervasive. This pervasiveness creates a steady drip effect – if a certain item of information becomes ubiquitous, so you see it in Google, in Bing, and elsewhere, you don't question it any more after a while. And once information becomes unquestioned, it enters more credible sources, because the authors of those are human, too. People cannot be on their guard 24/7, questioning everything they see. This is how citogenesis happens. I'm currently thinking about the Kazakh Wikipedia again, as the topic has (rightly) reappeared on Jimmy Wales' talk page.[1] It provides a good example. I believe the reason the Kazakh dictatorship embraced Creative Commons, releasing its Kazakh National Encyclopedia under a free licence so its articles could be imported *en masse* into the Kazakh Wikipedia (by editors incentivised by the chance to win laptops etc.), was because that encyclopedia reflected the regime's political views and censorship criteria. If you make your information ubiquitous, ensuring it appears under different brand names, with its real provenance obscured, eventually it will not be questioned any more. The WMF allowed itself to be used there, enthusiastically so. To me it's one of the most shameful episodes in its history.

...

Some users certainly feel very strongly that they have moral rights they would like to see upheld. See the discussion at https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_d… for examples.

...

See https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definit…

So according to that page, created by Wikimedia legal staff, databases may be protected even by US copyright law as "compilations". In the EU (is Wikidata currently based in the EU, given that it's a Wikimedia Deutschland project?) the protections are still more stringent. As I understand it, the community as a whole holds the copyright, but you'd have to check with Foundation legal staff or some other lawyer to be sure.

I believe this largely depends on the amount taken. Taking an individual fact is not problematic; systematic mass imports are. But like you, I would be interested in hearing from legal experts.

...

My best guess is that a "database" like an edited compilation of papers about biology, or a compilation of Christmas songs, would be protected by copyright -- the people or organizations who curated the collection would hold the copyright to the collection, while the individual authors/artists would hold the copyright to the individual papers or songs. But the phone book would not carry copyright, because there was no editorial or creative judgment in assembling the list.

Yes. Now, is there editorial or creative judgment involved in creating a Wikipedia infobox? I would say there is. You select and reject potential sources, decide which entries to fill or leave blank, etc.

...

"The Wikimedia community as a whole" is certainly not a legal entity, and I'm skeptical that it's an entity at all. How can something that is not a legal entity hold a copyright? Whose rights do you wish to protect?

Jane Darnell

12:40 p.m.

If anything, the Kazakh thing just proves that the wiki model works. No shame in that. It's probably why the Chinese are blocking Wikipedia and not embracing it. You can't hide your propaganda, even from your own people. As far as the compilation of Christmas songs goes, the list of songs is not copyrightable, because the sort order of a list is not creative (unless it's something that becomes poetry when you read the titles as a list). On Mon, Dec 28, 2015 at 12:47 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Pete, Thanks. Comments interspersed below. On Sat, Dec 26, 2015 at 5:46 PM, Pete Forsyth <peteforsyth(a)gmail.com> wrote:

I'd say the better question, is "what legal or moral right would we call upon to *insist* on having the same for Wikidata?" If we had a clear answer to that one, it would really move forward; but I don't think we do, or if we do, it's not yet clear to me.

I don't think Google or Bing aspires to having the highest standard of credibility. If they are useful, their business interests have been served, and I would hope that no student or academic would be able to cite the Google Knowledge Graph in a formal paper, any more than they could cite Wikipedia. (caveat emptor)

See https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definit…

So according to that page, created by Wikimedia legal staff, databases may be protected even by US copyright law as "compilations". In the EU (is Wikidata currently based in the EU, given that it's a Wikimedia Deutschland project?) the protections are still more stringent. As I understand it, the community as a whole holds the copyright, but you'd have to check with Foundation legal staff or some other lawyer to be sure.

I believe this largely depends on the amount taken. Taking an individual fact is not problematic; systematic mass imports are. But like you, I would be interested in hearing from legal experts.

authors/artists would hold the copyright to the individual papers or songs. But the phone book would not carry copyright, because there was no editorial or creative judgment in assembling the list.

Those of contributors severally and jointly. The compilation of data always involves people. Their having done this work should be visible and traceable. We agree that if Google, Bing etc. attribute to "Wikipedia", this is sufficient to uphold Wikipedia contributors' rights in this regard. I'd like to see the same with Wikidata. [1] https://en.wikipedia.org/w/index.php?title=User_talk:Jimbo_Wales&oldid=…

Andreas Kolbe

3:30 p.m.

On Mon, Dec 28, 2015 at 12:40 PM, Jane Darnell <jane023(a)gmail.com> wrote:

...

Jane, You don't seem to understand what's happening here. Kazakhstan is in the process of replicating the Chinese "Great Firewall" for its own citizens, using slightly different means. From a recent report in the New York Times:[1] ---o0o--- Unlike with China, which filters data through an expensive and complex digital infrastructure known as the Great Firewall, security experts say Kazakhstan is trying to achieve the same effect at a lower cost. The country is mandating that its citizens install a new "national security certificate" on their computers and smartphones that will intercept requests to and from foreign websites. That gives officials the opportunity to read encrypted traffic between Kazakh users and foreign servers, in what security experts call a "man in the middle attack." As a result, Kazakh telecom operators, and government officials, will be privy to mobile and web traffic between Kazakh users and foreign servers, bypassing encryption protections known as S.S.L., or Secure Sockets Layer, and H.T.T.P.S., technology that encrypts browsing sessions and is familiar to users by the tiny padlock icon that appears in browsers. ---o0o--- Do you understand what this means? The Kazakh government will be *able to identify any Kazakh citizen who edits Wikipedia, and see what they did there.* Even if you go into an Internet café in that country, you have to give your name, and your activities will be monitored. That is a major chilling effect. So you now have a situation where the government-published encyclopedia, with its own bent on the country's history and government, is in the Kazakh Wikipedia, appearing under the Wikipedia brand name. It was put there by volunteers who were promised laptops and other prizes for their work transcribing these articles. 
This was an effort that WMF board members went out of their way to praise and reward, even though it's always been clear, since June 2011, when state support was announced, that Wikibilim was a Kazakh government-sponsored effort. Wikibilim's Kazakh Wikipedia project is publicly described as "implemented under the auspices of the Prime Minister of Kazakhstan."[2] Ting Chen, then chairman of the WMF board, even participated in a press conference with Kazakh government representatives and functionaries. Yet Wikibilim reportedly had a trademark licence agreement with the Wikimedia Foundation within a month of the organisation's founding,[3] something I believe most regular chapters have to wait a lot longer for, and was immediately hailed as a future chapter. At Wikimania 2011, this was followed by Wales' "Wikipedian of the Year" award for Wikibilim, which was widely publicised by the Kazakh government. What could be better PR for them than an endorsement by a free-speech figure like Jimmy Wales? Yet it's long been established that Wikibilim's leaders have been and are part of the Kazakh government machine. One is now the vice-governor of a major province in the country,[4] and the founding director of a Brussels-based think tank that human rights organisations consider a PR front for the regime.[5][6] Another went on to become Vice Chairman of the company that runs the Kazakh Prime Minister's website; he is at the same time an active editor and one of a small number of administrators in the Kazakh Wikipedia. The country's opposition press has been shut down. Even when remnants of it still existed, it was clear that opposition papers would not be considered "reliable sources" in the Kazakh Wikipedia. If this proves that the "wiki model works", then it can only mean that it "works" in the sense that dictatorships can very smartly exploit it for their own ends--in this case, with apparent WMF connivance. 
(I would really like to know who, if anyone, advised the WMF on this at the time.)

China has its own internet encyclopedias that it controls in a similar manner. They have no need for Wikipedia. They have two crowdsourced internet encyclopedias that are bigger even than the English Wikipedia, and positively dwarf the Chinese Wikipedia. However, there is significant government interest in Wikipedia in other Asian countries.

What the WMF should do is to start examining to what extent these Wikipedias are functionally censored, using the services of linguists and political/human rights experts. I have long advocated that there should be a Wikipedia Freedom Index[7] indicating to the reader how free of censorship any Wikipedia is. Where a Wikipedia is found to suffer from significant problems, the WMF should place server-side banners on its pages, in the local language and English, alerting readers to this fact and suggesting that they should also consult other language versions of Wikipedia for a more rounded view. In addition, extra caution should be exercised when importing politically sensitive data from such Wikipedias to Wikidata.

[1] http://bits.blogs.nytimes.com/2015/12/03/kazakhstan-moves-to-tighten-contro…
[2] http://inform.kz/eng/article/2480400
[3] https://meta.wikimedia.org/w/index.php?title=Talk:Wikimedia_Kazakhstan&…
[4] http://www.inform.kz/eng/article/2730173
[5] http://www.independent.co.uk/news/world/politics/jack-straw-criticised-for-…
[6] http://www.eurasianet.org/node/72831
[7] https://meta.wikimedia.org/wiki/Grants:IdeaLab/Wikipedia_Freedom_Index

Jane Darnell

4:22 p.m.

Anyone can exploit the content on WMF for their needs. What I mean by "it works" is that you can't fool people when you try to change Wikipedia to fit government policy. We can easily identify problematic edits. Never underestimate the diaspora of any country. Wikimedia is always bigger than any one government will ever estimate. On Mon, Dec 28, 2015 at 4:30 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Mon, Dec 28, 2015 at 12:40 PM, Jane Darnell <jane023(a)gmail.com> wrote:

If anything, the Kazakh thing just proves that the wiki model works. No shame in that. It's probably why the Chinese are blocking Wikipedia and not embracing it. You can't hide your propaganda, even from your own people.

Risker

4:58 p.m.

On 28 December 2015 at 11:22, Jane Darnell <jane023(a)gmail.com> wrote:

...

Well, yes, anyone can exploit the content of WMF projects; we don't usually give them kudos for doing so, though. And you most certainly CAN fool people when you change Wikipedia to change government policy, if the government overwhelms a small "traditional" Wikipedia community with bribes, threats to well-being and good old fashioned paid editing. The Wikipedia brand is perceived to be independent from such influences; that it isn't in this case (and who knows how many other cases) cannot be perceived by readers who do not have any alternative resources. Small communities with less than 50 active editors can be pretty easily swamped; a university class adding valuable, well sourced and researched content may have a positive effect, just as focused addition of heavily biased material by "editing for reward" (rewards including payment, gifts, or simply not being incarcerated) can turn a Wikipedia into a platform for third parties. This particular project was an easy target, and there are many others that could similarly be overwhelmed. We need to recognize that most of the world does not live under the conditions that encourage or even permit the development of freely available information. As a global community we need to stop pretending that the example of Kazakh Wikipedia is not a major and significant bellwether that requires very serious review of how we encourage and develop projects centered in countries with repressive regimes. Many of these regions are areas with significant potential for growth of our content - the major focus of the mission of the Wikimedia Foundation. Figuring out how to grow these projects within the founding principles is not just important, it's necessary. Risker/Anne

Jane Darnell

6 p.m.

All I said is that the wiki way works, that's all. You can't hide it when someone tries to take over a project, and that is the reason we shouldn't try to anticipate that with convoluted strategies. "Assume Good Faith" will always win out over any strange misguided takeover strategy, which is why governments that intend to do such things choose nowadays to just block wikimedia altogether. It is not our wake-up call to take, but that of the Kazakh people. On Mon, Dec 28, 2015 at 5:58 PM, Risker <risker.wp(a)gmail.com> wrote:

...


Risker

6:25 p.m.

"Assume good faith" is actually what got Kazakh Wikipedia into the mess it is in. Wikimedia projects have been blocked by governments practically since their inception. Perverting the content is the new way of doing things. They've learned from the PR and SEO industries. And that leads us back to Wikidata. There has always been the recognized potential for use of Wikidata to build articles for smaller projects, based on properly-sourced, independently verified (or verifiable) data; it's one of the reasons that Wikidata has been accepted into the Wikimedia family. But the key problem with creating content in this way is that the contents of Wikidata are currently mostly unsourced or so poorly sourced that they can't be considered either verified or verifiable even in one's wildest dreams. My experience, based on reading about a hundred user talk pages on Wikidata recently, is that Wikidatians do not consider sourcing to be important or even desirable. This is a major problem for any group that wants to reuse the content, because bluntly put there's a fair amount of junk that got transferred to Wikidata, and it's currently not possible to sort the wheat from the chaff. The absence of references on Wikidata is a significant barrier to the reusability of its data. I despair every time someone says "it's like Wikipedia, it will get better!" Well, no. Huge swaths of existing Wikipedias have never improved despite being more than a decade old. Our first new Wikimedia project in years shouldn't be basing its practices on principles that have already been proved insufficient to maintain and curate major projects with thousands of active editors. So - Wikidata could play an important role in the development of core content on smaller-sized projects with a small editorial community. (I say "could" because we have seen unsuccessful experiments importing significant quantities of information into smaller projects. 
Swahili Wikipedia has still not completely recovered from its experience.) But without being able to provide provenance, its data doesn't even meet the minimal criteria for verifiability. Risker/Anne On 28 December 2015 at 13:00, Jane Darnell <jane023(a)gmail.com> wrote:

...


Andreas Kolbe

7:13 p.m.

On Mon, Dec 28, 2015 at 6:00 PM, Jane Darnell <jane023(a)gmail.com> wrote:

...

Ah, I see. That's easy to say for people in the Western world. In Uzbekistan dissidents have been boiled alive.[1] In Kazakhstan, journalists are imprisoned and harassed; one was firebombed and had the decapitated carcass of a dog left outside her offices. (The dog's head later turned up at her home.)[2] In Azerbaijan, Wikipedians have been tortured and threatened with torture, according to posts on the WMCEE-l mailing list.[3]

All respect to you if you run these risks in order to edit Wikipedia, and still do it regardless. But if you don't, please don't dispense blithe and jejune advice, and don't tell people who are concerned about remaining alive, preferably with their skin and fingernails intact, that they need a wake-up call. I'd rather you told the WMF not to reward the functionaries of such regimes with "Wikipedian of the Year" awards and trademark licence agreements.

[1] http://www.rferl.org/content/uzbekistans-house-of-torture/24667200.html
[2] https://en.wikipedia.org/wiki/Irina_Petrushova
[3] http://listy.wikimedia.pl/pipermail/wmcee-l/2015-May/000839.html

Jane Darnell

7:43 p.m.

Well, the chances of me being firebombed while on vacation in the States are probably higher than me being firebombed for editing Wikipedia, but that still doesn't mean we need to worry about changing the wiki model. I guess I have lost the thread of your point entirely now. On Mon, Dec 28, 2015 at 8:13 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


Lilburne

29 Dec 29 Dec

10:44 a.m.

On 28/12/2015 18:00, Jane Darnell wrote:

...

Gerard Meijssen

12:15 p.m.

Hoi, So you have determined that people can be manipulated. Good, then what? If this is the tack that you take you will be grounded because there is no plan. It is a negative attitude that only stifles. Quality is not only in sources, sources can be and are manipulations in their own right. Many important subjects are woefully underrepresented. The argument has it that it is because of a lack of sources. Sources are relevant but we only are interested in particular subjects. We do not need to look at Kazakhstan to find fault. Amnesty (reliable source) indicates that all USA police forces are not in compliance with international agreements on the use of force. NOW WHAT?? When quality is the subject, it is important to decide how we effectively improve quality. VIAF provided Wikidata with a list of issues they found. Tom checked it out and our quality is better as a result. It means that more information is linked for people who visit a library. When awards are known, adding known recipients in Wikidata based on info from multiple Wikipedias improves the quality and in this way many incorrect links are exposed. When quality of our projects is the subject, decide how we can do a better job. When Facebook invites companies to manipulate people, it is why Facebook information is suspect. At most it is a reminder that manipulation is an important issue. It does not mean that people cannot add data on their hobby horse. Quality is important but quality is more than sources. When sources are used as an argument that is detrimental to the quality of Wikidata, then in my opinion we have forgotten why Wikipedia was possible in the first place. It was not because of sources, it was because of the web of information we created, a web that is of a NPOV. Wikidata does not have a NPOV. It represents facts found in many places. As the information becomes more extended, it becomes possible to find manipulations, errors. This is when sources truly become vital. 
But do remember, the POV of the USA and many of its sources are as suspect as those from Kazakhstan. Thanks, GerardM On 29 December 2015 at 11:44, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:

...

On 28/12/2015 18:00, Jane Darnell wrote:

Facebook showed the other year that it could manipulate people by what it showed them in their feeds. http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-… http://www.bbc.co.uk/news/technology-28051930 They didn't do this for fun, they did it to show their clients (advertisers, governments) that they could manipulate millions of people. You only need a small push in one direction or another to influence a large population. Doesn't matter if the push is to buy a particular soap, vote one way or another, or how you see a particular minority, or issue. http://www.networkworld.com/article/2450825/big-data-business-intelligence/… Do it to a naively trusted source and you have a triple word score jackpot^H^H^Hboot. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines New messages to: Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Gnangarra

12:27 p.m.

...

This is when sources truly become vital. But do remember, the POV of the USA and many of its sources are as suspect as those from Kazakhstan.

And that is why, regardless of the fact, a citation is so important: the person receiving the information must be able to make their own assessment of the source's reliability. With a CC0 license and a significant selection of the information unsourced, Wikidata's data lacks the quality and integrity we all expect of ourselves when we add content to any of the projects. On 29 December 2015 at 20:15, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...


-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Gerard Meijssen

12:29 p.m.

Hoi, You do not get the point or you deliberately distort it. The point is that quality is not sources. Quality is more than that. Thanks, GerardM On 29 December 2015 at 13:27, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Gnangarra

12:33 p.m.

No, I agree quality is more than just the sources, but without sources quality cannot be achieved. On 29 December 2015 at 20:29, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...


Gerard Meijssen

12:36 p.m.

Hoi, That is a circular argument. Thanks, GerardM On 29 December 2015 at 13:33, Gnangarra <gnangarra(a)gmail.com> wrote:

...


Jane Darnell

12:30 p.m.

citation needed On Tue, Dec 29, 2015 at 1:27 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

...

This is when sources truly become vital. But do remember, the POV of the USA and many of its sources are as suspect as those from Kazakhstan.


Gerard Meijssen

12:35 p.m.

http://www.amnesty.nl/sites/default/files/public/ainl_guidelines_use_of_for… On 29 December 2015 at 13:30, Jane Darnell <jane023(a)gmail.com> wrote:

...

citation needed On Tue, Dec 29, 2015 at 1:27 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

This is when sources truly become vital. But do remember, the POV of the USA and many of its sources are as suspect as those from Kazakhstan.


Jane Darnell

5:33 p.m.

Interesting link, thanks Gerard! I was referring to a citation for this quote however: "and a significant selection of the information unsourced WikiDatas data lacks the quality, integrity we all expect of ourselves when we add content to any of the projects."

On Tue, Dec 29, 2015 at 1:35 PM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

http://www.amnesty.nl/sites/default/files/public/ainl_guidelines_use_of_for…

On 29 December 2015 at 13:30, Jane Darnell <jane023(a)gmail.com> wrote:

citation needed

On Tue, Dec 29, 2015 at 1:27 PM, Gnangarra <gnangarra(a)gmail.com> wrote:

> This is when sources truly become vital. But do remember, the POV of the USA and many of its sources are as suspect as those from Kazakhstan.

And that is why regardless of the fact a citation is so important, because the person receiving the information must able to make their own assessment of the sources reliability with a CC0 license and a significant selection of the information unsourced WikiDatas data lacks the quality, integrity we all expect of ourselves when we add content to any of the projects.


Andreas Kolbe

2:18 p.m.

On Tue, Dec 29, 2015 at 10:44 AM, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:

...

On 28/12/2015 18:00, Jane Darnell wrote:

I thought Epstein's and Robertson's paper, "The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections", was very interesting as well:

http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-201…

http://www.pnas.org/content/112/33/E4512.abstract

On Mon, Dec 28, 2015 at 7:43 PM, Jane Darnell <jane023(a)gmail.com> wrote:

...

Jane Darnell

5:39 p.m.

...and you seem to think one can live by an encyclopedia. I can assure you, Wikipedia is a lot of things, but it is not a way of life. To answer your fear, which I read between the lines of what you are saying: in order to create a Wikipedia project you need a basic list of 10,000 articles. The list, as I am sure you are aware, is a pretty boring and strangely ordered grouping of fairly dry, non-political subjects. I believe there are very few articles on there that are worth firebombing someone over. [[Michael Jackson]] is on the list, among other notable Americans. Granted, you could get past the 10,000 article startup requirement somehow and then start creating lots of POV articles, but once you do this you will soon be discovered. There is just no way to hide it.

On Tue, Dec 29, 2015 at 3:18 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Tue, Dec 29, 2015 at 10:44 AM, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:

On 28/12/2015 18:00, Jane Darnell wrote:

All I said is that the wiki way works, that's all. You can't hide it when someone tries to take over a project, and that is the reason we shouldn't try to anticipate that with convoluted strategies. "Assume Good Faith" will always win out over any strange misguided takeover strategy, which is why governments that intend to do such things choose nowadays to just block wikimedia altogether. It is not our wake-up call to take, but that of the Kazakh people.

Facebook showed the other year that it could manipulate people by what it showed them in their feeds.

http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-…

http://www.bbc.co.uk/news/technology-28051930

They didn't do this for fun, they did it to show their clients (advertisers, governments) that they could manipulate millions of people. You only need a small push in one direction or another to influence a large population. Doesn't matter if the push is to buy a particular soap, vote one way or another, or how you see a particular minority, or issue.

http://www.networkworld.com/article/2450825/big-data-business-intelligence/…

Do it to a naively trusted source and you have a triple word score jackpot^H^H^Hboot.

On Mon, Dec 28, 2015 at 7:43 PM, Jane Darnell <jane023(a)gmail.com> wrote:

Well the chances of me being firebombed while on vacation in the states are probably higher than me being firebombed for editing Wikipedia, but that still doesn't mean we need to worry about changing the wiki model. I guess I have lost the thread of your point entirely now.

To be honest, I don't think you had ever gotten hold of it in the first place. To me, you seem to live in a very sheltered and naive world. If we have reports of Wikipedians being tortured in Azerbaijan (and there seems to have been some truth to these reports, as the sysop named in them was globally blocked by the WMF a short while later[1]), you should be able to understand that it is not quite as easy to live the wiki way there as it is in your country, and that some of the assumptions you have formed based on your own experiences of the wiki model may not hold in other locales.

[1] https://meta.wikimedia.org/w/index.php?title=User:Irada&diff=12421543&a…

Andreas Kolbe

8:06 p.m.

On Tue, Dec 29, 2015 at 5:39 PM, Jane Darnell <jane023(a)gmail.com> wrote:

...

Granted, you could get past the 10,000 article startup requirement somehow and then start creating lots of POV articles, but once you do this you will soon be discovered. There is just no way to hide it.

Jane, you're living in a fantasy world. We already have Wikipedias with these POV articles. They've been "discovered" long ago, and it makes zero difference. See e.g. the hagiography of the Uzbek President in the Uzbek Wikipedia[1] (him of the boiled dissidents). It hails him as the best thing since sliced bread. Then see what Human Rights organisations have to say about his regime[2], or compare the English Wikipedia article.[3] That train left the station a long time ago. The wiki model does *not* work in these contexts. [1] https://translate.google.com/translate?hl=en&sl=uz&tl=en&u=http… [2] https://www.hrw.org/europe/central-asia/uzbekistan [3] https://en.wikipedia.org/wiki/Islam_Karimov#Human_rights_and_press_freedom

...


Jane Darnell

9:12 p.m.

Well I may live in a fantasy world, but that is entirely beside the point. When I say these things will be discovered, that's exactly what you are saying happened years ago. These things will always be discovered, because they are unhidable. In your example the Uzbek Wikipedians have learned to stay off certain pages in order to coexist with Uzbek authorities. Similar coping strategies exist on other projects. It doesn't mean the entire Uzbek encyclopedia is untrustworthy or that the wiki model is at fault. The trail of tears is in the talk pages. I don't see anything wrong with making such concessions, since after discovery it becomes public record and everyone knows it anyway. What I don't understand is what you are trying to say. If you are proposing something, just come out and propose it instead of complaining about what goes on in certain projects and jumping from one scare tactic to another. On Tue, Dec 29, 2015 at 9:06 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


geni

20 Dec 20 Dec

3:44 p.m.

On 20 December 2015 at 13:18, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Both are semied on en. I think this mostly shows your ignorance of protection patterns. The first things you think of will pretty much always be protected since they are the ones that attract a lot of vandalism. You either use special:random or something closer to your personal interests. -- geni

Andreas Kolbe

14 Dec 14 Dec

12:02 a.m.

On Sun, Dec 13, 2015 at 6:10 PM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

Wikidata's (envisaged) centralised nature certainly makes a difference, because the promise was that it would inform the Wikipedias.

Wikipedia started out with people just writing from their personal knowledge. The early articles had no footnotes. Then after a while people noticed problems like cranks filling pages with their abstruse theories (hence the ban on original research), people adding material from their blogs, etc. Over the course of a decade, Wikipedia developed the idea and the culture that you have to cite a professionally published source for everything you add to Wikipedia.

Wikidata is in its early stages. In a way it really is like Wikipedia in 2003. New content welcome! No references required!

But at the same time, Wikidata is supposed to inform the Wikipedias, as a central data repository. This creates a mismatch between Wikidata's "early days -- anything goes, let's just get content in, we'll sort it out later" attitude and the relatively mature Wikipedias where editors insist on sources for any new content added.

This out-of-synch-ness is a real problem if you want Wikipedias to actually use Wikidata content. Wikipedians will not accept content generation models that take Wikipedia back to its bad old days where you could write anything you liked without a source to back it up.

Wikipedia is of course still a long way away from citing such sources for all its content. There are vast amounts of legacy material left over from the early days. But in the pages that are being created now (like developing news stories, an area where the quality of Wikipedia's coverage is often praised), pages that see a lot of traffic, pages that are controversial, etc., it is well established that you have to cite sources for any new assertions. Unsourced content is unceremoniously deleted. If Wikipedia's reputation for reliability has improved since 2003, that change in culture from the early days is the reason.

The Age for example published an article the other day that is probably one of the most celebratory articles ever written about Wikipedia.[1] If you're a Wikipedian, you'll probably enjoy reading it. Among the aspects that the author, Elizabeth Farrelly, said she liked most about Wikipedia was "its ruthless commitment to the printed, demonstrable source." She ended the article as follows:

---o0o---

But most interesting to me is the ban on primary research. The demand that every input be traced to a published and authoritative source doesn't make it true, necessarily, but does enable genuine crowd-sourcing of scholarship. This is a revelation, and a revolution.

So yes, Wikipedia is flawed. Above all, it needs more female input. But the obvious response, for you-and-me users who encounter something stupid or biased or just plain wrong, is to hop in there and fix it. I'll see you there, yes? Oh, and honey? Cite away!

---o0o---

Abandoning the principles that have elicited such praise -- traceability to published sources, verifiable citations -- is not something Wikipedians will entertain. To them, it would be a step back. If Wikidata wants to be an input to Wikipedia, it will have to bear that in mind.

[1] http://www.theage.com.au/comment/why-wikipedia-at-15-is-a-beautiful-exercis…

...

I often say that the Wikimedia world made quality a "Heisenbergian" feature: you always have to check if it's there. The point is: it has always been like this. We always had to check for quality, even when we used Britannica or authority controls or whatever "reliable" sources we wanted. Wikipedia, and now Wikidata, is made for everyone to contribute, it's open and honest in being open, vulnerable, prone to errors. But we are transparent, we say that in advance, we can claim any statement to the smallest detail. Of course it's difficult, but we can do it. Wikidata, as Lydia said, can actually have conflicting statements in every item: we "just" have to put them there, as we did to Wikipedia.

If Google uses our data and they are wrong, that's bad for them. If they correct the errors and do not give us the corrections, that's bad for us and not ethical of them. The point is: there is no license (as far as I know) that can force them to contribute to Wikidata. That is, IMHO, the problem with "over-the-top" actors: they can harness collective intelligence and "not give back." Even with CC-BY-SA, they could store (as they are probably already doing) all the data in their knowledge vault, which is secret as it is an incredible asset for them. I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or CC0, but it's not there.

So, as we are working via GLAMs with Wikipedia for getting reliable sources and content, we are working with them also for good statements and data. Putting good data in Wikidata makes it better, and I don't understand what is the problem here (I understand, again, the issue of putting too much data and still having a small community). For example: if we are importing different reliable databases, and the institutions behind them find it useful and helpful to have an aggregator of identifiers and authority controls, what is the issue? There is value in aggregating data, because you can spot errors and inconsistencies. It's not easy, of course, to find a good workflow, but, again, that is *another* problem.

So, in conclusion: I find many issues in Wikidata, but not on the mission/vision, just in the complexity of the project, the size of the dataset, the size of the community. Can we talk about those?

Aubrey

On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:

On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]].

Pete Forsyth

7:46 a.m.

On Sun, Dec 13, 2015 at 4:02 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

But at the same time, Wikidata is supposed to inform the Wikipedias, as a central data repository. This creates a mismatch between Wikidata's "early days -- anything goes, let's just get content in, we'll sort it out later" attitude and the relatively mature Wikipedias where editors insist on sources for any new content added. This out-of-synch-ness is a real problem if you want Wikipedias to actually use Wikidata content. Wikipedians will not accept content generation models that take Wikipedia back to its bad old days where you could write anything you liked without a source to back it up.

Andreas, I think there's an important piece you're missing (or at least not explicitly acknowledging) here. Very few of the Wikipedias are "relatively mature." To the extent Wikidata is meant to help Wikipedia, I believe it is meant to help the less mature Wikipedias benefit from the more robust research into sources etc. that takes place at the big ones -- and help the big ones notice when they have out-of-sync information from one another, and make informed decisions about what to do about it. The analysis you offer here doesn't seem granular enough to capture this, and seems to miss the primary value of Wikidata when it comes to Wikipedia. Thoughts?

Pete -- [[User:Peteforsyth]]

Andreas Kolbe

10:39 a.m.

On Mon, Dec 14, 2015 at 7:46 AM, Pete Forsyth <peteforsyth(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 4:02 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

But at the same time, Wikidata is supposed to inform the Wikipedias, as a central data repository. This creates a mismatch between Wikidata's "early days -- anything goes, let's just get content in, we'll sort it out later" attitude and the relatively mature Wikipedias where editors insist on sources for any new content added. This out-of-synch-ness is a real problem if you want Wikipedias to actually use Wikidata content. Wikipedians will not accept content generation models that take Wikipedia back to its bad old days where you could write anything you liked without a source to back it up.

Pete, Yes, those are good points I missed. Andreas

Gnangarra

12:10 a.m.

this is the issue in quality

...

If Google uses our data and they are wrong, that's bad for them.

Under the CC0 license, when Google uses the information they don't need to attribute Wikidata. If that "wrong" data came from WD --> Google --> news source --> WP, not only has it been washed, it has now become a sourced fact in Wikipedia, and there is no way to trace its origins to WD. Even if WD is changed to another source, it's unlikely to be corrected in the rest of the chain. The whole WMF community has corrupted the data; that is something we should be very concerned about.

On 14 December 2015 at 02:10, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:

...

On Sun, Dec 13, 2015 at 5:32 PM, geni <geniice(a)gmail.com> wrote:

On 13 December 2015 at 15:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

Jane, The issue is that you can't cite one Wikipedia article as a source in another.

However you can within the same article per [[WP:LEAD]].

-- GN. President Wikimedia Australia WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra Photo Gallery: http://gnangarra.redbubble.com

Gerard Meijssen

6:51 a.m.

Hoi, When an error exists in Wikidata, I can change it. When an error exists in Wikipedia I may change it. When an error exists in the Google info thingie, I can report it and, they DO change it. What we can do and should do is provide a two way channel to compare issues and work on improving the data. There is a reason to be concerned but it is not that data necessarily will always be wrong because Wikipedia or Wikidata or whoever said so. If anything it is in our attitude, I just found that one red link in the French Wikipedia could be a blue link. Do I need to remedy this or do we have ways to communicate/flag this. As long as we do not consider such workflows, you depend on the whim of people who see issues to improve it. I do understand sufficient French but what if it is in Farsi? Thanks, GerardM On 14 December 2015 at 01:10, Gnangarra <gnangarra(a)gmail.com> wrote:

...

this is the issue in quality

If Google uses our data and they are wrong, that's bad for them.


Gerard Meijssen

13 Dec 13 Dec

5:38 p.m.

Hoi, Wikidata is not Wikipedia. When it is imported from Wikipedia it often says so. It does not mean that all the related data is from one Wikipedia and consequently the composite data is information that may be relevantly different. Again you insist on your point of view. If you think that Wikidata is inferior for the reasons that you give; fine. Never mind, move on. In the mean time we will continually improve the quality of Wikidata and when Wikipedians fail to take notice they will find slowly but surely that the information in Wikidata is increasingly superior in the one area where it is most obvious: the silly mistakes that come to light when it is not only one Wikipedia that is the source of data. Thanks, GerardM On 13 December 2015 at 16:57, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

Thanks for that essay, Lydia! You said it well, and I especially agree with what you wrote about trust and believing in ourselves. I had to laugh at some of the comments, because if you substitute "Wikipedia" for "Wikidata" those comments could have been written 3 years ago before Wikidata came on the scene.

On Sat, Dec 12, 2015 at 10:18 PM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:

On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:

That is actually not correct. We have built Wikidata from the very beginning with some core beliefs. One of them is that Wikidata isn't supposed to have the one truth but instead is able to represent various different points of view and link to sources claiming these. Look for example at the country statements for Jerusalem: https://www.wikidata.org/wiki/Q1218 Now I am the first to say that this will not be able to capture the full complexity of the world around us. But that's not what it is meant to do. However please be aware that we have built more than just a dumb database with Wikidata and have gone to great length to make it possible to capture knowledge diversity.

I've taken the time and written a longer piece about data quality and knowledge diversity on Wikidata for the current edition of the Signpost:

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed

Jane Darnell

8:24 p.m.

Andreas, That's just not true. You can re-use and remix Wikimedia content as much as you like. When you say you "can't cite one Wikipedia article as a source in another", this is also not true, as we see this done in translated articles in the edit summary. Fortunately Wikipedia articles need sources, so those are translated along with the rest of the content and are perfectly valid to take from one project to another. In art history, when we are talking about paintings, we are all mostly talking about the same sources anyway, worldwide. This is probably true for most other disciplines as well. As far as citing goes, the ratio of cited vs. uncited statements in Wikipedia is probably much greater than in Wikidata, except we can't measure that. All we measure is the "reference" statement, but there are lots of sources in various properties and my guess is that most items with zero statements are early imports that have just not had anyone click on them yet. When we use images in Wikipedia articles, we do not "cite" Wikimedia Commons. Indeed, this is exactly the problem we have when we talk to GLAMs about image donations. The link itself is enough to allow the user with a few clicks to get at the image information on Commons, where there is more information, including sources. When I as a Wikipedian use images of paintings from Commons in a Wikipedia article, I am using multiple sources for that article, but some of those sources may be from the Commons image itself, as some of these are particularly well-sourced. When I am updating the associated Wikidata item, I add all of the sources that I have found, and for the more famous paintings, others add links from their own sources, making Wikidata much richer as a source of references than any single project. As Lydia explained however, not every individual statement in Wikidata is sourced, though each item may be sourced to multiple references.
This is partially because we lack the tools to easily source each statement when we update multiple statements at a time, but it is also because we don't *need* to source obvious statements. The point is that publishing on any Wikimedia project, whether it's Wikipedia, Wikimedia Commons, or Wikidata, is a manually-driven complex process done by volunteers. It is not and never will be automatic. Jane On Sun, Dec 13, 2015 at 4:57 PM, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


Henning Schlottmann

15 Dec 15 Dec

5:43 p.m.

On 08.12.2015 00:52, Craig Franklin wrote:

...

In a database, we are limited to saying that Jerusalem either is or is not the capital of Israel.

We are not. Wikidata is a repository of statements. It can contain both a statement that Jerusalem is the capital of Israel and another statement that it is not. In a serious project both would exist, and both would have other statements linked to them, stating who thinks so. Ciao Henning

Andreas Kolbe

29 Nov 29 Nov

2:38 a.m.

On Sun, Nov 29, 2015 at 12:37 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

...

As to Grasulf, you failed to get the point. It was NOT about the data itself but about the presentation.

QED. :)

Gerard Meijssen

10:08 a.m.

Hoi, If anything it proves that you did not understand. Happy that you appreciate what you finally see. Thanks, GerardM On 29 November 2015 at 03:38, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...

On Sun, Nov 29, 2015 at 12:37 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

As to Grasulf, you failed to get the point. It was NOT about the data itself but about the presentation.

QED. :) _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l(a)lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Lilburne

10:40 a.m.

On 29/11/2015 00:37, Gerard Meijssen wrote:

...

Gerard Meijssen

10:42 a.m.

Hoi, When you do that all your data is removed and you are banned from Wikidata. Thanks, GerardM On 29 November 2015 at 11:40, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:

...

On 29/11/2015 00:37, Gerard Meijssen wrote:

Isn't the point that the data was taken primarily because it was available, and that there was no attempt to verify its accuracy. If I give you 10,000 images of lichen but before hand randomly switch the names of 2000 of them and add misleading geodata randomly to another 2000 are the images useful as data? Would including them improve NPOV?

Lilburne

6:13 p.m.

Then you've not understood the point, have you? Whether it is freely available ought to be the first stage of a process that verifies the accuracy of the data. On 29/11/2015 10:42, Gerard Meijssen wrote:

...

Hoi, When you do that all your data is removed and you are banned from Wikidata. Thanks, GerardM

On 29 November 2015 at 11:40, Lilburne <lilburne(a)tygers-of-wrath.net> wrote:

On 29/11/2015 00:37, Gerard Meijssen wrote:

Hoi, It was from the Myanmar Wikipedia that a lot of data was imported to Wikidata. Data that did not exist elsewhere. I do not care really what "Freedom House" says. I do not know them, I do know that the data is relevant and useful. It was even the subject of a blogpost.. You may ignore data that is not from a source that you like. This indiscriminate POV is not a NPOV.

Isn't the point that the data was taken primarily because it was available, and that there was no attempt to verify its accuracy. If I give you 10,000 images of lichen but before hand randomly switch the names of 2000 of them and add misleading geodata randomly to another 2000 are the images useful as data? Would including them improve NPOV?

Andreas Kolbe

7 Dec 7 Dec

11:02 p.m.

Hi Markus, On 1 December 2015 at 23:43, Markus Krötzsch <markus at semantic-mediawiki.org> wrote:

...

[I continue cross-posting for this reply, but it would make sense to return the thread to the Wikidata list where it started, so as to avoid partial discussions happening in many places.]

Apologies for the late reply. While you indicated that you had crossposted this reply to Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after Atlasowa pointed it out on the Signpost op-ed's talk page.[1]

...

On 27.11.2015 12:08, Andreas Kolbe wrote:

...

Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance.

...

This prompted me to reply. I wanted to write an email that merely says: "Really? Where did you get this from?"

(Google using Wikidata content) Multiple sources, including what appears to be your own research group's writing:[2] ---o0o--- In December 2013, Google announced that their own collaboratively edited knowledge base, Freebase, is to be discontinued in favour of Wikidata, which gives Wikidata a prominent role as an in[p]ut for Google Knowledge Graph. The research group Knowledge Systems <https://ddll.inf.tu-dresden.de/web/Knowledge_Systems/en> is working in close cooperation with the development team behind Wikidata, and provides, e.g., the regular Wikidata RDF-Exports. ---o0o---

...

But then I read the rest ... so here you go ...

...

Your email mixes up many things and effects, some of which are important issues (e.g., the fact that VIAF is not a primary data source that should be used in citations). Many other of your remarks I find very hard to take serious, including but not limited to the following:

...

* A rather bizarre connection between licensing models and accountability (as if it would make content more credible if you are legally required to say that you found it on Wikipedia, or even give a list of user names and IPs who contributed)

Both Freebase and Wikipedia have attribution licences. When Bing's Snapshot displays information drawn from Freebase or Wikipedia, it's indicated thus at the bottom of the infobox[3]: ---o0o--- Data from Freebase · Wikipedia ---o0o--- I take this as a token gesture to these sources' attribution licences. Given the amount of space they have available, I would think most people would agree that this form of attribution is sufficient. You couldn't possibly expect them to list all contributors who have ever contributed to the lead of the Wikipedia article, for example, as the letter of the licence might require. However, I think it's proper and important that those minimal attributions are there. And given Wikidata's CC0 licence, I don't expect re-users to continue attributing in this manner. This view is shared by Max Klein for example, who is quoted to that effect in the Signpost op-ed.[4]

...

* Some stories that I think you really just made up for the sake of argument (Denny alone has picked the Wikidata license?

Denny led the development team. There are multiple public instances and accounts of his having advocated this choice and convinced people of the wisdom of it, in Wikidata talk pages and elsewhere, including a recent post on the Wikidata mailing list.[5] Interestingly, he originally said that this would mean there could be no imports from Wikipedia, and that there was in fact no intention to import data from Wikipedias (see op-ed).[6] He also said, higher up on that page, that this was "for starters", and that that decision could easily be changed later on by the community.[7]

...

Google displays Wikidata content?

See above. If Wikidata plays "a prominent role as an in[p]ut for Google Knowledge Graph" then I would expect there to be correspondences between Knowledge Graph and Wikidata content.

...

Bing is fuelled by Wikimedia?)

I spoke of "Wikimedia-fuelled search engines like Google and Bing" in the context of the Google Knowledge Graph and Bing's Snapshot/Satori equivalent. We all know that in both cases, much of the content Google and Bing display in these infoboxes comes from Wikimedia projects (Wikipedia, Commons and now, apparently, Wikidata).

...

* Some disjointed remarks about the history of capitalism

* The assertion that content is worse just because the author who created it used a bot for editing

I spoke of "bot users mass-importing unreliable data". It's not the bot method that makes the data unreliable: they are unreliable to begin with (because they are unsourced, nobody verifies the source, etc.). As I pointed out in this week's op-ed, of the top fifteen hoaxes in the English Wikipedia, six have active Wikidata items (or rather, had: they were deleted this morning, after the op-ed appeared). This is what I mean by unreliable data.

...

* The idea that engineers want to build systems with bad data because they like the challenge of cleaning it up -- I mean: really! There is nothing one can even say to this.

Again, this is not quite what I was trying to convey. My impression is that the current community effort at Wikidata emphasises speed: hence the mass imports of data from Wikipedia, whether verifiable or not, contrary to original intentions, as represented by Denny's quote above. As far as I can make out, present-day thinking among many Wikidatans is: let's get lots of data in fast even though we know some of it will be bad. Afterwards, we can then apply clever methods to check for inconsistencies and clean our data up -- which is a challenge people do seem to warm to. Meanwhile, others throw up their arms in dismay and say, "Stop! You're importing bad data." Wouldn't you agree that this characterises some of the recent discussions on the Wikidata Project Chat page? The two camps seem approximately evenly represented in the discussions I've seen. But while the one camp says "Stop!", the other camp continues importing. So in practice, the importers are getting their way.

...

* The complaint that Wikimedia employs too much engineering expertise and too little content expertise (when, in reality, it is a key principle of Wikimedia to keep out of content, and communities regularly complain WMF would still meddle too much).

Is it not obvious that I was talking about community practices rather than the actions of Wikimedia staff?

...

* All those convincing arguments you make against open, anonymous editing because of it being easy to manipulate (I've heard this from Wikipedia critics ten years ago; wonder what became of them)

Such criticisms are still regularly levelled at Wikipedia, in top-quality publications. If you really want, I can send you a literature list, but you could begin with this article in Newsweek.[8]

...

* And, finally, the culminating conspiracy theory of total control over political opinion, destroying all plurality by allowing only one viewpoint (not exactly what I observe on the Web ...) -- and topping this by blaming it all on the choice of a particular Creative Commons license for Wikidata! Really, you can't make this up.

The information provided by default to billions of search engine users *matters*. You can never prevent an individual from going to a website that espouses a different view, but you don't have to for that information to have a measurable effect. Robert Epstein and Ronald E. Robertson recently published a paper on what they called "The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections".[9] It provides further detail.

...

Summing up: either this is an elaborate satire that tries to test how serious an answer you will get on a Wikimedia list, or you should *seriously* rethink what you wrote here, take back the things that are obviously bogus, and have a down-to-earth discussion about the topics you really care about (licenses and cyclic sourcing on Wikimedia projects, I guess; "capitalist companies controlling public media" should be discussed in another forum).

No satire was intended. I hope I have succeeded in making my points clearer. Regards, Andreas

[1] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2015-12-02/…
[2] https://ddll.inf.tu-dresden.de/web/Wikidata/en
[3] http://www.bing.com/search?q=jerusalem&go=Submit&qs=n&form=QBLH…
[4] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[5] https://lists.wikimedia.org/pipermail/wikidata/2015-December/007769.html
[6] https://archive.is/ZbV5A#selection-2997.0-3009.26
[7] https://archive.is/ZbV5A#selection-2755.308-2763.27
[8] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-bus…
[9] http://www.pnas.org/content/112/33/E4512.abstract

Gerard Meijssen

30 Dec 30 Dec

12:47 p.m.

Hoi, "He who is without sin, throws the first stone". I read this article [1] in Wired and it seems to me that Wikipedians, English Wikipedians at that, have plenty to do to get their own house in order. The topic was quality, particularly in Wikidata, and it degenerated into a conversation that included the Kazakh Wikipedia, the potential to manipulate information and whatever. I am happy to say that quality is an issue. It is an issue for all of us. However, I am firmly with Jane that once we have identified issues, we should either come up with ways to make them manageable and/or identifiable. The confrontation of 'sources or die' is easy: DIE. That is not to say that sources are unimportant, but they hide too much and they too are often and easily manipulated. When quality is at issue, concentrate on that subject and for a moment forget about secondary or tertiary caveats. If we can agree that our own efforts, positively applied, will help us improve quality, we have a way forward. There are micro and macro ways of improving quality. I give an example of both. Psychiatry and stigma are subjects woefully underdeveloped. I have added one person and connected her to two awards, a book, a few organisations, people teaching at the University of Maastricht and several other people occupied in this field. I asked her for additional information to expand the field. This is a micro contribution and because of the links it has quality. A German University is interested to use Wikidata and wants to connect its content to our content. They are happy to share their data and it is important to them when their data is sourced to them. We are talking and it may become a reality. These are two ways of improving quality, one of them is explicitly about sourcing. To me it is less about them being a source than about them including their reputation at the same time.
The info I added about "ervaringsdeskundigheid" is likely to be kept because it is well connected and at some choice points sources are all too easy to include. Another reason why it will stay is that my reputation is such that it is more than likely correct. Even that is not so much of a concern because as more data becomes available in Wikidata possible errors will be found and corrected. (There are none as far as I am aware.) The point of this all? Quality is a goal, it is something that you achieve by hard work. Wikipedia is a quality resource and it does have rough edges. Wikidata is immature, underdeveloped and in need of all the love and care it can get. Yes, there are secondary and tertiary concerns. But they should not divert our attention from what is our main concern: the improved quality that we can achieve only when we collaborate. At that Wikidata has plenty to offer to Wikipedia already. In my opinion the easiest results are not so much in the info boxes but more in revitalising the red links and removing the many many links that are plain wrong. Thanks, GerardM [1] http://arstechnica.com/staff/2015/12/editorial-wikipedia-fails-as-an-encycl… On 8 December 2015 at 00:02, Andreas Kolbe <jayen466(a)gmail.com> wrote:

...


162 comments

participants (28)

Andrea Zanni
Andreas Kolbe
Craig Franklin
Denny Vrandečić
Ed Erhart
Fæ
geni
Gerard Meijssen
Gergo Tisza
Gergő Tisza
Gnangarra
Henning Schlottmann
Jane Darnell
Leila Zia
Liam Wyatt
Lila Tretikov
Lilburne
Lydia Pintscher
Milos Rancic
Pete Forsyth
Peter Southwood
Petr Kadlec
Richard Symonds
Risker
Rob
WereSpielChequers
Wil Sinclair
Yaroslav M. Blanter