(Note: I'm creating a new thread which references several old ones; in the most recent, "Profile of Magnus Manske," the conversation has drifted back to Wikidata, so that subject line is no longer applicable.)
Andreas Kolbe has argued in multiple threads that Wikidata is fundamentally problematic, on the basis that it does not require citations. (Please correct me if I am mistaken about this core premise.) I've found these threads illuminating, and appreciate much of what has been said by all parties.
However, that core premise is problematic. If the possibility of people publishing uncited information were fundamentally problematic, here are several platforms that we would have to consider ethically problematic at the core:
* Wikipedia (which for many years had very loose standards around citations)
* Wikipediocracy (of which Andreas is a founding member) and all Internet forums
* All blogs
* YouTube
* Facebook
* The Internet itself
* The printing press
Every one of the platforms listed above created opportunities for people -- even anonymously -- to publish information without a citation. If we are to fault Wikidata on this basis, it would be wrong not to apply the same standard to other platforms.
I'm addressing this now, because I think it is becoming problematic to paint Wikidata as a flawed project with a broad brush. Wikidata is an experiment, and it will surely lead to flawed information in some instances. But I think it would be a big problem to draw the conclusion that Wikidata is problematic overall.
That said, it is becoming ever more clear that the Wikimedia Foundation has developed big plans that involve Wikidata; and those big plans are not open to scrutiny.
THAT, I believe, is a problem.
Wikidata is not a problem; but it is something that could be leveraged in problematic ways (and/or highly beneficial ways).
I feel it is very important that we start looking at these issues from that perspective.
-Pete [[User:Peteforsyth]]
Hoi, Thanks for the FUD. You mention that the Wikimedia Foundation has plans. Really.. There are plans that are published and there has been time for you to consider them. They are the ones that the WMF has published, they are the only ones that exist as far as I know and I follow Wikidata closely. So where are your sources Pete?
When other plans exist, the WMF is not the party developing them. For instance: I am arguing for the use of Wikidata in links and redlinks. I have published about it and I welcome comments. I asked you personally and you were not even interested.
Why should anyone be interested now? Thanks, GerardM
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
On Jan 26, 2016 3:22 AM, "Gerard Meijssen" gerard.meijssen@gmail.com wrote:
Thanks for the FUD.
"Fear, Uncertainty, and Doubt" are not the precise words I would choose, but they fairly adequately describe how I feel about the WMF these days.
Of course, as a bit of jargon, FUD typically implies that somebody is trying to use those emotions in a manipulative way.
All I can say to that is....nope, not my intention.
So where are your sources Pete?
First, the main point of my email was to challenge what I consider a poor argument against Wikidata. That point is, IMO, the important one.
However, you're right: I did talk about my beliefs. I do believe there is a problem to be considered; and I don't think I need to offer proof for what my own beliefs are.
But, I agree, some substantiation is worthwhile. I consider the following to be the most interesting published documents relating to these issues:
https://commons.wikimedia.org/wiki/File:Discovery_Year_0-1-2.pdf
https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2016-01-13/Op-ed
It is very clear that the WMF has big plans, and that we have only seen parts of those plans. What those plans are, and whether they are good ones, remains to be explored; but the opaqueness of the plans is itself a problem. That is my point.
When other plans exist, the WMF is not the party developing them. For instance: I am arguing for the use of Wikidata in links and redlinks. I have published about it and I welcome comments. I asked you personally and you were not even interested.
OK, this part is getting silly. You presented an idea to me in private that is obviously a good idea. But, as I explained to you, your single-minded interest in me expressing an opinion on it gave me pause. I explained to you that you seemed more interested in setting me up to be a part of your political point, than in actually having a discussion. So I declined to discuss your idea.
This message seems to prove that my instincts were correct.
Pete
On Tue, Jan 26, 2016 at 7:33 AM Pete Forsyth peteforsyth@gmail.com wrote:
Andreas Kolbe has argued in multiple threads that Wikidata is fundamentally problematic, on the basis that it does not require citations. (Please correct me if I am mistaken about this core premise.)
Every statement on Wikidata /should/ be referenced, unless the statement itself points to a reference (e.g. VIAF, images). However, at the moment, this is not a requirement, as Wikidata is still in a steep growth phase. Over the last few years, many statements were added by bots, which can process e.g. Wikipedia, but would be hard pressed to find the original reference for a statement.
Humans, bots, and tools increasingly add references to Wikidata statements; I wouldn't be surprised if Wikidata starts requiring references within the next few years on all (new) statements.
That said, it is becoming ever more clear that the Wikimedia Foundation has developed big plans that involve Wikidata; and those big plans are not open to scrutiny.
THAT, I believe, is a problem.
Well, I sure hope WMF has big plans for Wikidata! But do you know of any such plans that don't revolve around the usual suspects, such as importing/linking to existing datasets, or re-using Wikidata in third-party sites and products? For example, a "secret" plan along the lines of "company X wants to use Wikidata, but they don't want to announce this publicly yet" would be perfectly fine by me. Wikidata is CC-0; technically, no one needs to even ask permission or link back. I simply do not see any sinister, nefarious plan the WMF /could/ have for Wikidata, given their long established policy of staying away from editing contents.
If you have even minimum indications of "evil" WMF plans for Wikidata, please share them! Saying "I know nothing about their plans, therefore they must be evil" doesn't really cut it.
Cheers, Magnus
On Jan 26, 2016 5:24 AM, "Magnus Manske" magnusmanske@googlemail.com wrote:
On Tue, Jan 26, 2016 at 7:33 AM Pete Forsyth peteforsyth@gmail.com wrote:
<snipping most of Magnus' message, which I appreciate and agree with>
If you have even minimum indications of "evil" WMF plans for Wikidata, please share them! Saying "I know nothing about their plans, therefore they must be evil" doesn't really cut it.
Indeed, if that were what I was saying...that would be nuts!
I do not have an opinion on the quality (or moral value, for that matter!) of whatever plans the senior leadership of WMF has around structured data, search, discovery, knowledge engines, etc.
But I do find the secretive approach to planning problematic.
The plans may very well turn out to be good ones (as I said in my original message). But that will not justify the level of secrecy we are seeing lately.
Pete [[User:Peteforsyth]]
Hoi, You write about your fear, uncertainty and doubt .. Why have us waste time on it? Do something useful. Thanks, GerardM
On 26 January 2016 at 11:33, Pete Forsyth peteforsyth@gmail.com wrote:
On Tue, Jan 26, 2016 at 12:56 PM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
You write about your fear, uncertainty and doubt .. Why have us waste time on it? Do something useful. Thanks,
I, for one, think that the mail Pete sent (both in content and tone) is perfectly fine and helpful. I don't know if I share his concerns about WMF plans for Wikidata, but I fully agree with his position regarding Andreas' criticism of Wikidata. A distinction was needed. All in all, I think this thread is useful. M2c.
Aubrey
On 26 January 2016 at 11:24, Magnus Manske magnusmanske@googlemail.com wrote:
On Tue, Jan 26, 2016 at 7:33 AM Pete Forsyth peteforsyth@gmail.com wrote:
Andreas Kolbe has argued in multiple threads that Wikidata is fundamentally problematic, on the basis that it does not require citations. (Please correct me if I am mistaken about this core premise.)
Every statement on Wikidata /should/ be referenced, unless the statement itself points to a reference (e.g. VIAF, images). However, at the moment, this is not a requirement, as Wikidata is still in a steep growth phase. Over the last few years, many statements were added by bots, which can process e.g. Wikipedia, but would be hard pressed to find the original reference for a statement.
To extend Magnus' point... This is also the case on Wikipedia. Every Wikipedia sentence /should/ be verified to a reliable source, and those without footnotes can be removed. But it is not a /requirement/ that every statement be verified. In short, 'verifiable, not verified' is the minimum standard for inclusion of a sentence in Wikipedia. The ratio of footnotes-to-sentences in Wikipedia articles is on average probably much lower than the ratio of references-to-statements in Wikidata. It's just that we have more easily available /quantitative/ statistics for Wikidata than we do for Wikipedia, which makes it easy for Wikidata-critics to point to the number of un-referenced statements in Wikidata as a simple measure of quality, even though many of them DO meet the "verifiable, even if not yet verified" minimum standard that we accept for "stubs" on Wikipedia.
For example: even in a Featured Article Wikipedia biography, I've never seen a footnote /specifically/ for the fact that the subject is "a human". That reference is implied by other footnotes, such as the citations for the birthdate or occupation. By comparison, in Wikidata, some people seem to feel that statements like "instance of -> human" or "gender -> male" need to be given a specific reference before they can be considered reliable. This is even when there are other statements in the same Wikidata item that reference biography-authority control numbers (e.g. VIAF).
Yes, ideally, every statement could be given a reference in Wikidata, but ideally so should every sentence in Wikipedia. In reality we do accept "stub" Wikipedia articles that have 5 sentences and 1 Reliable Source footnote. Furthermore, we also have Wikidata properties that are, in effect, "self verifying": like the "VIAF identifier" property, which links to that authority control database, or the "image" property, which links directly to a file on Commons. So, simply counting the number of statements vs. the number of references in those statements on Wikidata and concluding that Wikidata is therefore inherently unreliable is both simplistic and quite misleading.
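To make the counting point concrete, here is a rough Python sketch of a tally that keeps "self verifying" statements separate from genuinely unreferenced ones. It assumes the Wikidata entity JSON layout ("claims" keyed by property, each statement carrying a "mainsnak" and an optional "references" list). The set of datatypes treated as self-verifying and the sample item are my own illustrative assumptions, not an official Wikidata metric.

```python
from typing import Any, Dict, Set

# Assumption for illustration: statements with these datatypes are treated as
# "self verifying" (identifier links such as VIAF, and files on Commons).
SELF_VERIFYING_DATATYPES: Set[str] = {"external-id", "commonsMedia"}


def reference_coverage(entity: Dict[str, Any]) -> Dict[str, int]:
    """Tally an item's statements as referenced, self-verifying, or unreferenced."""
    counts = {"referenced": 0, "self_verifying": 0, "unreferenced": 0}
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            datatype = statement.get("mainsnak", {}).get("datatype")
            if statement.get("references"):
                counts["referenced"] += 1
            elif datatype in SELF_VERIFYING_DATATYPES:
                counts["self_verifying"] += 1
            else:
                counts["unreferenced"] += 1
    return counts


# Hypothetical, heavily trimmed biography item (not real data).
example_item = {
    "claims": {
        "P31": [{"mainsnak": {"datatype": "wikibase-item"}}],  # instance of
        "P21": [{"mainsnak": {"datatype": "wikibase-item"}}],  # sex or gender
        "P569": [{"mainsnak": {"datatype": "time"},            # date of birth
                  "references": [{"snaks": {}}]}],
        "P214": [{"mainsnak": {"datatype": "external-id"}}],   # VIAF identifier
        "P18": [{"mainsnak": {"datatype": "commonsMedia"}}],   # image
    }
}

print(reference_coverage(example_item))
```

A naive count would report four of these five statements as unreferenced; separating out the identifier and image statements leaves two, and both of those ("instance of", "gender") are the kind a Wikipedia biography would not footnote individually either.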
-Liam
wittylama.com Peace, love & metadata
People keep mentioning VIAF in this context. VIAF is a federated service, using the content of its various repositories, and is therefore no more accurate than they are. For example, a major component in VIAF is the Library of Congress Authority File. That file has always used author or publisher statements as the evidence for birth dates without further verification; in recent years, it has also been using information from WP articles. (I suppose that's an improvement: we at least occasionally look beyond what the person says about himself.)
On Tue, Jan 26, 2016 at 7:38 AM, Liam Wyatt liamwyatt@gmail.com wrote:
Pete,
On Tue, Jan 26, 2016 at 7:33 AM, Pete Forsyth peteforsyth@gmail.com wrote:
Andreas Kolbe has argued in multiple threads that Wikidata is fundamentally problematic, on the basis that it does not require citations. (Please correct me if I am mistaken about this core premise.) I've found these threads illuminating, and appreciate much of what has been said by all parties.
However, that core premise is problematic. If the possibility of people publishing uncited information were fundamentally problematic, here are several platforms that we would have to consider ethically problematic at the core:
- Wikipedia (which for many years had very loose standards around citations)
- Wikipediocracy (of which Andreas is a founding member) and all Internet forums
- All blogs
- YouTube
- The Internet itself
- The printing press
Every one of the platforms listed above created opportunities for people -- even anonymously -- to publish information without a citation. If we are to fault Wikidata on this basis, it would be wrong not to apply the same standard to other platforms.
In many countries, people have a right to free speech: to voice opinions, engage in speculation, and so on. I feel quite certain that we agree that the right to free speech is a good thing to have.
But Wikipedia and Wikidata are not experiments in free speech. They are designed to be reference works.
Wikipedia, in its early days, was faulted by Wikipedians – rightly so – for publishing material that could not be traced to professionally published sources, including much material that was plain wrong (crank theories etc.). That was considered unacceptable for a reference work; hence the requirement for references, the no-original-research rule, and all the rest of it.
I'm addressing this now, because I think it is becoming problematic to paint Wikidata as a flawed project with a broad brush. Wikidata is an experiment, and it will surely lead to flawed information in some instances. But I think it would be a big problem to draw the conclusion that Wikidata is problematic overall.
Perhaps we can agree that reliable sources are a useful part of a crowdsourced reference project. The more citations Wikidata contains, the more useful it will be. Citations make data provenance transparent to the end user. They enable end users to verify, judge and correct the information they're given, if they so desire.
Data provenance is all the more important if Wikidata content comes to be spread far and wide, as seems possible, given major search engines' involvement.
In my opinion, Wikidata's CC-0 licence undermines that, because it allows re-users to cut the chain between the end user and the data's original source.
That said, it is becoming ever more clear that the Wikimedia Foundation has developed big plans that involve Wikidata; and those big plans are not open to scrutiny.
THAT, I believe, is a problem.
I agree with you that there appears to be an undue amount of secrecy.
Jimmy Wales said[1] over two weeks ago, in response to questions about the Knight Foundation's Knowledge Engine grant, in the context of ousted board member James Heilman's complaints about a lack of transparency,
--------------------
"What sort of details do you want? I'll have to talk to others to make sure there are no contractural reasons not to do so, but in my opinion the grant letter should be published on meta. The Knight Grant is a red herring here, so it would be best to clear the air around that completely as soon as possible."
--------------------
That sounded reassuring. But to date neither the Knight Foundation grant letter nor the Foundation's grant application has been published on Meta.
The fact that nothing has happened following Jimmy Wales' statement has been discussed in the Wikipedia Weekly Facebook group. As you probably know, Jimmy Wales said there yesterday,
--------------------
"Assurances"? Please don't make things up out of thin air. I've expressed my opinion, but contrary to some people's fantasies, me expressing an opinion doesn't have the force of law.
--------------------
In the same discussion, a WMF staffer said last week that WMF staff would be delighted to publish that documentation, but haven't been given leave to do so.
That sounds to me like there is a continued intent to withhold the documentation of this restricted grant from public view. I believe that is a mistake.
If there is nothing objectionable in it, publication now will stop the rumour mill. If there is something objectionable in it, then it is better for that to come to light now, rather than six months or a year down the line.
Wikidata is not a problem; but it is something that could be leveraged in problematic ways (and/or highly beneficial ways).
I feel it is very important that we start looking at these issues from that perspective.
I agree. Thank you for raising the issue.
Andreas
[1] https://en.wikipedia.org/w/index.php?title=User_talk%3AJimbo_Wales&diff=...
Hi Andreas,
2016-01-26 13:17 GMT+01:00 Andreas Kolbe jayen466@gmail.com:
In my opinion, Wikidata's CC-0 licence undermines that, because it allows re-users to cut the chain between the end user and the data's original source.
If I understand, you are concerned about verifiability of information in Wikidata. What is completely unclear to me is why you are mixing verifiability and copyright or, in other words, why you think that you can solve the problem of verifiability with copyright.
TL;DR Licenses are for copyright, not verifiability. Using a different license will not solve your verifiability problems.
# Is CC-BY for Wikidata a good idea?
CC-0 or CC-BY (or any license) are based on copyright law. Broadly speaking (but IANAL), "facts" are not copyrightable because they lack originality, which is one of the conditions required by copyright law. In this sense, no single statement that you find on Wikidata (e.g. Barack Obama was born on 4 August 1961) is copyrightable.
For collections of facts (i.e. datasets) the situation is much less clear, and it is not easy to decide whether collections of data/facts are copyrightable at all. Under the doctrine of the "Sweat of the Brow" [1a][1b], the originality requirement is relaxed, and the fact that "skill and labour" was put into creating a collection of data is sufficient to give rise to copyright. This view has recently been rejected in some court cases by the European Court of Justice (see Football Dataco & others v. Yahoo! UK [2a][2b]), which ruled that it is not sufficient to say that putting together a collection of facts required some sort of effort (even quantifiable in monetary terms) to give rise to copyright. In Football Dataco v. Yahoo the dataset consisted of sports event results, but the same applies also to other contexts such as the digitization of (public domain) photographs or OCR of (public domain) texts.
As a Wikimedian, I am more than eager to support the idea that scanned versions of PD photos and texts should remain in the public domain. I do not want to invoke this kind of principle to claim copyright on the Wikidata dataset so as to be able to apply the CC-BY license. This is also the position of other projects like Project Gutenberg [3].
On the other hand, in many jurisdictions the moral rights [4] associated with any work, e.g. among others the right to have the paternity of a work attributed, are perpetual and cannot be transferred or waived. In fact the CC-0 legal code says: "A Work made available under CC0 may be protected by copyright and related or neighboring rights includ[ing]: moral rights retained by the original author(s) and/or performer(s); database rights; [...]".
So the question of what would justify releasing Wikidata under CC-BY remains.
# Licenses and verifiability
Besides the problem above, even if we could use CC-BY and make use of "Sui Generis Database Rights" (see section 4 of CC-BY legal code [5]) I am not sure your verifiability problem would be solved. CC-BY requires the reuser to provide "[...] attribution, in any reasonable manner requested by the Licensor".
This means that I could build a page replicating (part of) Wikidata's data, maybe mix it with other sources, and then add a line at the bottom of the page saying "Data from Wikidata (c) Wikidata contributors CC-BY (+link to the item and item history for author names); source A; source B; ...".
This would completely satisfy the attribution requirement but do little to solve the verifiability problem because, basically, you cannot use copyright to force anybody to use a particular design for their website and/or database and maintain the "verifiability chain" for each statement.
To conclude, the verifiability problem is very important for all the projects, but I am very skeptical of the idea that copyright licenses are the means to solve it.
C
[1a] https://en.wikipedia.org/wiki/Sweat_of_the_brow
[1b] https://meta.wikimedia.org/wiki/Wikilegal/Sweat_of_the_Brow
[2a] http://curia.europa.eu/juris/liste.jsf?&num=C-604/10
[2b] http://kluwercopyrightblog.com/2012/03/01/football-dataco-skill-and-labour-i...
[3] https://www.gutenberg.org/wiki/Gutenberg:No_Sweat_of_the_Brow_Copyright
[4] https://en.wikipedia.org/wiki/Moral_rights
[5] https://creativecommons.org/licenses/by/4.0/legalcode
wikimedia-l@lists.wikimedia.org