Gerard,
On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, To start off, results from the past are no indication of results in the future. It is the disclaimer insurance companies have to state in all their adverts in the Netherlands. When you continue and make it a "theological" issue, you lose me, because I am not of this faith, far from it. Wikidata is its own project and it is utterly dissimilar from Wikipedia. To start off, Wikidata has been a certified success from the start. The improvement it brought by bringing all interwiki links together is enormous. That alone should be a pointer that Wikipedia thinking is not realistic here.
These benefits are internal to Wikimedia and a completely separate issue from third-party re-use of Wikidata content as a default reference source, which is the issue of concern here.
To continue, people have been importing data into Wikidata from the start.
They are the statements you know, and it was possible to import them from Wikipedia because of these interwiki links. So when you call for sources, it is fairly safe to assume that those imports are supported by the quality of the statements in the Wikipedias
The quality of three-quarters of the 280+ Wikipedia language versions is about at the level the English Wikipedia had reached in 2002.
Even some of the larger Wikipedias have significant problems. The Kazakh Wikipedia for example is controlled by functionaries of an oppressive regime[1], and the Croatian one is reportedly[2] controlled by fascists rewriting history (unless things have improved markedly in the Croatian Wikipedia since that report, which would be news to me). The Azerbaijani Wikipedia seems to have problems as well.
The Wikimedia movement has always had an important principle: that all content should be traceable to a "reliable source". Throughout the first decade of this movement and beyond, Wikimedia content has never been considered a reliable source. For example, you can't use a Wikipedia article as a reference in another Wikipedia article.
Another important principle has been the disclaimer: pointing out to people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use.
Both of these principles are now being jettisoned.
Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance. This is a reflection of the fact that Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision was, I understand, made by Denny, who is both a Google employee and a WMF board member.
The benefit to Google is very clear: this free, unattributed content adds value to Google's search engine result pages, and improves Google's revenue (currently running at about $10 million an hour, much of it from ads).
But what is the benefit to the end user? The end user gets information of undisclosed provenance, which is presented to them as authoritative, even though it may be compromised. In what sense is that an improvement for society?
To me, the ongoing information revolution is like the 19th century industrial revolution done over. It created whole new categories of abuse, which it took a century to (partly) eliminate. But first, capitalists had a field day, and the people who were screwed were the common folk. Could we not try to learn from history?
and if anything, that is also where they typically fail because many assumptions at Wikipedia are plain wrong at Wikidata. For instance a listed building is not the organisation the building is known for. At Wikidata they each need their own item and associated statements.
Wikidata is already a success for other reasons. VIAF no longer links to Wikipedia but to Wikidata. The biggest benefit of this move is for people who are not interested in English. Because of this change, VIAF links through Wikidata to all Wikipedias, not only en.wp. Consequently, people using their library systems may find, through VIAF, Wikipedia articles in their own language.
At the recent Wikiconference USA, a Wikimedia veteran and professional librarian expressed the view to me that
* circular referencing between VIAF and Wikidata will create a humongous muddle that nobody will be able to sort out again afterwards, because – unlike wiki mishaps in other topic areas – here it's the most authoritative sources that are being corrupted by circular referencing;
* third parties are using Wikimedia content as a *reference standard* when that was never the intention (see above).
I've seen German Wikimedians express concerns that quality assurance standards have dropped alarmingly since the project began, with bot users mass-importing unreliable data.
So do not forget about Wikipedia and the lessons learned. These lessons are important to Wikipedia. However, they do not necessarily apply to Wikidata, particularly when you approach Wikidata as an opportunity to do things in a different way. Set theory, a branch of mathematics, is exactly what we need. When we have data at Wikidata of a given quality (e.g. 90%) and data at another source of a given quality (e.g. 90%), we can compare the two and find the subset where the two sources do not match. When we curate the differences, it is highly likely that we improve quality at Wikidata or at the other source.
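The comparison Gerard describes can be sketched in a few lines. This is a minimal illustration under loud assumptions, not how Wikidata or any partner database actually does it: each source is modelled as a hypothetical dictionary from an (item, property) pair to a single value, and the identifiers and values are made-up toy data. Set intersection finds the statements both sources cover, filtering on unequal values yields the mismatching subset to curate, and set difference shows what only one side has.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical model: one value per (item, property) pair. Real Wikidata
# statements can carry several values, qualifiers and references; this
# sketch deliberately ignores all of that.
Statements = Dict[Tuple[str, str], Hashable]


def disagreements(a: Statements, b: Statements) -> Set[Tuple[str, str]]:
    """Keys that both sources describe, but with different values."""
    shared = a.keys() & b.keys()  # set intersection of the two key views
    return {key for key in shared if a[key] != b[key]}


def only_in(a: Statements, b: Statements) -> Set[Tuple[str, str]]:
    """Keys present in source a but absent from source b (set difference)."""
    return set(a.keys() - b.keys())


if __name__ == "__main__":
    # Made-up toy data, not real Wikidata or VIAF records.
    source_x = {
        ("item_1", "date_of_birth"): "1901-05-04",
        ("item_2", "date_of_birth"): "1888-02-11",
        ("item_3", "place_of_birth"): "town_P",
    }
    source_y = {
        ("item_1", "date_of_birth"): "1901-05-04",
        ("item_2", "date_of_birth"): "1888-11-02",  # differs: candidate for curation
        ("item_4", "place_of_birth"): "town_Q",
    }
    print(sorted(disagreements(source_x, source_y)))  # [('item_2', 'date_of_birth')]
    print(sorted(only_in(source_x, source_y)))        # [('item_3', 'place_of_birth')]
```

A mismatch only tells a curator where to look; deciding which of the two sources is right still takes a human check against a primary source.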
This sounds like "Let's do it quick and dirty and worry about the problems later".
I sometimes get the feeling software engineers just love a programming challenge, because that's where they can hone and display their skills. Dirty data is one of those challenges: all the clever things one can do to clean up the data! There is tremendous optimism about what can be done. But why have bad data in the first place, starting with rubbish and then proving that it can be cleaned up a bit using clever software?
The effort will make the engineer look good, sure, but there will always be collateral damage as errors propagate before they are fixed. The engineer's eyes are not typically on the content, but on their software. The content their bots and programs manipulate at times seems almost incidental, something for "others" to worry about – "others" who don't necessarily exist in sufficient numbers to ensure quality.
In short, my feeling is that the engineering enthusiasm and expertise applied to Wikidata aren't balanced by a similar level of commitment to scholarship in generating the data, and getting them right first time.
We've seen where that approach can lead with Wikipedia. Wikipedia hoaxes and falsehoods find their way into the blogosphere, the media, even the academic literature. The stakes with Wikidata are potentially much higher, because I fear errors in Wikidata stand a good chance of being massively propagated by Google's present and future automated information delivery mechanisms, which are completely opaque. Most internet users aren't even aware to what extent the Google Knowledge Graph relies on anonymously compiled, crowdsourced data; they will just assume that if Google says it, it must be true.
In addition to honest mistakes, transcription errors, outdated info etc., the whole thing is a propagandist's wet dream. Anonymous accounts! Guaranteed identity protection! Plausible deniability! No legal liability! Automated import and dissemination without human oversight! Massive impact on public opinion![3]
If information is power, then this provides the best chance of a power grab humanity has seen since the invention of the newspaper. In the media landscape, you at least have right-wing, centrist and left-wing publications each presenting their version of the truth, and you know who's publishing what and what agenda they follow. You can pick and choose, compare and contrast, read between the lines. We won't have that online. Wikimedia-fuelled search engines like Google and Bing dominate the information supply.
The right to enjoy a pluralist media landscape, populated by players who are accountable to the public, was hard won in centuries past. Some countries still don't enjoy that luxury today. Are we now blithely giving it away, in the name of progress, and for the greater glory of technocrats?
I don't trust the way this is going. I see a distinct possibility that we'll end up with false information in Wikidata (or, rather, the Google Knowledge Graph) being used to "correct" accurate information in other sources, just because the Google/Wikidata content is ubiquitous. If you build circular referencing loops fuelled by spurious data, you don't provide access to knowledge, you destroy it. A lie told often enough etc.
To quote Heather Ford and Mark Graham, "We know that the engineers and developers, volunteers and passionate technologists are often trying to do their best in difficult circumstances. But there need to be better attempts by people working on these platforms to explain how decisions are made about what is represented. These may just look like unimportant lines of code in some system somewhere, but they have a very real impact on the identities and futures of people who are often far removed from the conversations happening among engineers."
I agree with that. The "what" should be more important than the "how", and at present it doesn't seem to be.
It's well worth thinking about, and having a debate about what can be done to prevent the worst from happening.
In particular, I would like to see the decision to publish Wikidata under a CC0 licence revisited. The public should know where the data it gets comes from; that's a basic issue of transparency.
Andreas
[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-controv...
[3] http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-2016...
2015-11-27 12:08 GMT+01:00 Andreas Kolbe jayen466@gmail.com:
just because the Google/Wikidata content is ubiquitous. If you build circular referencing loops fuelled by spurious data, you don't provide access to knowledge, you destroy it. A lie told often enough etc.
You're implying that "Google" can be cited as a reliable source in Wikidata? That is never the case. Even databases such as VIAF are not really to be seen as reliable ...
[I continue cross-posting for this reply, but it would make sense to return the thread to the Wikidata list where it started, so as to avoid partial discussions happening in many places.]
Andreas,
On 27.11.2015 12:08, Andreas Kolbe wrote:
Gerard,
(I should note that my reply has nothing to do with what Gerard said, or with the high-level "quality" debate in this thread.)
[...]
Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance.
This prompted me to reply. I wanted to write an email that merely says:
"Really? Where did you get this from?" (Google using Wikidata content)
But then I read the rest ... so here you go ...
Your email mixes up many things and effects, some of which are important issues (e.g., the fact that VIAF is not a primary data source that should be used in citations). Many other of your remarks I find very hard to take seriously, including but not limited to the following:
* A rather bizarre connection between licensing models and accountability (as if it would make content more credible if you are legally required to say that you found it on Wikipedia, or even give a list of user names and IPs who contributed)
* Some stories that I think you really just made up for the sake of argument (Denny alone has picked the Wikidata license? Google displays Wikidata content? Bing is fuelled by Wikimedia?)
* Some disjointed remarks about the history of capitalism
* The assertion that content is worse just because the author who created it used a bot for editing
* The idea that engineers want to build systems with bad data because they like the challenge of cleaning it up -- I mean: really! There is nothing one can even say to this.
* The complaint that Wikimedia employs too much engineering expertise and too little content expertise (when, in reality, it is a key principle of Wikimedia to keep out of content, and communities regularly complain WMF would still meddle too much).
* All those convincing arguments you make against open, anonymous editing because of it being easy to manipulate (I've heard this from Wikipedia critics ten years ago; wonder what became of them)
* And, finally, the culminating conspiracy theory of total control over political opinion, destroying all plurality by allowing only one viewpoint (not exactly what I observe on the Web ...) -- and topping this by blaming it all on the choice of a particular Creative Commons license for Wikidata! Really, you can't make this up.
Summing up: either this is an elaborate satire that tries to test how serious an answer you will get on a Wikimedia list, or you should *seriously* rethink what you wrote here, take back the things that are obviously bogus, and have a down-to-earth discussion about the topics you really care about (licenses and cyclic sourcing on Wikimedia projects, I guess; "capitalist companies controlling public media" should be discussed in another forum).
Kind regards,
Markus
I for one had some discussions with Denny about licensing, and even if it hurt my feelings to say this (at least two of them) he was right. Facts can't be copyrighted and because of that CC0 is the natural choice for data in the database.
Still, in Europe databases can be given protection, and that can limit access to the site. By using the CC0 license on the whole thing, reuse is much easier.
Database protection and copyright are different issues and should not be mixed.
John
On Wed, Dec 2, 2015 at 12:43 AM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Quality can be measured with many possible metrics, and coherence between content and sources is just one of them. If you expose your sources, it is possible to check whether they are coherent with your own claim, and that is a very effective measure to stop the propagation of false claims. If you don't give any sources, you may propagate an error, possibly one introduced by some evil regime.
If you are afraid of false claims, start giving references for your content on Wikipedia, Wikidata, and whatever site you're on.
On Wed, Dec 2, 2015 at 1:03 AM, John Erling Blad jeblad@gmail.com wrote:
Hi Markus,
On 1 December 2015 at 23:43, Markus Krötzsch <markus at semantic-mediawiki.org> wrote:
[I continue cross-posting for this reply, but it would make sense to return the thread to the Wikidata list where it started, so as to avoid partial discussions happening in many places.]
Apologies for the late reply.
While you indicated that you had crossposted this reply to Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after Atlasowa pointed it out on the Signpost op-ed's talk page.[1]
On 27.11.2015 12:08, Andreas Kolbe wrote:
Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance.
This prompted me to reply. I wanted to write an email that merely says:
"Really? Where did you get this from?" (Google using Wikidata content)
Multiple sources, including what appears to be your own research group's writing:[2]
---o0o---
In December 2013, Google announced that their own collaboratively edited knowledge base, Freebase, is to be discontinued in favour of Wikidata, which gives Wikidata a prominent role as an in[p]ut for Google Knowledge Graph. The research group Knowledge Systems https://ddll.inf.tu-dresden.de/web/Knowledge_Systems/en is working in close cooperation with the development team behind Wikidata, and provides, e.g., the regular Wikidata RDF-Exports.
---o0o---
But then I read the rest ... so here you go ...
Your email mixes up many things and effects, some of which are important issues (e.g., the fact that VIAF is not a primary data source that should be used in citations). Many other of your remarks I find very hard to take seriously, including but not limited to the following:
- A rather bizarre connection between licensing models and accountability (as if it would make content more credible if you are legally required to say that you found it on Wikipedia, or even give a list of user names and IPs who contributed)
Both Freebase and Wikipedia have attribution licences. When Bing's Snapshot displays information drawn from Freebase or Wikipedia, it's indicated thus at the bottom of the infobox[3]:
---o0o---
Data from Freebase · Wikipedia
---o0o---
I take this as a token gesture to these sources' attribution licences.
Given the amount of space they have available, I would think most people would agree that this form of attribution is sufficient. You couldn't possibly expect them to list all contributors who have ever contributed to the lead of the Wikipedia article, for example, as the letter of the licence might require.
However, I think it's proper and important that those minimal attributions are there. And given Wikidata's CC0 licence, I don't expect re-users to continue attributing in this manner. This view is shared by Max Klein for example, who is quoted to that effect in the Signpost op-ed.[4]
- Some stories that I think you really just made up for the sake of argument (Denny alone has picked the Wikidata license?
Denny led the development team. There are multiple public instances and accounts of his having advocated this choice and convinced people of the wisdom of it, in Wikidata talk pages and elsewhere, including a recent post on the Wikidata mailing list.[5]
Interestingly, he originally said that this would mean there could be no imports from Wikipedia, and that there was in fact no intention to import data from Wikipedias (see op-ed).[6] He also said, higher up on that page, that this was "for starters", and that that decision could easily be changed later on by the community.[7]
Google displays Wikidata content?
See above. If Wikidata plays "a prominent role as an in[p]ut for Google Knowledge Graph" then I would expect there to be correspondences between Knowledge Graph and Wikidata content.
Bing is fuelled by Wikimedia?)
I spoke of "Wikimedia-fuelled search engines like Google and Bing" in the context of the Google Knowledge Graph and Bing's Snapshot/Satori equivalent.
We all know that in both cases, much of the content Google and Bing display in these infoboxes comes from Wikimedia projects (Wikipedia, Commons and now, apparently, Wikidata).
- Some disjointed remarks about the history of capitalism
- The assertion that content is worse just because the author who created it used a bot for editing
I spoke of "bot users mass-importing unreliable data". It's not the bot method that makes the data unreliable: they are unreliable to begin with (because they are unsourced, nobody verifies the source, etc.).
As I pointed out in this week's op-ed, of the top fifteen hoaxes in the English Wikipedia, six have active Wikidata items (or rather, had: they were deleted this morning, after the op-ed appeared).
This is what I mean by unreliable data.
- The idea that engineers want to build systems with bad data because they like the challenge of cleaning it up -- I mean: really! There is nothing one can even say to this.
Again, this is not quite what I was trying to convey. My impression is that the current community effort at Wikidata emphasises speed: hence the mass imports of data from Wikipedia, whether verifiable or not, contrary to original intentions, as represented by Denny's quote above.
As far as I can make out, present-day thinking among many Wikidatans is: let's get lots of data in fast even though we know some of it will be bad. Afterwards, we can then apply clever methods to check for inconsistencies and clean our data up -- which is a challenge people do seem to warm to. Meanwhile, others throw up their arms in dismay and say, "Stop! You're importing bad data."
Wouldn't you agree that this characterises some of the recent discussions on the Wikidata Project Chat page?
The two camps seem approximately evenly represented in the discussions I've seen. But while the one camp says "Stop!", the other camp continues importing. So in practice, the importers are getting their way.
- The complaint that Wikimedia employs too much engineering expertise and too little content expertise (when, in reality, it is a key principle of Wikimedia to keep out of content, and communities regularly complain WMF would still meddle too much).
Is it not obvious that I was talking about community practices rather than the actions of Wikimedia staff?
- All those convincing arguments you make against open, anonymous editing because of it being easy to manipulate (I've heard this from Wikipedia critics ten years ago; wonder what became of them)
Such criticisms are still regularly levelled at Wikipedia, in top-quality publications. If you really want, I can send you a literature list, but you could begin with this article in Newsweek.[8]
- And, finally, the culminating conspiracy theory of total control over political opinion, destroying all plurality by allowing only one viewpoint (not exactly what I observe on the Web ...) -- and topping this by blaming it all on the choice of a particular Creative Commons license for Wikidata! Really, you can't make this up.
The information provided by default to billions of search engine users *matters*. You can never prevent an individual from going to a website that espouses a different view, but you don't have to for that information to have a measurable effect.
Robert Epstein and Ronald E. Robertson recently published a paper on what they called "The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections".[9] It provides further detail.
Summing up: either this is an elaborate satire that tries to test how serious an answer you will get on a Wikimedia list, or you should *seriously* rethink what you wrote here, take back the things that are obviously bogus, and have a down-to-earth discussion about the topics you really care about (licenses and cyclic sourcing on Wikimedia projects, I guess; "capitalist companies controlling public media" should be discussed in another forum).
No satire was intended. I hope I have succeeded in making my points clearer.
Regards,
Andreas
[1] https://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2015-12-02/O...
[2] https://ddll.inf.tu-dresden.de/web/Wikidata/en
[3] http://www.bing.com/search?q=jerusalem&go=Submit&qs=n&form=QBLH&...
[4] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[5] https://lists.wikimedia.org/pipermail/wikidata/2015-December/007769.html
[6] https://archive.is/ZbV5A#selection-2997.0-3009.26
[7] https://archive.is/ZbV5A#selection-2755.308-2763.27
[8] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-busi...
[9] http://www.pnas.org/content/112/33/E4512.abstract
On 08.12.2015 00:02, Andreas Kolbe wrote:
Hi Markus,
...
Apologies for the late reply.
While you indicated that you had crossposted this reply to Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after Atlasowa pointed it out on the Signpost op-ed's talk page.[1]
Yes, we have too many communication channels. Let me only reply briefly now, to the first point:
This prompted me to reply. I wanted to write an email that merely says:
"Really? Where did you get this from?" (Google using Wikidata content)
Multiple sources, including what appears to be your own research group's writing:[2]
What this page suggested was that Freebase being shut down means that Google will use Wikidata as a source. Note that the short intro text on the page did not say anything else about the subject, so I am surprised that this sufficed to convince you about the truth of that claim (it seems that other things I write with more support don't have this effect). Anyway, I am really sorry to hear that this quickly-written intro on the web has misled you. When I wrote this after Google had made their Freebase announcement last year, I really believed that this was the obvious implication. However, I was jumping to conclusions there without having first-hand evidence. I guess many people did the same. I have now fixed the statement.
To be clear: I am not saying that Google is not using Wikidata. I just don't know. However, if you make a little effort, there is a lot of evidence that Google is not using Wikidata as a source, even when it could. For example, population numbers are off, even in cases where they refer to the same source and time, and Google also shows many statements and sources that are not in Wikidata at all (and not even in Primary Sources).
I still wouldn't see any problem if Google were using Wikidata, but that's another discussion.
You mention "multiple sources". {{Which}}?
Markus
P.S. Meanwhile, your efforts in other channels are already leading some people to vandalise Wikidata just to make a point [1].
Markus
[1] http://forums.theregister.co.uk/forum/1/2015/12/08/wikidata_special_report/
On 09.12.2015 11:32, Markus Krötzsch wrote:
Andreas Kolbe has one point: a reference to a Wikipedia article should point to the correct article, and preferably to the revision that introduced the value. It should be pretty easy to do this for most of the statements...
On Wed, Dec 9, 2015 at 11:35 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hoi, If anything, that would be the only point. It is a very sad piece of FUD. It is not that easy. Thanks, GerardM
http://ultimategerardm.blogspot.nl/2015/12/wikipedia-signpost-yeah-right.htm...
On 9 December 2015 at 23:51, John Erling Blad jeblad@gmail.com wrote: