Wikidata December 2015

wikidata@lists.wikimedia.org

56 participants
46 discussions

Re: [Wikidata] [Wikimedia-l] Quality issues

by Andreas Kolbe

Gerard, On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote: > Hoi, > To start of, results from the past are no indications of results in the > future. It is the disclaimer insurance companies have to state in all their > adverts in the Netherlands. When you continue and make it a "theological" > issue, you lose me because I am not of this faith, far from it. Wikidata is > its own project and it is utterly dissimilar from Wikipedia.To start of > Wikidata has been a certified success from the start. The improvement it > brought by bringing all interwiki links together is enormous.That alone > should be a pointer that Wikipedia think is not realistic. > These benefits are internal to Wikimedia and a completely separate issue from third-party re-use of Wikidata content as a default reference source, which is the issue of concern here. To continue, people have been importing data into Wikidata from the start. > They are the statements you know and, it was possible to import them from > Wikipedia because of these interwiki links. So when you call for sources, > it is fairly save to assume that those imports are supported by the quality > of the statements of the Wikipedias The quality of three-quarters of the 280+ Wikipedia language versions is about at the level the English Wikipedia had reached in 2002. Even some of the larger Wikipedias have significant problems. The Kazakh Wikipedia for example is controlled by functionaries of an oppressive regime[1], and the Croatian one is reportedly[2] controlled by fascists rewriting history (unless things have improved markedly in the Croatian Wikipedia since that report, which would be news to me). The Azerbaijani Wikipedia seems to have problems as well. The Wikimedia movement has always had an important principle: that all content should be traceable to a "reliable source". Throughout the first decade of this movement and beyond, Wikimedia content has never been considered a reliable source. For example, you can't use a Wikipedia article as a reference in another Wikipedia article. Another important principle has been the disclaimer: pointing out to people that the data is anonymously crowdsourced, and that there is no guarantee of reliability or fitness for use. Both of these principles are now being jettisoned. Wikipedia content is considered a reliable source in Wikidata, and Wikidata content is used as a reliable source by Google, where it appears without any indication of its provenance. This is a reflection of the fact that Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision was, I understand, made by Denny, who is both a Google employee and a WMF board member. The benefit to Google is very clear: this free, unattributed content adds value to Google's search engine result pages, and improves Google's revenue (currently running at about $10 million an hour, much of it from ads). But what is the benefit to the end user? The end user gets information of undisclosed provenance, which is presented to them as authoritative, even though it may be compromised. In what sense is that an improvement for society? To me, the ongoing information revolution is like the 19th century industrial revolution done over. It created whole new categories of abuse, which it took a century to (partly) eliminate. But first, capitalists had a field day, and the people who were screwed were the common folk. Could we not try to learn from history? > and if anything, that is also where > they typically fail because many assumptions at Wikipedia are plain wrong > at Wikidata. For instance a listed building is not the organisation the > building is known for. At Wikidata they each need their own item and > associated statements. > > Wikidata is already a success for other reasons. VIAF no longer links to > Wikipedia but to Wikidata. The biggest benefit of this move is for people > who are not interested in English. Because of this change VIAF links > through Wikidata to all Wikipedias not only en.wp. Consequently people may > find through VIAF Wikipedia articles in their own language through their > library systems. > At the recent Wikiconference USA, a Wikimedia veteran and professional librarian expressed the view to me that * circular referencing between VIAF and Wikidata will create a humongous muddle that nobody will be able to sort out again afterwards, because – unlike wiki mishaps in other topic areas – here it's the most authoritative sources that are being corrupted by circular referencing; * third parties are using Wikimedia content as a *reference standard *when that was never the intention (see above). I've seen German Wikimedians express concerns that quality assurance standards have dropped alarmingly since the project began, with bot users mass-importing unreliable data. > So do not forget about Wikipedia and the lessons learned. These lessons are > important to Wikipedia. However, they do not necessarily apply to Wikidata > particularly when you approach Wikidata as an opportunity to do things in a > different way. Set theory, a branch of mathematics, is exactly what we > need. When we have data at Wikidata of a given quality.. eg 90% and we have > data at another source with a given quality eg 90%, we can compare the two > and find a subset where the two sources do not match. When we curate the > differences, it is highly likely that we improve quality at Wikidata or at > the other source. This sounds like "Let's do it quick and dirty and worry about the problems later". I sometimes get the feeling software engineers just love a programming challenge, because that's where they can hone and display their skills. Dirty data is one of those challenges: all the clever things one can do to clean up the data! There is tremendous optimism about what can be done. But why have bad data in the first place, starting with rubbish and then proving that it can be cleaned up a bit using clever software? The effort will make the engineer look good, sure, but there will always be collateral damage as errors propagate before they are fixed. The engineer's eyes are not typically on the content, but on their software. The content their bots and programs manipulate at times seems almost incidental, something for "others" to worry about – "others" who don't necessarily exist in sufficient numbers to ensure quality. In short, my feeling is that the engineering enthusiasm and expertise applied to Wikidata aren't balanced by a similar level of commitment to scholarship in generating the data, and getting them right first time. We've seen where that approach can lead with Wikipedia. Wikipedia hoaxes and falsehoods find their way into the blogosphere, the media, even the academic literature. The stakes with Wikidata are potentially much higher, because I fear errors in Wikidata stand a good chance of being massively propagated by Google's present and future automated information delivery mechanisms, which are completely opaque. Most internet users aren't even aware to what extent the Google Knowledge Graph relies on anonymously compiled, crowdsourced data; they will just assume that if Google says it, it must be true. In addition to honest mistakes, transcription errors, outdated info etc., the whole thing is a propagandist's wet dream. Anonymous accounts! Guaranteed identity protection! Plausible deniability! No legal liability! Automated import and dissemination without human oversight! Massive impact on public opinion![3] If information is power, then this provides the best chance of a power grab humanity has seen since the invention of the newspaper. In the media landscape, you at least have right-wing, centrist and left-wing publications each presenting their version of the truth, and you know who's publishing what and what agenda they follow. You can pick and choose, compare and contrast, read between the lines. We won't have that online. Wikimedia-fuelled search engines like Google and Bing dominate the information supply. The right to enjoy a pluralist media landscape, populated by players who are accountable to the public, was hard won in centuries past. Some countries still don't enjoy that luxury today. Are we now blithely giving it away, in the name of progress, and for the greater glory of technocrats? I don't trust the way this is going. I see a distinct possibility that we'll end up with false information in Wikidata (or, rather, the Google Knowledge Graph) being used to "correct" accurate information in other sources, just because the Google/Wikidata content is ubiquitous. If you build circular referencing loops fuelled by spurious data, you don't provide access to knowledge, you destroy it. A lie told often enough etc. To quote Heather Ford and Mark Graham, "We know that the engineers and developers, volunteers and passionate technologists are often trying to do their best in difficult circumstances. But there need to be better attempts by people working on these platforms to explain how decisions are made about what is represented. These may just look like unimportant lines of code in some system somewhere, but they have a very real impact on the identities and futures of people who are often far removed from the conversations happening among engineers." I agree with that. The "what" should be more important than the "how", and at present it doesn't seem to be. It's well worth thinking about, and having a debate about what can be done to prevent the worst from happening. In particular, I would like to see the decision to publish Wikidata under a CC0 licence revisited. The public should know where the data it gets comes from; that's a basic issue of transparency. Andreas [1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed [2] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-contro… [3] http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-201…

8 years, 5 months

provenance tracking for high volume edit sources (was Data model explanation and protection)

by Benjamin Good

In another thread, we are discussing the preponderance of problematic merges of gene/protein items. One of the hypotheses raised to explain the volume and nature of these merges (which are often by fairly inexperienced editors and/or people that seem to only do merges) was that they were coming from the wikidata game. It seems to me that anything like the wikidata game that has the potential to generate a very large volume of edits - especially from new editors - ought to tag its contributions so that they can easily be tracked by the system. It should be easy to answer the question of whether an edit came from that game (or any of what I hope to be many of its descendants). This will make it possible to debug what could potentially be large swathes of problems and to make it straightforward to 'reward' game/other developers with information about the volume of the edits that they have enabled directly from the system (as opposed to their own tracking data). Please don't misunderstand me. I am a big fan of the wikidata game and actually am pushing for our group to make a bio-specific version of it that will build on that code. I see a great potential here - but because of the potential scale of edits this could quickly generate, we (the whole wikidata community) need ways to keep an eye on what is going on. -Ben

8 years, 5 months

Does a REST services exist that converts Wikipedia url to Wikidata Q id, and the converse?

by james＠j1w.xyz

I'm creating an app that provides Wikidata info in slide-out menus on top of Wikipedia pages. Here's a video of a prototype: https://vimeo.com/146061825 Much of the app will be implemented as REST services in the cloud, and one item of functionality required will be a REST service that returns the Q id given a Wikipedia URL (in any language). Another REST service required will return a Wikipedia URL given a Wikidata Q id and language code (e.g. "en" or "pt-br"). Does anything like this currently exist? Regards, James Weaver http://JavaFXpert.com http://CulturedEar.com

8 years, 5 months

content for wikidata tutorial?

by Benjamin Good

The gene wiki people are hosting a tutorial on wikidata in Cambridge, UK next Monday [1]. In the interest of making the best tutorial in the least amount of preparation time.. I was wondering if anyone on the list had content (slides, handouts, cheatsheets) that they had already used successfully and might want to share? We are assembling the structure of the 90 minute session in a google doc [2], feel free to chime in there ! And of course everything we generate for that will be available online as soon as it exists. cheers -Ben [1] http://www.swat4ls.org/workshops/cambridge2015/programme/tutorials/ [2] https://docs.google.com/document/d/1dSgm90SbQBpHqEMa17t5zQL0PB2waIKD3LKTPPk…

8 years, 5 months

weekly summary #187

by Lydia Pintscher

Hey folks :) Here's your summary for what happened around Wikidata over the past week. Sorry for sending it late. I spent some much needed time with friends over a long weekend in Barcelona. Have a great week and keep rocking! Events <https://www.wikidata.org/wiki/Wikidata:Events>/Press/Blogs <https://www.wikidata.org/wiki/Wikidata:Press_coverage> - Past: Semantic Web Application and Tools 4 Life Science <http://www.swat4ls.org/> - notes <https://docs.google.com/document/d/1dSgm90SbQBpHqEMa17t5zQL0PB2waIKD3LKTPPk…> - example queries <https://bitbucket.org/sulab/wikidatasparqlexamples/src> - a few images <https://twitter.com/bgood/status/673872975024816128> - Ongoing: Wikidata for Beginners session at DISH 2015, Rotterdam <http://www.dish2015.nl/programme/improvisation-session/day-2-lose-control-g…>, 8 December 2015 - Upcoming: Wikidata pour la science <http://www.meetup.com/SemanticCampParis/events/227054374/?eventId=227054374> - Upcoming: Workshop on "Wikidata as a platform for biocuration" at Biocuration 2016 <http://www.isb-sib.ch/events/biocuration2016/workshops#WS6> - Slides for talk "Building the sum of all human citations" <http://www.slideshare.net/dartar/building-the-sum-of-all-human-citations> - Op-ed in Signpost <https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op…> (Lydia is working on a piece to address some of the points) Other Noteworthy Stuff - Closing in towards 1 million links between Commons categories and Wikidata article-type items <https://commons.wikimedia.org/wiki/Commons:Village_pump#Closing_in_towards_…> -- 8,000 more by next week? - Help labeling edits to improve the vandalism detection on Wikidata <https://www.wikidata.org/wiki/Wikidata:Edit_labels> - Open Science Prize <https://www.openscienceprize.org/> is looking for cool ideas for Open Data and health. How about something awesome with Wikidata? - The Individual Engagement Grant for StrepHit <https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…> has been accepted - Are you between 13 and 17? Join us for Google Code-in <https://codein.withgoogle.com> and do some Wikidata tasks with Wikimedia - We want your feedback on how to improve the process of showcasing Wikidata's best content <https://lists.wikimedia.org/pipermail/wikidata/2015-December/007751.html> - I dreamed of a perfect database <https://newrepublic.com/article/124425/dreamed-perfect-database> - Wikidata? ;-) - Histropedia timeline of National Library of Wales <http://histropedia.com/timeline/lhblyhhr091/NLW-Collections> - Wikinews, Wikispecies and MediaWiki now have access to the data on Wikidata. Do awesome things with it, sisters! Meta will follow on 15th. - Map of narrative locations in Denmark <https://fnielsen.github.io/littar/> - Experimental REST API for Wikidata <http://queryr.wmflabs.org/about/> Did you know? - Newest properties <https://www.wikidata.org/wiki/Special:ListProperties>: deprecated in version <https://www.wikidata.org/wiki/Property:P2379>, issued by <https://www.wikidata.org/wiki/Property:P2378>, MediaWiki hooks used <https://www.wikidata.org/wiki/Property:P2377>, superpartner of <https://www.wikidata.org/wiki/Property:P2376>, has superpartner <https://www.wikidata.org/wiki/Property:P2375>, natural abundance <https://www.wikidata.org/wiki/Property:P2374>, Genius artist ID <https://www.wikidata.org/wiki/Property:P2373>, ODIS ID <https://www.wikidata.org/wiki/Property:P2372>, FAO risk status <https://www.wikidata.org/wiki/Property:P2371>, conversion to SI base unit <https://www.wikidata.org/wiki/Property:P2370>, Soccerway player ID <https://www.wikidata.org/wiki/Property:P2369>, Sandbox-Property <https://www.wikidata.org/wiki/Property:P2368> - Newest Database reports: WikiProject Movies/new films <https://www.wikidata.org/wiki/Wikidata:WikiProject_Movies/new_films> - Showcase items <https://www.wikidata.org/wiki/Wikidata:Showcase_items>: Bertus Aafjes <https://www.wikidata.org/wiki/Q353003> Development - Polished the patch that adds icons to the item pages (for actions like edit, remove etc) and makes them cleaner. This will go live later today. - Worked more on a separate section for identifiers - Started working on the sorting of statements on the ArticlePlaceholder pages - Made it possible to use unknown language, no linguistic content and more as languages for the monolingual text datatype - Experimented more with improvements to the ranking on Special:Search - Worked on showing more languages in the in other languages box than the ones defined in your babel boxes - Small improvements to recent changes/watchlist integration on Wikipedia and co - Worked on making it possible to create a redirect over a deleted item - Worked on PHP7 support You can see all open tickets related to Wikidata here <https://phabricator.wikimedia.org/maniphest/query/4RotIcw5oINo/#R>. Monthly Tasks - Hack on one of these <https://phabricator.wikimedia.org/maniphest/query/R8GRzX1eH0tb/#R>. - Help develop the next summary here! <https://www.wikidata.org/wiki/Wikidata:Status_updates/Next> - Contribute to a Showcase item <https://www.wikidata.org/wiki/Wikidata:Showcase_items> - Help translate <https://www.wikidata.org/wiki/Special:LanguageStats> or proofread pages in your own language! - Add labels, in your own language(s), for the new properties listed above. Anything to add? Please share! :) Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

8 years, 5 months

Lua error when I use wikidata property in wikipedia

by Ali Ismayilov

Dear all, I create wikipedia articles using wikidata properties. I have not had this problem before. Today, when I create article I got this error: "*Lua error in Module:WikidataCoords at line 44: attempt to call field 'formatProperty' (a nil value)." *(https://az.wikipedia.org/wiki/Aalen) I checked my old articles, this error appeared on them also ( https://az.wikipedia.org/wiki/Boxum). I did not insert property number for the values which Lua gave error. Can you please help me to solve this issue? Otherwise, my community will ban me from our wikipedia page for creating mess. Thanks a lot. -- Best regards, Ali Ismayilov

8 years, 5 months

StrepHit won the IEG grants selection

by Marco Fossati

Dear all, I have no words to say how much I'm happy: StrepHit [1] has been selected as an IEG project [2]!!! I'd like to express my gratitude to all the community members that have provided feedback and endorsements. Thanks, thanks, thanks for believing in the idea. Cheers! -- Marco Fossati http://about.me/marco.fossati Twitter: @hjfocs Skype: hell_j [1] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va… [2] https://blog.wikimedia.org/2015/12/04/ieg-funds-fourteen-projects/

8 years, 5 months

Help needed to improve anti-vandalism tools

by Amir Ladsgroup

Hello, You may know ORES <https://www.wikidata.org/wiki/Wikidata:ORES>, We use ORES to build anti-vandalism tools (Learn more <https://meta.wikimedia.org/wiki/ORES/What>). Based on automatic revert detection we were able to build an MVP and we have some high quality classifiers online you can use (WD:ORES <https://www.wikidata.org/wiki/Wikidata:ORES>). In order to improve the anti-vandalism classifier we need you to go through some edits and determine whether they are damaging to Wikidata and if they are ill-intended edits or they are just newbies/honest mistakes. This would help us distinguish between newbies and vandals and also improves our data to make precise and adequate vandalism detection classifier. Please go to Wikidata:Edit labels <https://www.wikidata.org/wiki/Wikidata:Edit_labels> install the gadget and do a workset. Thanks Best

8 years, 5 months

REST API for Wikidata

by Jeroen De Dauw

Hey all, I've created a very rough REST API for Wikidata and am looking for your feedback. * About this API: http://queryr.wmflabs.org * Documentation: http://queryr.wmflabs.org/about/docs * API root: http://queryr.wmflabs.org/api At present this is purely a demo. The data it serves is stale and potentially incomplete, the endpoints and formats they use are very much liable to change, the server setup is not reliable and I'm not 100% sure I'll continue with this little project. The main thing I'm going for with this API compared to the existing one is greater ease of use for common use cases. Several factors make this a lot easier to do in a new API than in the existing one: no need the serve all use cases, no need to retain compatibility with existing users and no framework imposed restrictions. You can read more about the difference on the website. You are invited to comment on the concept and on the open questions mentioned on the website. Cheers -- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate ~=[,,_,,]:3

8 years, 5 months

Fwd: "Wikipedia as the front matter to all research": A brown bag on scholarly citations in Wikipedia this Friday 12/4 @ 12 PT

by Dario Taraborelli

A reminder that this will be streamed today at 9pm CET / 12pm PST We’ll be talking <https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_resear…> about unique identifiers and bibliographic/citation data in general as well as https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData <https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData> You can join the conversation via IRC on #wikimedia-office Dario > Begin forwarded message: > > From: Dario Taraborelli <dtaraborelli(a)wikimedia.org <mailto:dtaraborelli@wikimedia.org>> > Date: December 2, 2015 at 11:01:51 AM PST > To: wikimedia-l(a)lists.wikimedia.org <mailto:wikimedia-l@lists.wikimedia.org>, Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org <mailto:wiki-research-l@lists.wikimedia.org>> > Subject: "Wikipedia as the front matter to all research": A brown bag on scholarly citations in Wikipedia this Friday 12/4 @ 12 PT > > Come and join us for a brown bag this Friday December 4 at 12 PT to learn about unique identifiers and scholarly citations in Wikipedia, why they matter and how we can bridge the gap between the Wikimedia, research and librarian communities. > > Wikipedia as the front matter to all research > > YouTube stream: http://www.youtube.com/watch?v=mB_oexqz8pA <http://www.youtube.com/watch?v=mB_oexqz8pA> > Event information on Meta: https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_resear… <https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_resear…> > > Measuring citizen engagement with the scholarly literature through Wikipedia citations. > Geoffrey Bilder, CrossRef > > Wikipedia (in toto) is probably the 5th largest referrer of citations to the scholarly literature. That is, more Wikipedia users click on and follow citations to the scholarly literature *from* Wikipedia domains than from any single scholarly publisher in the world. What does this tell us about general interest in the scholarly literature? What does this tell us about scholarly engagement with editing Wikipedia articles? The short answer is “we don’t know.” But we are actively working with Wikimedia to find out. > > Building the sum of all human citations > Dario Taraborelli, WIkimedia Foundation > > As sourcing and verifiability of online information are threatened <http://www.slideshare.net/dartar/citing-as-a-public-service-building-the-su…> by the explosion of answer engines and the changing habits of web users, Wikimedia has an outstanding opportunity to extract and store source data for every conceivable statement and make it transparently verifiable by its users. In this talk, I’ll present a grassroots effort <https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData> to create a human-curated, comprehensive repository of all human citations in Wikidata. > > ––––––––––––– > Bonus read: a real-time tracker of scholarly citations added to Wikipedia, built with Raspberry Pi > http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-… <http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-…> > > > > Dario Taraborelli Head of Research, Wikimedia Foundation > wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter> Dario Taraborelli Head of Research, Wikimedia Foundation wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter>

8 years, 5 months

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Wikidata December 2015