Wikidata December 2017

wikidata@lists.wikimedia.org

63 participants
40 discussions

Fwd: Returning the favor???
by Andy Mabbett 07 Dec '17

[Note cross-posting]

I've been asked by the PIDapalooza team to forward this:

#~#~#~#~#~#~#~#~#~#~#

PIDs’R’Us and if they’re you, too, please join us for the second PIDapalooza <https://pidapalooza.org/> in Girona, Spain on January 23-24, for a two-day celebration of persistent identifiers. Together, we will do the impossible - make a meeting about persistent identifiers and networked research fun!

Brought to you by California Digital Library, Crossref, DataCite, and ORCID, this year’s sessions are organized around eight broad topics:

- PID myths
- Achieving persistence
- PIDs for emerging uses
- Legacy PIDs
- Bridging worlds
- PIDagogy
- PID stories
- Kinds of persistence

The program <https://pidapalooza18.sched.com/> is close to final and there’s something for everyone - from Do Researchers Need to Care about PID Systems? <https://pidapalooza18.sched.com/event/Cwmj/do-researchers-need-to-care-abou…> to Stories from the PID Roadies: Scholix <https://pidapalooza18.sched.com/event/Cwml/stories-from-the-pid-roadies-sch…>; and from The Bollockschain and other PID Hallucinations <https://pidapalooza18.sched.com/event/CwnA/the-bollockschain-and-other-pid-…> to #ResInfoCitizenshipIs? <https://pidapalooza18.sched.com/event/Cwmk/resinfocitizenshipis#>

There will also be plenaries by Johanna McEntyre <http://orcid.org/0000-0002-1611-6935> on As a [biologist] I want to [reuse and remix data] so that I can [do my research] <https://pidapalooza18.sched.com/event/CwnI/as-a-biologist-i-want-to-reuse-a…> and Melissa Haendel <https://orcid.org/0000-0001-9114-8737> (title to be confirmed).

With more than half the places already booked, now’s the time to register <https://www.eventbrite.com/e/pidapalooza-2018-registration-35176831851> - we hope to see you there!

--
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk

1 0

The WikiCite 2017 annual report is live
by Dario Taraborelli 07 Dec '17

We just published the WikiCite 2017 annual report, giving a comprehensive – while definitely not complete – overview of what the community has achieved in the past 12 months in building a structured repository of sources in Wikidata.

https://doi.org/10.6084/m9.figshare.5648233
https://twitter.com/wikicite/status/938778592653332480

Thanks to everyone who contributed, to the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and the Science Sandbox initiative at the Simons Foundation for their generous support, and to everyone at Wikimedia Austria and WMF who helped get this off the ground.

We’ll soon be posting an update about our plans for 2018. Stay tuned!

Dario, on behalf of the WikiCite organizers.

1 0

Problematic SPARQL-FED Query
by Kingsley Idehen 06 Dec '17

Hi Everyone,

Does anyone know why the SPARQL-FED query at http://tinyurl.com/ycc2tkp3 is failing?

Query Text:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX schema: <http://schema.org/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?item ?dbpediaID ?label ?subjectName ?prominence ?image
WHERE {
  {
    SELECT ?item ?itemLabel ?coord ?prominence ?layer ?image
    WHERE {
      ?item wdt:P31 wd:Q8502.   # a mountain
      ?item wdt:P625 ?coord.
      ?item wdt:P17 wd:Q39.     # in Switzerland
      ?item wdt:P2660 ?prominence .
      BIND(
        IF(?prominence < 1000, "<1000 metres",
        IF(?prominence < 2000, "1000 - 2000 metres",
        IF(?prominence < 3000, "2000 - 3000 metres",
        IF(?prominence < 4000, "3000 - 4000 metres",
        "> 4000 metres")))) AS ?layer).
      OPTIONAL {?item wdt:P18 ?image.}
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    LIMIT 200
  }
  SERVICE <http://dbpedia.org/sparql> {
    SELECT DISTINCT ?dbpediaID ?name ?label ?subjectName
    WHERE {
      ?dbpediaID owl:sameAs ?item ;
                 rdfs:label ?label ;
                 dct:subject ?subject.
      FILTER (LANG(?label) = "en")
      ?subject rdfs:label ?subjectName .
    }
  }
}

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this

2 1

Wikidata JSON dumps missing or just delayed?
by Ariel Glenn WMF 06 Dec '17

This week's run started a little late because we were trying to get a bug fix tested and deployed. We've been having memory issues, see https://phabricator.wikimedia.org/T181385

You should see the JSON files sometime today or tomorrow though.

Ariel

1 0

Wikidata JSON dumps missing or just delayed?
by Gerhard Gonter 06 Dec '17

Last week's Wikidata JSON dump did not appear and this week's version seems to be late, at least where I checked [1], [2]. Is there a problem, or are the dumps now elsewhere?

* [1] https://dumps.wikimedia.org/other/wikidata/
* [2] https://dumps.wikimedia.org/other/wikibase/wikidatawiki/

regards,
Gerhard Gonter

1 0

Cleaning up bibliographic collections in Wikidata
by Dario Taraborelli 04 Dec '17

Hey all,

I'd like to hear from you on a proposal to add some order and structure to the various bibliographic corpora we currently have in Wikidata.

As you may know, coverage of creative works in Wikidata has seen significant growth over the last year. [1][2] Different groups and projects have started importing source metadata for various reasons:

- to provide sources for machine-extracted statements (WikiFactMine [3], StrepHit [4])
- to represent sources cited in Wikipedia (e.g. DOIs and PMIDs imported via the mwcite identifier dumps) or other Wikimedia projects (Wikisource, Wikispecies, Wikinews)
- to create collections of the open access literature citable and reusable in Wikimedia projects (e.g. open access PMC review articles)
- to maintain small, curated corpora about specific topics (e.g. the Zika corpus [5])

Since all these efforts have grown organically and with little coordination, it's hard to keep track of who initiated them, to clearly communicate their purpose, to understand their completion criteria and their data quality needs, and last but not least to offer any contribution opportunities (in terms of code or manual labor) to other community members.

It's unclear whether these efforts should continue to live within Wikidata in the future, or leverage the power of federated Wikibase-powered wikis (see our discussion at the end of the WikiCite session at WikidataCon [6]). Irrespective of the best long-term solution, we need to provide some better structure to these efforts today if we want to address the above problems.

I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.

1. create a Wikidata class called "Wikidata item collection" [Q-X]
2. create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X]
3. add appropriate metadata to describe such collections (their main topic(s), creators, any external identifiers, if applicable)
4. mark individual bibliographic items as part of [P361] the corresponding collections

Note that this approach can apply to bibliographic item collections but also to any other set of items not directly identifiable via Wikidata properties. Of course, the same items could be part of multiple collections. Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).

Beyond solving the issues listed above, this approach would also allow us to generate dedicated statistics on the growth or data quality of each collection via the SPARQL endpoint. It would also allow us to design constraints for arbitrary item collections, something that right now is not possible (unless these sets can already be identified via a query).

If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.

Dario

[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidataco…
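
[Editorial illustration] The per-collection statistics mentioned in the proposal could look roughly like the query below. This is only a sketch under stated assumptions: the class [Q-X] does not exist yet, so wd:Q00000000 is a made-up placeholder that would have to be replaced with the real item ID once the class is created. P31 (instance of) and P361 (part of) are the properties named in the proposal.

```sparql
# Sketch only: wd:Q00000000 is a hypothetical placeholder for the
# proposed "Wikidata item collection" class [Q-X].
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?collection (COUNT(DISTINCT ?item) AS ?items)
WHERE {
  ?collection wdt:P31 wd:Q00000000 .   # step 2: the collection is an instance of [Q-X]
  ?item wdt:P361 ?collection .         # step 4: the item is marked as part of the collection
}
GROUP BY ?collection
ORDER BY DESC(?items)
```

Running such a query at regular intervals would give a simple growth curve per collection; data quality checks could be added as further triple patterns on ?item.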

15 15

Wikidata page for data publishers including overview of open data and publishing best practices
by john cummings 04 Dec '17

Hi all,

I'm putting together a page on Wikidata that provides an overview of open data, information on database rights, and best practices for publishing open data. I often meet people from organisations who produce data and want it to be reused, but who have a lower level of data literacy and don't know where to start. Please take a look and see what is missing. Feel free to braindump, I can tidy things up :)

https://www.wikidata.org/wiki/User:John_Cummings/Publishing_open_data

Thanks

John

3 2

Weekly Summary #289
by Léa Lacroix 04 Dec '17

*Here's your quick overview of what has been happening around Wikidata over the last week.*

Events <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Events>/Press/Blogs <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Press_coverage>

- Past: Using the Digital to Engage Archival Radio Collections: Part II (Wikidata Workshop) <https://vimeo.com/album/4856500/video/242765979>, Washington, D.C., November 2, 2017 with Andrew Lih and Alex Stinson
- Past: Wikidata Clinic <https://en.wikipedia.org/wiki/Wikipedia:Meetup/DC/Wikidata_Clinic_2017> (slides <https://docs.google.com/presentation/d/1qN14gFf5I24QG8DC5cFt3PJKSvS7tI9Ytvu…>) in Washington, D.C., December 1, 2017, with Andrew Lih and Rosie Stephenson-Goodknight
- Past: Wikikonference in Prague with a Wikidata workshop in Czech and an introduction of Wikidata and its community <https://commons.wikimedia.org/wiki/File:Wikiconference_Prague_2017_-_L%C3%A…>
- Wikipedia Weekly audio podcast coverage of Wikidata:
  - Episode 126 - Introduction to Wikidata <http://wikipediaweekly.org/podcast/wikipedia-weekly-126-introduction-to-wik…>, with Andrew Lih and Rob Fernandez
  - Episode 127 - WikidataCon 2017 roundtable discussion <http://wikipediaweekly.org/podcast/wikipedia-weekly-127-wikidatacon-2017/>, with Andrew Lih, Liam Wyatt, Stacy Allison-Cassin, Rosie Stephenson-Goodknight, Rob Fernandez
- Wikidata as authority linking hub: Connecting RePEc and GND researcher identifiers <http://zbw.eu/labs/en/blog/wikidata-as-authority-linking-hub-connecting-rep…> by Joachim Neubert
- Importing data into Wikidata - Current challenges and ideas for future development <http://histropedia.com/blog/importing-data-wikidata-current-challenges-idea…> by Navino Evans <https://www.wikidata.org/wiki/User:NavinoEvans>
- The Wikidata map in November 2017 <https://addshore.com/2017/12/wikidata-map-november-2017/> and what changed during the last four months, by Addshore

Other Noteworthy Stuff

- You can now vote for your favorite proposals on the Community Wishlist Survey <https://meta.wikimedia.org/wiki/2017_Community_Wishlist_Survey>. The voting phase is open until December 10th.
- If you run any functionality on Wikimedia sites that uses queries to the Wikidata Query Service, please add it here <https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Usage> (more information <https://lists.wikimedia.org/pipermail/wikidata-tech/2017-December/001215.ht…>)
- The 600,000,000th edit <https://www.wikidata.org/wiki/Special:Diff/600000000> has been made.
- The first content made *specifically* for Wikimedia projects *in space* has been added to Wikidata <https://www.wikidata.org/w/index.php?title=Q17177&type=revision&diff=600549…> (see *Close encounters of the Wikipedia kind <https://blog.wikimedia.org/2017/11/29/astronaut-spoken-voice/>*)
- The Aaron Swartz Fellowship <http://www.osaarchivum.org/press-room/announcements/Aaron-Swartz-and-Price-…> at OSA, Budapest, Hungary, is open for applications (deadline Dec. 31). The second focus area of the fellowship <http://www.osaarchivum.org/work-with-us/fellowship/aaron-swartz-fellowship> may be of interest to Wikidata folks.

Did you know?
- Newest properties <https://www.wikidata.org/wiki/Special:ListProperties>: set designer <https://www.wikidata.org/wiki/Property:P4608>, Swedish Musical Heritage composer ID <https://www.wikidata.org/wiki/Property:P4607>, National Film Board of Canada movie ID <https://www.wikidata.org/wiki/Property:P4606>, South Dakota Sports Hall of Fame ID <https://www.wikidata.org/wiki/Property:P4605>, World Rugby Hall of Fame ID <https://www.wikidata.org/wiki/Property:P4604>, Microsoft Store album ID <https://www.wikidata.org/wiki/Property:P4603>, date of burial or cremation <https://www.wikidata.org/wiki/Property:P4602>, Lives of WWI ID <https://www.wikidata.org/wiki/Property:P4601>, polymer of <https://www.wikidata.org/wiki/Property:P4600>, monomer of <https://www.wikidata.org/wiki/Property:P4599>, FAPESP researcher ID <https://www.wikidata.org/wiki/Property:P4598>, FAPESP institution ID <https://www.wikidata.org/wiki/Property:P4597>, NIOSH Publication Number <https://www.wikidata.org/wiki/Property:P4596>, post town <https://www.wikidata.org/wiki/Property:P4595>, arXiv author ID <https://www.wikidata.org/wiki/Property:P4594>, CPE athlete ID <https://www.wikidata.org/wiki/Property:P4593>, Mountain Project ID <https://www.wikidata.org/wiki/Property:P4592>, National Inventory of Canadian Military Memorials ID <https://www.wikidata.org/wiki/Property:P4591>, Atomic Heritage Foundation ID <https://www.wikidata.org/wiki/Property:P4590>, Dreadnought Project page <https://www.wikidata.org/wiki/Property:P4589>, IWGA athlete ID <https://www.wikidata.org/wiki/Property:P4588>, Argentinian Historic Heritage ID <https://www.wikidata.org/wiki/Property:P4587>
- Query examples:
  - Timeline for the writer Robert Louis Stevenson <https://query.wikidata.org/#%23defaultView%3ATimeline%0ASELECT%20DISTINCT%2…> (source <https://twitter.com/SciHiBlog/status/937270130672795648>)
  - Movies with Bud Spencer and Terence Hill <https://query.wikidata.org/#%23Movies%20with%20Bud%20Spencer%20and%20Terenc…> (source <https://twitter.com/UselessBread/status/936749708760027136>)
  - Photographers born before the inception of the earliest photograph <https://query.wikidata.org/#%23%20photographers%20born%20before%20the%20inc…> (source <https://twitter.com/WikidataFacts/status/936592930894221312>)
- Newest WikiProjects <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:WikiProjects>: Stolpersteine <https://www.wikidata.org/wiki/Wikidata:WikiProject_Stolpersteine>, Jasmerah <https://www.wikidata.org/wiki/Wikidata:WikiProject_Jasmerah>
- Newest database reports: Q5 with identical P18 <https://www.wikidata.org/wiki/Wikidata:WikiProject_Q5/reports/identical_P18>

Development

- Wikidata will get dedicated database resources, and go read-only for 30 minutes on 9th January 2018 (phabricator:T181645 <https://phabricator.wikimedia.org/T181645>)
- There were no RDF dumps last week due to problems generating them; investigation is still going on (phabricator:T181385 <https://phabricator.wikimedia.org/T181385>)
- Improved the threshold for ORES on Wikidata (phabricator:T180450 <https://phabricator.wikimedia.org/T180450>)
- Working on fixing a regression after a change in MediaWiki core that makes edit links show up on diff pages (phabricator:T181807 <https://phabricator.wikimedia.org/T181807>)
- More work on persistent editing of statements on Forms of a Lexeme (specifically phabricator:T180467 <https://phabricator.wikimedia.org/T180467>)
- Improved the size of the diff that we send to Wikipedia and co for changes happening on Wikidata. This is one more needed step towards only showing meaningful edits in the watchlists and recent changes there. (phabricator:T113468 <https://phabricator.wikimedia.org/T113468>)

You can see all open tickets related to Wikidata here <https://phabricator.wikimedia.org/maniphest/query/4RotIcw5oINo/#R>.

Monthly Tasks

- Add labels, in your own language(s), for the new properties listed above.
- Comment on property proposals: all open proposals <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Overview>
- Suggested and open tasks <https://www.wikidata.org/wiki/Wikidata:Contribute/Suggested_and_open_tasks>!
- Contribute to a Showcase item <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Showcase_items>.
- Help translate <https://www.wikidata.org/wiki/Special:LanguageStats> or proofread the interface and documentation pages, in your own language!
- Help merge identical items <https://www.wikidata.org/wiki/User:Pasleim/projectmerge> across Wikimedia projects.
- Help write the next summary! <https://www.wikidata.org/wiki/Wikidata:Status_updates/Next>

Cheers,

--
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.

1 0

An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0
by Mathieu Stumpf Guntz 03 Dec '17

Hello everyone,

I am forwarding here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_…>, because I'm interested in wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.

Before you consider digging further into this reading, keep in mind that I remain convinced that Wikidata is a wonderful project, and I wish it a bright future full of even more amazing things than what it has already brought so far. My sole concern is really a license issue.

Below is a copy/paste of the above linked message:

Thank you Lydia Pintscher <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for taking the time to answer. Unfortunately this answer <https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0> misses too many important points to resolve all the concerns which have been raised.

Notably, there is still not the beginning of a hint in it about where the decision to use CC0 exclusively for Wikidata came from. But as this inquiry on the topic <https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_o…> advances, an answer is emerging from it. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel's co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_…>. To state it shortly in a conspiratorial fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia.
For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Proof that this claim is completely wrong is welcome, as it would be great if the community was in fact the driving force behind this single license choice, and if it is the best choice for its future, not the future of giant tech companies. Shedding such a happy light on this subject would be a great contribution, so we could all leave this issue alone and go back to contributing on more interesting topics.

Now let's examine the thoughts proposed by Lydia.

"Wikidata is here to give more people more access to more knowledge."

So far, this matches the Wikimedia movement's stated goal.

"This means we want our data to be used as widely as possible."

Sure, as long as it rhymes with equity. As in /Our strategic direction: Service and *Equity*/ <https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/…>. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each other's freedom. Because below this level, the freedom of one is the murder and slavery of others.

"CC-0 is one step towards that."

That's a thesis. You can propose to defend it, but no one has to agree without some convincing proof.

"Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively."

No it's not. From a data processing point of view, everything is data. Whether it's stored in wikisyntax, in a relational database or engraved in stone is only a matter of convenience. Whether it's a random stream of bits generated by a dumb chipset or some encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects.
Sure, the way it's structured does make many things extremely easy. But this is not because it's data while elsewhere there would be no data. It's because it forces data to be stored in a way that eases aggregation, combination, mashing-up, filtering and so on.

"Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more."

Sure. It also lives from being curated by millions[2] <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_…> of benevolent contributors, or it would be just a useless pile of random bytes.

"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."

No, it doesn't mean that. First let's recall a few basics, as it seems the whole answer confuses attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement goals. That means withdrawing these requirements is withdrawing these goals.

Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already here, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem; you don't have to put it in front of your derived work. Just look at a Wikipedia article: until you go to the history, you have zero attribution visible, and it's ok. It also probably has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand.
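
[Editorial illustration] The point that references are already part of the item structure and only need to be "retrievable on demand" can be sketched as a query against the Wikidata Query Service. This is a minimal example under stated assumptions, not part of the original message: the item (Q39, Switzerland) and the property (P31, instance of) are arbitrary choices for illustration, and P248 ("stated in") is only one of several properties a reference may carry.

```sparql
# Sketch only: lists the P31 statements of one item together with
# the source named in each reference, where a reference exists.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?statement ?value ?source
WHERE {
  wd:Q39 p:P31 ?statement .                 # full statement node, not just the value
  ?statement ps:P31 ?value .                # the value asserted by the statement
  OPTIONAL {
    ?statement prov:wasDerivedFrom ?ref .   # reference attached to the statement
    ?ref pr:P248 ?source .                  # "stated in" source of that reference
  }
}
```

The reference lookup is kept in an OPTIONAL block so that unreferenced statements still appear, which is the sense in which the provenance is fetched only when asked for rather than carried through every computation.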
What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually, if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost. Now, letting users specify under which free licenses they publish their work would just require an additional attribute, a ridiculously small weight when balanced against the equity concerns it resolves.

Could that prevent some uses for some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, a "distribute under the same conditions" clause is fine.

"This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)."

OpenStreetMap, which uses ODbL, a copyleft attributive license, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the CC0 single license policy of the project. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares?

"This is a burden on our re-users that I do not want to impose on them."

Wait, which re-users? Surely one might expect that Wikidata would care first about re-users who are in line with the Wikimedia goal, so surely the needs of the Wikimedia community in particular and Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use it as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to annoy with questions of equity and traceability?

"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be just as easy as punching one button on a computer rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions impacting so deeply the future of our community.

"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights <https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights> for more details."

It says basically that it's applicable in the United States and Europe on different legal bases and to different extents. And for the rest of the world, it doesn't say that nothing can apply; it states nothing.

"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."

What kind of logic is that? It might not be applicable in some countries, so let's withdraw the few rights we have.

"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."

Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to withdraw the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the unique license choice is in direct conflict with our goal of *equitably* spreading knowledge.

Also, this statement seems to suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually any well written license would do an equal job regarding this point, including many copyleft licenses out there.
So while associating a clear license with each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as the sole license available to contributors.

Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained – a set of items and statements – is under CC0. That is a hugely doubtful statement and it looks alarmingly like license laundering <https://en.wikipedia.org/wiki/license_laundering>. This is true for Wikipedia, but it's also true for any source on which large scale extraction and import are operated, whether through bots or crowdsourcing.

So the Wikidata project is currently extremely badly placed to give lessons on legal ambiguity, as it heavily plays with legal blur and the hope that its shady practices won't fall under too much scrutiny.

"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."

No, they are not. They are used as /a way to try to make it harder for big companies to profit from openly available resources/ *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.

"The thing is there seems to be no indication of this working."

Because it's not trying to enforce what you claim, so of course it's not working for this goal. But for the goal that copyleft licenses aim at, there is clear evidence that yes, it works.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."

There is no pitfall in copyleft licenses. Using a war material analogy is disrespectful. It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that's the price of fostering equity. And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library <https://www.gnu.org/licenses/why-not-lgpl.html> it is stated that /proprietary software developers have the advantage of money; free software developers need to make advantages for each other/. This might be generalised as /big companies have the advantage of money; free/libre culture contributors need to make advantages for each other/. So, at odds with what these fallacious claims against copyleft licenses pretend, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors which are related to big companies.

"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."

If this statement is about copyleft licenses, then this is just plainly false. Smaller actors have more to gain in preserving the mutual benefit of the common ecosystem that a copyleft license fosters.

"With Wikidata we are making structured data about the world available for everyone."

And that's great. But that doesn't require CC0 as sole license to be achieved.

"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."

And that's great. But that doesn't require CC0 as sole license.
Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting value that our community can not afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution. And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and after they imported it under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And when it has performed license laundering of all Wikimedia projects' works with shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.

"Thereby we are helping more people get access to knowledge from more places than just the few big ones."

No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without consideration for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to /collect and use different forms of free, trusted knowledge/ that /focus efforts on the knowledge and communities that have been left out by structures of power and privilege/, as stated in /Our strategic direction: Service and Equity/.

"CC-0 is becoming more and more common."

Just like economic inequality <https://en.wikipedia.org/wiki/economic_inequality>. But that is not what we are aiming to foster in the Wikimedia movement.

"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Art."

Good for them.
But they are not the Wikimedia community; they have their own goals and their own plans to be sustainable, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair. States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only option to contribute. All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved. That's it, zero legal hope of equity. And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to. Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data. Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them. This should at least be backed with some solid statistics showing that it had a positive impact in terms of audience and contribution in Wikimedia projects as a whole.
Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course. Plus, there is not even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a free license set for their publications. All the more, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate feeling of satisfaction of some people. * [1] Wikipedia Signpost, 2 December 2015 <https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op…> * [2] according to the next statement of Lydia Once again, I recall this is not a manifesto against Wikidata. The motivation behind this message is a hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects. With much wikilove, mathieu

22 46

RDF: All vs Truthy
by Laura Morales 03 Dec '17

03 Dec '17

Can somebody please explain (in simple terms) what's the difference between "all" and "truthy" RDF dumps? I've read the explanation available on the wiki [1] but I still don't get it. If I'm just a user of the data, because I want to retrieve information about a particular item and link items with other graphs... what am I missing/leaving-out by using "truthy" instead of "all"? A practical example would be appreciated since it will clarify things, I suppose. [1] https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
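A minimal sketch of the kind of practical example asked for above, not an authoritative description of the dump format. It assumes the usual reading of the Wikibase RDF documentation: "truthy" keeps only the best-ranked, non-deprecated value per property as a direct `wdt:` triple, while "all" also exposes every statement as its own node (`p:`, `ps:`, `pq:`) carrying rank and qualifiers. The item, the values, the statement node IDs and the simplified rank strings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical statements for one item (simplified data model, not real dump data).
statements = [
    {"property": "P1082", "value": "3500000", "rank": "preferred", "qualifiers": {"P585": "2016"}},
    {"property": "P1082", "value": "3400000", "rank": "normal", "qualifiers": {"P585": "2011"}},
    {"property": "P17", "value": "Q183", "rank": "normal", "qualifiers": {}},
    {"property": "P17", "value": "Q7318", "rank": "deprecated", "qualifiers": {}},
]


def truthy_triples(item, statements):
    """Direct claims only: best non-deprecated rank per property, no qualifiers."""
    by_property = defaultdict(list)
    for s in statements:
        if s["rank"] != "deprecated":
            by_property[s["property"]].append(s)
    triples = []
    for prop, group in by_property.items():
        preferred = [s for s in group if s["rank"] == "preferred"]
        best = preferred or group  # fall back to normal rank if nothing is preferred
        for s in best:
            triples.append((f"wd:{item}", f"wdt:{prop}", s["value"]))
    return triples


def full_statement_triples(item, statements):
    """Statement-node view: every statement, with its rank and qualifiers."""
    triples = []
    for i, s in enumerate(statements):
        node = f"wds:{item}-stmt{i}"  # hypothetical statement node ID
        triples.append((f"wd:{item}", f"p:{s['property']}", node))
        triples.append((node, f"ps:{s['property']}", s["value"]))
        triples.append((node, "wikibase:rank", s["rank"]))  # simplified rank value
        for qualifier, value in s["qualifiers"].items():
            triples.append((node, f"pq:{qualifier}", value))
    return triples


if __name__ == "__main__":
    for triple in truthy_triples("Q64", statements):
        print("truthy:", triple)
    for triple in full_statement_triples("Q64", statements):
        print("all:   ", triple)
```

In this sketch, what "truthy" leaves out is the older population figure, the deprecated value, and every qualifier (here the point-in-time dates). If those details do not matter for looking up an item and linking it to other graphs, the truthy view is probably enough.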

8 12
