Wikidata June 2018

wikidata@lists.wikimedia.org

32 participants
26 discussions

by Gintautas Sulskus

Hi,

I have a couple of questions regarding the Wiki Page ID. Does it always stay unique for the page, where the page itself is just a placeholder for any kind of information that might change over time?

Consider the following cases:

1. The first time someone creates page "Moon" it is assigned ID=1. If at some point the page is renamed to "The_Moon", the ID=1 remains intact. Is this correct?
2. What if we have page "Moon" with ID=1 and someone creates a second page "The_Moon" with ID=2? Is it possible that page "Moon" is transformed into a redirect, so that "Moon" would then redirect to page "The_Moon"?
3. Is it possible for page "Moon" to become a category "Category:Moon" with the same ID=1?

Thanks,
Gintas

5 years, 6 months

Wikidata HDT dump

by Laura Morales

Hello everyone,

I'd like to ask if Wikidata could please offer an HDT [1] dump along with the already available Turtle dump [2]. HDT is a binary format for storing RDF data. It is useful because it can be queried from the command line, it can be used as a Jena/Fuseki source, and it uses orders of magnitude less space to store the same data.

The problem is that it's very impractical to generate an HDT file, because the current implementation requires a lot of RAM to convert a file. For Wikidata it will probably require a machine with 100-200GB of RAM. This is unfeasible for me because I don't have such a machine, but if you guys have one to share, I can help set up the rdf2hdt software required to convert Wikidata Turtle to HDT.

Thank you.

[1] http://www.rdfhdt.org/
[2] https://dumps.wikimedia.org/wikidatawiki/entities/

5 years, 6 months

Machine translation efforts for underserved languages

by Olya Irzak

Dear Wikidata community,

We're working on a project called Wikibabel to machine-translate parts of Wikipedia into underserved languages, starting with Swahili.

In hopes that some of our ideas can be helpful to machine translation projects, we wrote a blog post about how we prioritized which pages to translate, and what categories need a human in the loop:
https://medium.com/@oirzak/wikibabel-equalizing-information-access-on-a-bud…

Rumor has it that the Wikidata community has thought deeply about information access. We'd love your feedback on our work. Please let us know about past / ongoing machine translation related projects so we can learn from & collaborate with them.

Best regards,
Olya & the Wikibabel crew

5 years, 7 months

Wikidata in the LOD Cloud

by Léa Lacroix

Hello all,

Thanks to Lucas, who fulfilled the necessary requirements, Wikidata now appears in the LOD cloud graph: http://lod-cloud.net

Currently, the graph doesn't display all the actual connections of Wikidata. The only connections that show up come from properties that link to other projects or databases and that carry a specific statement linking to an RDF endpoint. If you see something missing, you can contribute by adding the statement “formatter URI for RDF resource” on properties where the resource supports RDF (example <https://www.wikidata.org/wiki/Property:P214#P1921>).

You can learn more about the procedure to update the graph, and find a list of the existing and missing datasets, here <https://www.wikidata.org/wiki/User:Lucas_Werkmeister_(WMDE)/LOD_Cloud>.

Thanks to Lucas and John for making this happen!

--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

5 years, 8 months

Re: [Wikidata] [Wikimedia-l] Solve legal uncertainty of Wikidata

by Denny Vrandečić

Rob Speer wrote:

> The result of this, by the way, is that commercial entities sell modified
> versions of Wikidata with impunity. It undermines the terms of other
> resources such as DBPedia, which also contains facts extracted from
> Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia
> and have to agree to share alike, when they can get similar data from
> Wikidata which promises them it's CC-0?

The comparison to DBpedia is interesting: the terms for DBpedia state

"Attribution in this case means keep DBpedia URIs visible and active through at least one (preferably all) of @href, <link />, or "Link:". If live links are impossible (e.g., when printed on paper), a textual blurb-based attribution is acceptable."
http://wiki.dbpedia.org/terms-imprint

So according to these terms, when someone displays data from DBpedia, it is entirely sufficient to attribute DBpedia.

What that means is that DBpedia follows exactly the same theory as Wikidata: it is OK to extract data from Wikipedia and republish it as your own dataset under your own copyright without requiring attribution to the original source of the extraction.

(A bit more problematic might be the fact that DBpedia also republishes whole paragraphs of Text under these terms, but that's another story)

My understanding is that all that Wikidata has extracted from Wikipedia is non-copyrightable in the first place and thus republishing it under a different license (or, as in the case of DBpedia for simple triples, with a different attribution) is legally sound.

If there is disagreement with that, I would be interested which content exactly is considered to be under copyright and where license has not been followed on Wikidata.

For completion: the discussion is going on in parallel on the Wikidata project chat and in Phabricator:

https://phabricator.wikimedia.org/T193728#4212728
https://www.wikidata.org/wiki/Wikidata:Project_chat#Wikipedia_and_other_Wik…

I would appreciate if we could keep the discussion in a single place.

Gnom1 on Phabricator has offered to actually answer legal questions, but we need to come up with the questions that we want to ask. If it should be, for example, as Rob Speer states on the bug, "has the copyright of interwiki links been breached by having them be moved to Wikidata?", I'd be quite happy with that question - if that's the disagreement, let us ask Legal help and see if my understanding or yours is correct.

Does this sound like a reasonable question? Or which other question would you like to ask instead?

On Thu, May 17, 2018 at 4:15 PM Rob Speer <rob(a)luminoso.com> wrote:

> > As always, copyright is predatory. As we can prove that copyright is the
> > enemy of science and knowledge
>
> Well, this kind of gets to the heart of the issue, doesn't it.
>
> I support the Creative Commons license, including the share-alike term,
> which requires copyright in order to work, and I've contributed to multiple
> Wikimedia projects with the understanding that my work would be protected
> by CC-By-SA.
>
> Wikidata is engaged in a project-wide act of disobedience against CC-By-SA.
> I would say that GerardM has provided an excellent summary of the attitude
> toward Creative Commons that I've encountered on Wikidata: "it's holding us
> back", "it's the enemy", "you can't copyright knowledge", "you can't make
> us follow it", etc.
>
> The result of this, by the way, is that commercial entities sell modified
> versions of Wikidata with impunity. It undermines the terms of other
> resources such as DBPedia, which also contains facts extracted from
> Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia
> and have to agree to share alike, when they can get similar data from
> Wikidata which promises them it's CC-0?
>
> On Wed, 16 May 2018 at 21:43 Gerard Meijssen <gerard.meijssen(a)gmail.com>
> wrote:
>
> > Hoi,
> > Thank you for the overly broad misrepresentation. As always, copyright is
> > predatory. As we can prove that copyright is the enemy of science and
> > knowledge we should not be upset that *copyright* is abused we should
> > welcome it as it proves the point. Also when we use texts from everywhere
> > and rephrase it in Wikipedia articles "we" are not lily white either.
> >
> > In "them old days" generally we felt that when people would use Wikipedia,
> > it would only serve our purpose; share the sum of all knowledge. I still
> > feel really good about that. And, it has been shown that what we do;
> > maintain / curate / update that data that it is not easily given to do as
> > well as "we" do it.
> >
> > When we are to be more precise with our copyright, there are a few things
> > we could do to make copyright more transparent. When data is to be uploaded
> > (Commons / Wikipedia or Wikidata) we should use a user that is OWNED and
> > operated by the copyright holder. The operation may be by proxy and as a
> > consequence there is no longer a question about copyright as the copyright
> > holder can do as we wants. This makes any future noises just that,
> > annoying.
> >
> > As to copyright on Wikidata, when you consider copyright using data from
> > Wikipedia. The question is: "What Wikipedia" I have copied a lot of data
> > from several Wikipedias and believe me, from a quality point of view there
> > is much to be gained by using Wikidata as an instrument for good because it
> > is really strong in identifying friends and false friends. It is superior
> > as a tool for disambiguation.
> >
> > About the copyright on data, the overriding question with data is: do you
> > copy data wholesale in Wikidata. That is what a database copyright is
> > about. As I wrote on my blog [1], the best data to include is data that is
> > corroborated by the fact that it is present in multiple sources. This
> > negates the notion of a single source, it also underscores that much of the
> > data everywhere is replicated a lot. It also underscores, again, the notion
> > that data that is only present in single sources is what needs attention.
> > It needs tender loving care, it needs other sources to establish
> > credentials. That is in its own right what makes any claim of copyright
> > moot. It is in this process that it becomes a "creative" process negating
> > the copyright held on databases.
> >
> > I welcome the attention that is given to copyright in Wikidata. However our
> > attention to copyright is predatory in two ways. It is how can we get
> > around existing copyright and how can we protect our own. As argued,
> > Wikidata shines when it is used for what it is intended to be; the place
> > that brings data, of Wikipedias first and elsewhere second, together to be
> > used as a repository of quality, open and linked data.
> > Thanks,
> > GerardM
> >
> > [1]
> > https://ultimategerardm.blogspot.nl/2018/05/wikidata-copyright-and-linked-data.html
> >
> > On 11 May 2018 at 23:10, Rob Speer <rob(a)luminoso.com> wrote:
> >
> > > Wow, thanks for the heads up. When I was getting upset about projects that
> > > change the license on Wikimedia content and commercialize it, I had no idea
> > > that Wikidata was providing them the cover to do so. The Creative Commons
> > > violation is coming from inside the house!
> > >
> > > On Tue, 8 May 2018 at 03:48 mathieu stumpf guntz <
> > > psychoslave(a)culture-libre.org> wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > There is a phabricator ticket on Solve legal uncertainty of Wikidata
> > > > <https://phabricator.wikimedia.org/T193728> that you might be interested
> > > > to look at and participate in.
> > > >
> > > > As Denny suggested in the ticket to give it more visibility through the
> > > > discussion on the Wikidata chat
> > > > <https://www.wikidata.org/wiki/Wikidata:Project_chat#Importing_datasets_under_incompatible_licenses>,
> > > > I thought it was interesting to highlight it a bit more.
> > > >
> > > > Cheers
> > > >
> > > > _______________________________________________
> > > > Wikimedia-l mailing list, guidelines at:
> > > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > > > https://meta.wikimedia.org/wiki/Wikimedia-l
> > > > New messages to: Wikimedia-l(a)lists.wikimedia.org
> > > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

5 years, 9 months

Re: [Wikidata] [Wikimedia-l] Solve legal uncertainty of Wikidata

by Info WorldUniversity

Hi Mathieu, Rob, Denny, and Wikidatans,

I'm writing to inquire about further Wikidata CC licensing clarifications.

Wikidata may be heading to https://creativecommons.org/licenses/by-sa/4.0/ which allows for a) sharing, b) adapting, and even c) commercial use.

MIT OCW uses, by way of comparison, https://creativecommons.org/licenses/by-nc-sa/4.0/ which allows for a) sharing and b) adapting, but c) only non-commercially.

At a Wikimedia conference in early 2017, with Lydia and Dario present, I think I learned that all books / WikiCitations in all 301 Wikipedia languages could be licensed, or heading to be licensed, with CC-0 licensing - https://creativecommons.org/share-your-work/public-domain/cc0/ - and per - https://phabricator.wikimedia.org/T193728 - which would allow them to be data sources for online bookstores even.

Is this the case? Could some of Wikidata's data be licensed with CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/) and other data be licensed with CC-0?

Thanks.

Cheers,
Scott

--
- Scott MacLeod - Founder & President
- https://twitter.com/WorldUnivAndSch
- World University and School
- http://worlduniversityandschool.org
- http://scottmacleod.com
- CC World University and School - like CC Wikipedia with best STEM-centric CC OpenCourseWare - incorporated as a nonprofit university and school in California, and is a U.S. 501 (c) (3) tax-exempt educational organization.

5 years, 9 months

How can I find out all food & cooking related articles

by Quintin Par

Hello all,

I tried looking at some examples but couldn't quite figure it out. If I were to write a simple query like this:

    SELECT ?food ?cooking WHERE {
      [] wdt:xx ?food.
      OPTIONAL { ?food wdt:xx wd:xxx. }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
    }

What are the correct IDs that I should be using?

- Quintin
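One way to approach the question is to ask for every item whose class chain leads to a "food" item. The sketch below only builds the query text; it does not contact any server. The IDs are assumptions, not something stated in this thread: P31 and P279 are taken to be "instance of" and "subclass of", and Q2095 is taken to be the item for food. Please check them on wikidata.org before relying on the result. The function name `build_class_query` is made up for this illustration.

```python
import re


def build_class_query(class_qid: str = "Q2095", limit: int = 100) -> str:
    """Return SPARQL text listing items that belong to the given class.

    Assumptions (verify on wikidata.org): P31 = "instance of",
    P279 = "subclass of", Q2095 = food.
    """
    # Reject anything that is not shaped like a Wikidata item ID.
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?food ?foodLabel WHERE {\n"
        # Instance of the class, or of any of its subclasses.
        f"  ?food wdt:P31/wdt:P279* wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_class_query())
```

The printed text can be pasted into the query service. Cooking-related items would presumably need a second class or property of their own, which this sketch does not try to guess.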

5 years, 10 months

Structured Data on Commons - Properties needed

by Keegan Peterzell

Hello,

Coming soon: prototypes with the first structured statements on files. Before this, however, the software team needs to know what are the basic properties that Wikidata has, or will need, to support Commons.

There is more information and an exercise to help find the properties up on the Structured Data hub on Commons:
https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Fee…

Please stop by and participate; the workshop will be open for all of the month of July at a minimum. Contact me if you have any questions. Thanks!

--
Keegan Peterzell
Technical Collaboration Specialist
Wikimedia Foundation

5 years, 10 months

Wikibase’s maxlag now takes dispatch lag into account

by Léa Lacroix

*This change impacts people running bots and semi-automated tools to edit Wikidata.*

Hello all,

Based on the previous discussions around the limitation set up to fix the significant dispatch lag on clients, we came up with a new solution to try.

The database behind Wikidata is replicated to several other database servers. At each edit, the changes are replicated to these other servers. There is always a short lag, which is usually less than a second. If this lag is too high, the other databases can’t synchronize correctly, which can cause problems for reading and editing Wikidata, or reusing data on other projects. If the lag is too high on too many servers, the master database stops accepting new edits.

When the lag is close to the limit, the system prioritizes edits made by humans and ignores edits from bots, sending back an error. This limit is set by the maxlag option in the API. People writing bots can choose a number as maxlag for their bot. The default value is 5. This number is used to evaluate two things: the replication lag between the master database and its replicas, and the size of the job queue.

*On Tuesday, June 3rd, maxlag will also evaluate the dispatch lag between Wikidata and clients (eg Wikipedias).*

The dispatch lag is the latency between an edit on Wikidata and the moment when it’s shown on clients. Its median value is around 2 minutes.

*If you’re running a bot and using a standard configuration (maxlag=5), when the median of dispatch lag is more than 300 seconds, your bot edits won’t be saved and will return an error.*

If this change is impacting your work too much, please let us know by leaving a comment in this ticket <https://phabricator.wikimedia.org/T194950>. This is also where you can ask any question. You can also change your configuration in order to increase the maxlag limit.

More information: Wikidata dispatch Grafana board <https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch?refresh=1m&org…>

Thanks for your constructive feedback,

--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
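For bot authors, the usual way to live with this limit is to send maxlag with every write and back off when the server refuses. The following is a minimal sketch under stated assumptions, not the behaviour of any particular bot framework: `send` is a hypothetical callable that performs the HTTP request and returns the decoded JSON body together with the response headers, and the refusal is assumed to arrive as an API error with code "maxlag" plus an optional Retry-After header. The function name and its parameters are made up for this illustration.

```python
import time
from typing import Any, Callable, Dict, Mapping, Tuple

# Hypothetical transport: takes request parameters, returns (JSON body, headers).
Sender = Callable[[Dict[str, Any]], Tuple[Mapping[str, Any], Mapping[str, str]]]


def edit_with_maxlag(
    send: Sender,
    params: Mapping[str, Any],
    maxlag: int = 5,
    max_attempts: int = 5,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Send an API request with maxlag set, retrying while the server is lagged."""
    request = dict(params, maxlag=maxlag, format="json")
    for _ in range(max_attempts):
        body, headers = send(request)
        error = body.get("error") or {}
        if error.get("code") != "maxlag":
            # Either success or an unrelated error; let the caller decide.
            return body
        # Assumed convention: the server suggests a wait time in Retry-After.
        sleep(float(headers.get("Retry-After", default_wait)))
    raise RuntimeError(f"server still lagged after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting. A bot that cannot tolerate refusals at maxlag=5 could pass a larger value, which is the configuration change the announcement mentions.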

5 years, 10 months

IRC Office Hour - 26 June, Structured Data on Commons

by Keegan Peterzell

Hi all,

There will be an IRC office hour for Structured Data on Commons [0] from 18:00-19:00 UTC Tuesday, 26 June 2018. You can find links to join as well as date and time conversion at the IRC Office Hour page on Meta [1].

Specific topics are not set in advance; you can come prepared to discuss whatever aspect of the project you would like. More information about what has taken place can be found at the Structured Data hub on Commons [2].

Thanks, I look forward to seeing you all there. I will send out a reminder a few hours before the meeting starts.

0. https://commons.wikimedia.org/wiki/Commons:Structured_data
1. https://meta.wikimedia.org/wiki/IRC_office_hours#Upcoming_office_hours
2. https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved

--
Keegan Peterzell
Technical Collaboration Specialist
Wikimedia Foundation

5 years, 10 months
