Hello,
I am looking for the list of supported language codes for variations of
Chinese. So far in API responses I found these:
zh
zh-cn
zh-hans
zh-hant
zh-hk
zh-tw
zh-mo
zh-sg
"Configure" link in "more languages" section leads to this page:
https://www.wikidata.org/wiki/Help:Navigating_Wikidata/User_Options#Babel_e…
Which in turn refers to
https://meta.wikimedia.org/wiki/Table_of_Wikimedia_projects#Projects_per_la…
But apparently there are no such values as 'zh-cn', 'zh-hans', etc. there.
How can I get the COMPLETE list of Chinese language codes supported by
Wikidata (preferably with descriptions)?
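For reference, the generic MediaWiki siteinfo API can at least enumerate the
language codes MediaWiki knows about; a minimal sketch in Python (I am not
certain this is exactly the set of term languages Wikibase accepts, so treat
it as a starting point):

    import requests

    # Ask the MediaWiki API behind Wikidata for its language table and
    # keep only the Chinese variants.
    response = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "languages",
        "format": "json",
        "formatversion": "2",
    })
    for lang in response.json()["query"]["languages"]:
        if lang["code"] == "zh" or lang["code"].startswith("zh-"):
            print(lang["code"], "-", lang["name"])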
Best regards,
Vlad
Hello all!
A number of concerns have been raised about the performance and
scaling of the Wikidata Query Service. We share those concerns and we are
doing our best to address them. Here is some info about what is going
on:
In an ideal world, WDQS should:
* scale in terms of data size
* scale in terms of number of edits
* have low update latency
* expose a SPARQL endpoint for queries
* allow anyone to run any queries on the public WDQS endpoint
* provide great query performance
* provide a high level of availability
Scaling graph databases is a "known hard problem", and we are reaching
a scale where there are no obvious easy solutions to address all the
above constraints. At this point, just "throwing hardware at the
problem" is not an option anymore. We need to go deeper into the
details and potentially make major changes to the current architecture.
Some scaling considerations are discussed in [1]. This is going to take
time.
Realistically, addressing all of the above constraints is unlikely to
ever happen. Some of the constraints are non-negotiable: if we can't
keep up with Wikidata in terms of data size or number of edits, it does
not make sense to address query performance. On some constraints, we
will probably need to compromise.
For example, the update process is asynchronous. It is by nature
expected to lag. In the best case, this lag is measured in minutes,
but can climb to hours occasionally. This is a case of prioritizing
stability and correctness (ingesting all edits) over update latency.
And while we can work to reduce the maximum latency, this will still
be an asynchronous process and needs to be considered as such.
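For transparency, anyone can observe this lag directly: a minimal sketch,
assuming the schema:dateModified triple that WDQS publishes for
<http://www.wikidata.org> as its last update time:

    from datetime import datetime, timezone

    import requests

    QUERY = ("SELECT ?updated WHERE "
             "{ <http://www.wikidata.org> schema:dateModified ?updated }")

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wdqs-lag-check-example/0.1"},
    )
    value = response.json()["results"]["bindings"][0]["updated"]["value"]
    last_update = datetime.fromisoformat(value.replace("Z", "+00:00"))
    print("approximate update lag:", datetime.now(timezone.utc) - last_update)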
We currently have one Blazegraph expert working with us to address a
number of performance and stability issues. We
are planning to hire an additional engineer to help us support the
service in the long term. You can follow our current work in phabricator [2].
If anyone has experience with scaling large graph databases, please
reach out to us, we're always happy to share ideas!
Thanks all for your patience!
Guillaume
[1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
[2] https://phabricator.wikimedia.org/project/view/1239/
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST
Hello all,
Two things:
1) On Thursday, June 13th at 18:00 UTC (11am Pacific), there will be open
office hours for those of you who would like to share your thoughts
on the event: topics you'd like to see discussed there, decisions you'd
like made, etc.
It will occur using Google Meet, at this url:
https://meet.google.com/exz-zxfy-nuj
If you can't make it to these office hours, don't fret! You can always
(continue to) share your thoughts on the Phabricator task:
https://phabricator.wikimedia.org/T220212
2) REMINDER: The deadline for participant/attendee nominations is Monday,
June 17th (this coming Monday). Remember, you can nominate others or
yourself, and you can fill out the form as many times as you have
nominations.
Form: https://forms.gle/CLeGFSMiEasJgEU27
FAQ: https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ
This survey is conducted via a third-party service, which may make it
subject to additional terms. For more information on privacy and
data-handling, see this survey privacy statement:
https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…
On behalf of the Technical Conference Program Committee,
Greg
On Wed, May 29, 2019 at 04:39:37PM -0700, Greg Grossmeier wrote:
> Hello all,
>
> As you may have seen, the next Wikimedia Technical Conference[0] is
> coming up in November 2019.
>
> It will take place November 12-15th in Atlanta, GA (USA). As announced
> at the Hackathon and documented on-wiki[1] this year's event will
> focus on the topic of "Developer Productivity".
>
> Like last year, we are looking for diverse stakeholders, perspectives,
> and experiences that will help us to make informed decisions. We need
> people who can create and architect solutions, as well as those who
> will make funding and prioritization decisions for the projects.
>
> See the FAQ for (hopefully) any questions you have:
> <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/FAQ>
>
> Please fill out the survey using this link to nominate yourself or someone
> else to attend: <https://forms.gle/CLeGFSMiEasJgEU27>
>
> This survey is conducted via a third-party service, which may make it
> subject to additional terms. For more information on privacy and
> data-handling, see this survey privacy statement:
> <https://foundation.wikimedia.org/wiki/Wikimedia_Technical_Conference_Survey…>
>
> This nomination form will remain open between May 29 and June 17, 2019.
>
> If you have any questions, please post them on the event's talk page
> <https://www.mediawiki.org/wiki/Talk:Wikimedia_Technical_Conference/2019>.
>
> Thanks!
>
> Greg and the Technical Conference 2019 Program Committee
>
> [0] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019>
> [1] <https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019#Vision_S…>
>
> --
> Greg Grossmeier
> Release Team Manager
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| Release Team Manager A18D 1138 8E47 FAC8 1C7D |
Sorry for cross-posting!
Reminder: Technical Advice IRC meeting this week **Wednesday 3-4 pm UTC**
on #wikimedia-tech.
Questions can be asked in English and Persian!
The Technical Advice IRC Meeting (TAIM) is a weekly support event for
volunteer developers. Every Wednesday, two full-time developers are
available to help you with all your questions about MediaWiki, gadgets,
tools and more! This can be anything from "how to get started" to "who
would be the best contact for X" to specific questions about your project.
If you already know what you would like to discuss or ask, please add your
topic to the next meeting:
https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
Hope to see you there!
--
Raz Shuty
Engineering Manager
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 B. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/029/42207.
Hello all,
This change is relevant for everyone using the *wbeditentity* endpoint of
Wikidata’s API.
While working on editing the termbox on mobile, we discovered a bug in
our code for the wbeditentity endpoint: it does not conform to the
implicit interpretation of the documentation
<https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/docs/change-…>.
According to that implicit interpretation, a request including
{"aliases":{"en":[]}} should replace all English aliases with an empty
list, i.e. remove them all. However, at the moment this action is not
actually performed: such a request leaves the aliases untouched.
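For illustration, such a call looks roughly like this (a sketch only: the
item ID points at the sandbox item and the token is a placeholder):

    import json

    import requests

    # A wbeditentity call whose payload contains an empty alias array for
    # "en". After the fix this removes all English aliases; today it is a
    # no-op.
    response = requests.post("https://www.wikidata.org/w/api.php", data={
        "action": "wbeditentity",
        "id": "Q4115189",  # the Wikidata sandbox item, as a placeholder
        "data": json.dumps({"aliases": {"en": []}}),
        "token": "<csrf token>",  # obtain via action=query&meta=tokens
        "format": "json",
    })
    print(response.json())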
We want to fix this bug because we need this request to work in order to
be able to remove all aliases in the new termbox on mobile as well. We are
treating this bug fix as a breaking change because the documentation was
ambiguous, and there may be some tools currently sending requests with
empty alias arrays, intentionally or not, when nothing needs to be changed.
If you are maintaining a tool, please *inspect your tool's usage of the
wbeditentity endpoint* and make sure that no calls with empty alias arrays
are sent unless the intention is to remove those aliases.
According to our breaking change policy, this bug fix will first be
deployed on beta.wikidata.org on May 28th, then on wikidata.org on *June
12th*.
If you have any questions or issues, feel free to discuss them in the
related ticket <https://phabricator.wikimedia.org/T203337>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as a charitable organization
by the Tax Office for Corporations I Berlin, tax number 27/029/42207.
GerardM's post triggered my interest in posting to the mailing list. As you
might know, I am working on a functional quadstore, that is, a quadstore
that keeps old versions of the data around, like a wiki, but in a directed
acyclic graph. It only stores the differences between commits, and relies
on a snapshot of the latest version for fast reads. My ultimate goal is to
build some kind of portable knowledge base, something like Wikibase +
Blazegraph that you can spin up on a regular machine at the press of a
button.
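To make the idea concrete, here is a toy sketch (illustrative only, with
made-up names; the real implementation differs):

    from dataclasses import dataclass

    Quad = tuple  # (graph, subject, predicate, object)

    @dataclass
    class Commit:
        parents: list  # a merge commit has several parents
        added: set
        removed: set

    class FunctionalQuadstore:
        def __init__(self):
            self.commits = {}      # commit id -> Commit
            self.head = None
            self.snapshot = set()  # latest version, kept for fast reads

        def commit(self, added, removed):
            """Store only the difference; refresh the snapshot."""
            cid = len(self.commits)
            parents = [] if self.head is None else [self.head]
            self.commits[cid] = Commit(parents, set(added), set(removed))
            self.snapshot = (self.snapshot - set(removed)) | set(added)
            self.head = cid
            return cid

        def at(self, cid):
            """Time-traveling read: replay diffs from the root to cid."""
            chain = []
            while cid is not None:
                commit = self.commits[cid]
                chain.append(commit)
                cid = commit.parents[0] if commit.parents else None
            quads = set()
            for commit in reversed(chain):
                quads = (quads - commit.removed) | commit.added
            return quads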
Enough bragging about me. I won't reply to all the messages of the thread
one by one, but:
Here is what SHOULD BE possible:
- incremental dumps
- time-traveling queries
- full dumps
- federation of Wikibase instances, since the data is stored in a Git-like
history and git pull / git push equivalents are planned on the roadmap
- online editing of the quadstore
Access control lists are not designed yet; I expect this to be enforced
by the application layer.
I planned to start working on a data management system (something like
CKAN) with a search feature, but I would gladly work with Wikimedia instead.
Also, given that it is modeled after Git, one can build merge-request-like
features, for example for the massive imports that are currently crippled.
What I would need are query logs (read and write), possibly with timings,
to run benchmarks.
Maybe I should ask Wikimedia for funding?
FWIW, I got results 2 times faster than Blazegraph on a microbenchmark.
> Hoi,
> Wikidata grows like mad. This is something we all experience in the really
> bad response times we are suffering. It is so bad that people are being
> asked what kind of updates they are running, because it makes a difference
> in the lag times.
>
> Given that Wikidata is growing like a weed, it follows that there are two
> issues. Technical: what is the maximum the current approach supports, and
> how long will it last us? Fundamental: what funding is available to
> sustain Wikidata?
>
> For the financial guys, growth like Wikidata is experiencing is not
> something you can reliably forecast. As an organisation we have more money
> than we need to spend, so there is no credible reason to be stingy.
>
> For the technical guys: consider our growth and plan for at least one
> year ahead. When the impression exists that the current architecture will
> not scale beyond two years, start a project to future-proof Wikidata.
>
> It will grow and the situation will get worse before it gets better.
> Thanks,
> GerardM
>
> PS I know about the Phabricator tickets; they do not give the answers to
> the questions we need to address.
>
Dear Sir,
I thank you for your efforts. When dealing with biomedical taxonomic statements in Wikidata, we found similar deficiencies. I have already decided to write a paper about the biomedical taxonomy of Wikidata and how to adjust it. I would be honoured if you would be the first author of the work. You have already extracted the taxonomic statements, so you can easily filter the biomedical ones. This work has already been done for other taxonomies such as SNOMED-CT (https://scholar.google.ca/citations?user=UsG8QFwAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=c4LlYxsAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=jVLGHGQAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=fBAvwi4AAAAJ&hl=fr&oi=sra). I will be available online for further discussion if you agree to work with our team. This will be simple.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
-------- Original message --------
From: Gabriel Altay <gabriel.altay(a)gmail.com>
Date: 2019/06/15 23:05 (GMT+01:00)
To: Discussion list for the Wikidata project <wikidata(a)lists.wikimedia.org>
Subject: Re: [Wikidata] instance of, subclass of, oh my
Thanks Jan, I will pursue the badminton discussion on the talk page.
On Sat, Jun 15, 2019 at 5:49 PM Jan Ainali <jan(a)aina.li> wrote:
Hello Gabriel,
I agree with you about the badminton tournaments; that seems odd. There already appears to be a discussion about that on the talk page of the only participant in the badminton project: https://www.wikidata.org/wiki/User_talk:Florentyna#subclass_of:_badminton_t…
Perhaps it is best to continue the discussion there?
/Jan Ainali
http://ainali.com
Hello everyone,
I was playing around with a recent wikidata dump and extracted the items
that "looked" like classes based on the definition here,
https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Classes
Specifically, an item is a class-item if any of the following are true,
* the item is the value of a P31 ("instance of") statement
* the item has a P279 ("subclass of") statement (subclass)
* the item is the value of a P279 ("subclass of") statement (superclass)
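In case it is useful, this is roughly the pass I ran over the dump (a
simplified sketch, not my exact code; it assumes the standard dump layout
of one JSON entity per line inside a top-level array):

    import bz2
    import json

    class_items = set()

    with bz2.open("wikidata-20190603-all.json.bz2", "rt") as dump:
        for line in dump:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            claims = entity.get("claims", {})
            # the item has a P279 ("subclass of") statement
            if "P279" in claims:
                class_items.add(entity["id"])
            # the item is the value of a P31 or P279 statement
            for prop in ("P31", "P279"):
                for statement in claims.get(prop, []):
                    datavalue = statement["mainsnak"].get("datavalue")
                    if datavalue:
                        class_items.add(datavalue["value"]["id"])

    print(len(class_items), "class-items")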
Once I extracted all items that met these criteria (2,399,621 items
from wikidata-20190603-all.json.bz2), I started examining the results. One
of the things I found slightly surprising is that there are about 23k
badminton events that are classes because they have "subclass of
https://www.wikidata.org/wiki/Q13357858" statements. SPARQL query below.
https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0…
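In case that link gets mangled by mail clients, the query was essentially
of this shape (paraphrased here, run through Python):

    import requests

    # Items that are direct subclasses of Q13357858.
    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P279 wd:Q13357858 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
    )
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])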
It also looks like there is a badminton project page,
https://www.wikidata.org/wiki/Category:WikiProject_Badminton
https://www.wikidata.org/wiki/Wikidata:WikiProject_Badminton/Subclass
I'd like to remove these statements as it seems that a particular instance
of a badminton tournament
https://www.wikidata.org/wiki/Q121940
is not a class.
It seems that this pattern is also in place for about 1,000,000 items which
are instances of gene (e.g. https://www.wikidata.org/wiki/Q40108).
I had a couple of questions for the mailing list,
1) Do folks know if there is an active group working on the Wikidata
ontology?
2) I've read a few messages about shape expressions. Would it be
worthwhile to set up a shape expression that prevents most items from
having both "instance of" and "subclass of" statements?
3) If these entries are generated by bots, what is the best way to get in
touch with the owner? Their user talk page?
I am probably missing a lot of information about what has been done so far
in the community, but I'm happy to read anything someone points me towards.
best,
-Gabriel