Wikidata June 2019

wikidata@lists.wikimedia.org

47 participants
39 discussions

Weekly Summary #338
by Léa Lacroix 10 Oct '20

10 Oct '20

4 4

Accessing tabular data from SPARQL?
by Daniel Mietchen 07 Jul '20

07 Jul '20

Hi, I'm looking into ways to use tabular data like https://commons.wikimedia.org/wiki/Data:Zika-institutions-test.tab in SPARQL queries but could not find anything on that. My motivation here is in part coming from the time out limits, and the basic idea here would be to split queries that typically time out into sets of queries that do not time out and - if their results were aggregated - would yield the results that would be expected for the original query would it not time out. The second line of motivation here is that of keeping track of how things develop over time, which would be interesting for both content and maintenance queries as well as usage of things like classes, references, lexemes or properties. I would appreciate any pointers or thoughts on the matter. Thanks, Daniel

3 5

Overload of query.wikidata.org
by Guillaume Lederrey 10 Jul '19

10 Jul '19

Hello all! We are currently dealing with a bot overloading the Wikidata Query Service. This bot does not look actively malicious, but does create enough load to disrupt the service. As a stop gap measure, we had to deny access to all bots using python-request user agent. As a reminder, any bot should use a user agent that allows to identify it [1]. If you have trouble accessing WDQS, please check that you are following those guidelines. More information and a proper incident report will be communicated as soon as we are on top of things again. Thanks for your understanding! Guillaume [1] https://meta.wikimedia.org/wiki/User-Agent_policy -- Guillaume Lederrey Engineering Manager, Search Platform Wikimedia Foundation UTC+2 / CEST

8 10

[Breaking change] Important for Wikidata tools maintainers: wb_terms table to be dropped at the end of May
by Léa Lacroix 05 Jul '19

05 Jul '19

Hello all, This is an important announcement for all the tool builders and maintainers who access Wikidata’s data by *querying directly Labs database replicas*. In May-June 2019, the Wikidata development team will drop the wb_terms table from the database in favor of a new optimized schema. Over years, this table has become too big, causing various issues. This change requires the tools using wb_terms to be updated. Developers and maintainers will need to *adapt their code* to the new schema before the migration starts and switch to the new code when the migration starts. The migration will start on *May 29th*. On May 15th, a test system will be available for you to test your code. The table being used by plenty of external tools, we are setting up a process to make sure that the change can be done together with the developers and maintainers, without causing issues and broken tools. Most of the documentation and updates will take place on Phabricator: - In this Phabricator task <https://phabricator.wikimedia.org/T221764>, you can find a description of the changes and the process, and you can ask for more details or for help in the comments. This is also where updates will be announced if necessary. - On the Tool Builders Migration board <https://phabricator.wikimedia.org/tag/wb_terms_-_tool_builders_migration> you will find all the details about the migration, how to update your tool <https://phabricator.wikimedia.org/T221765>, and you can add your own tasks. - If you need to discuss with the Wikidata developers or get more specific help, we set up two dedicated IRC meetings and a session at the Wikimedia hackathon. More information in this task <https://phabricator.wikimedia.org/T221764>. We are aware that this change will ask you to make some important changes in your code, and we are willing to help you as much as our resources allow us to. We hope that you will understand that this change is made to avoid bigger issues in the near future. Note that this change is not impacting Wikibase instances outside of Wikidata. A dedicated migration plan and announcement will follow. We strongly encourage you to not wait until last minute to make the changes in your code. If you have any question or issue, we will be happy to help. In order to keep the discussions in one place, please ask questions or raise issues directly in the Phabricator task and board. Thanks for your understanding, Cheers, -- Léa Lacroix Project Manager Community Communication for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

3 4

Performance and update versus query
by Gerard Meijssen 30 Jun '19

30 Jun '19

Hoi, The performance of the query update is getting worse. Questions about this have been raised before. I do remember quality replies like it is not exponential so there is no problem. However, here we are and there is a problem. The problem is that I run batch jobs, batch jobs that do not run [1]. I have the impression that they are put in some kind of suspended animation by a person. These jobs are submitted by the SourceMD tool by Magnus, Magnus is well known for being responsive to suggestions on how he can improve them. So do not use as an argument that there is something wrong with these job. At most it is acceptable for these run to put on some kind of hold for the duration of a crisis and then there has to be a release. At the same time I notice that the reports indicating multiple items with the same ORCiD id include items that should have been picked up by earlier reports. I notice that query does not pick up existing items with an ORCid id and creates new ones. For me this is an indication that Query is not reliable. There is talk on the Wiki that there is no point in having fixed descriptions in anything but English. What caused this discussion is the sheer amount of updates needed just for one language. At the London Wikimania this perceived need for fixed descriptions was discussed vis a vis automated descriptions and as I recall the only argument for having them at all was "standards" in relation to dumps. Yes, automated descriptions may be cached and included in a dump. I have been asked to write for the ORCiD blog and thereby in effect plug the relevance of the Scholia presentation for scientists. When I do, the number of jobs like the ones I run will mushroom. It is why I have not put anything forward so far because we cannot cope as it is. The issues I see is, * again to what extend can we grow our content, both for query and update for the short medium and long term * will batch jobs like mine be able to complete * can we ingest the attention when scholars discover how relevant Scholia is for them, the subject they care for. * do we care that motivation of volunteers relies on the availability of sufficient performance to do the tasks they care for. Thanks, Gerard [1] https://tools.wmflabs.org/sourcemd/?action=batches&user=GerardM

3 4

Cadastro
by Patrecio Rocha 26 Jun '19

26 Jun '19

Patrecio Rocha Musico Católico Patrecio Rocha www.facebook.com/patreciorochaoficial

1 0

Significant change of Wikidata dump size
by Vladimir Ryabtsev 26 Jun '19

26 Jun '19

Hello, I apologize if I missed something, but why the current JSON dump size is ~25GB while a week ago it was ~58GB? (see https://dumps.wikimedia.org/wikidatawiki/entities/20190617/) Regards, Vladimir

3 7

minimal hardware requirements for loading wikidata dump in Blazegraph
by Adam Sanchez 25 Jun '19

25 Jun '19

Hello, Does somebody know the minimal hardware requirements (disk size and RAM) for loading wikidata dump in Blazegraph? The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G. The bigdata.jnl file which stores all the triples data in Blazegraph is 478G but still growing. I had 1T disk but is almost full now. Thanks, Adam

8 11

“Wikidata Digest, Vol 84, Issue 15” has been cancelled
by Mukherjee Lal Kaustubh 25 Jun '19

25 Jun '19

“Wikidata Digest, Vol 84, Issue 15” has been cancelled when: 26 November 2018, 12:00 PM IST - 1:00 PM IST deleted by: Mukherjee Lal Kaustubh --- iCloud is a service provided by Apple. Apple ID: https://appleid.apple.com/choose-your-country/ Support: https://www.apple.com/support/icloud/ww Terms and Conditions: https://www.apple.com/legal/internet-services/icloud/ww/ Privacy Policy: https://www.apple.com/uk/legal/privacy/en-ww/ Copyright 2019 Apple Distribution International, Hollyhill Industrial Estate, Hollyhill, Cork, Ireland. All rights reserved.

1 0

Weekly Summary #370
by Léa Lacroix 24 Jun '19

24 Jun '19

*Here's your quick overview of what has been happening around Wikidata over the last week.* Discussions - Closed request for comments: semi-protection to prevent vandalism on most used Items <https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/semi-protection…> Events <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Events> - Upcoming: 1st Iberoamerican Knowledge Graphs and Semantic Web Conference <http://www.kgswc.org> keynote on Wikidata by José E. Labra <https://www.slideshare.net/jelabra/wikidata-151323898> Villa Clara, Cuba 24-28 June - Upcoming: Wikidata meeting for researchers <http://sparql-wikidata-2019.smr-team.org/> in Sfax University, Tunisia, 25-27 June 2019 - Upcoming: Wikidata meetup in London <https://www.eventbrite.co.uk/e/london-wikidata-meetup-3-wikidata-osmuk-tick…>, June 29th - Upcoming: July 3rd: UGent Wikidata and Wikibase Workshop 2019 <https://www.wikidata.org/wiki/Wikidata:Events/UGent_Wikidata_and_Wikibase_W…> Press, articles, blog posts <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Press_coverage> - *Demonstrating Spindra: A Geographic Knowledge Graph Management System <https://ieeexplore.ieee.org/document/8731539/>*, Yuhan Sun, et al (in 2019 IEEE 35th International Conference on Data Engineering) - *Comparing DBpedia, Wikidata, and YAGO for Web Information Retrieval <https://link.springer.com/chapter/10.1007%2F978-981-13-6031-2_40>*, Sini Govinda Pillai, et al. - *WikiDataSets: Standardized sub-graphs from WikiData <https://arxiv.org/pdf/1906.04536.pdf>*, Armand Boschin, in ArXiv - *Ordia: A Web application for Wikidata lexemes <https://pdfs.semanticscholar.org/136c/f1c109586777e71efab645250605fb3ff3bf.…>*, Finn Årup Nielsen <https://www.wikidata.org/wiki/User:Fnielsen> - *Combining embedding methods for a word intrusion task <https://pdfs.semanticscholar.org/163d/bc94ff35e8c5641a616672aaab184379ec87.…>* Finn Årup Nielsen, et al - *Query expansion using Wikidata attributes’ values <https://eudl.eu/doi/10.4108/eai.24-4-2019.2284070>* Sarah Dahir, Abderrahim et al., in ICCWCS'19 - *Automatic Question Generation based on MOOC Video Subtitles and Knowledge Graph <https://dl.acm.org/citation.cfm?id=3323820>* Lin Ma, Yuchun Ma, in ICIET 2019 - *Wikidata as a linked-data hub for Biodiversity data <https://biss.pensoft.net/article/35206/>*, Andra Waagmeester <https://www.wikidata.org/wiki/User:Andrawaag>, et al. - *Using Crowd-curation to Improve Taxon Annotations on the Wikimedia Infrastructure <https://biss.pensoft.net/article/35216/>*, Andra Waagmeester <https://www.wikidata.org/wiki/User:Andrawaag>, et al. - *Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation <http://labra.weso.es/publication/2019_eswc_shex/>, et al. Best in-use paper award ESWC <https://twitter.com/eswc_conf/status/1137110223691440128>* - *From “an” Identifier to “the” Identifier <https://doi.org/10.6017/ital.v38i2.10886>*, Theo van Veen - Developing workflows for local authority file conversion from MARC to Wikidata <https://www.youtube.com/watch?v=Fa6jZTs1sY4> Other Noteworthy Stuff - Structured Data on Commons <https://commons.wikimedia.org/wiki/Commons:Structured_data>: qualifiers for depicts support have been enabled on June 20th - Result format change for Query Service JSON query output <https://lists.wikimedia.org/pipermail/wikidata/2019-June/013206.html> - Filename scheme for Wikidata RDF entity dumps will change <https://lists.wikimedia.org/pipermail/wikidata/2019-June/013200.html> - The development of Wikidata Bridge (editing Wikidata's data from Wikipedia) started <https://lists.wikimedia.org/pipermail/wikidata/2019-June/013216.html> - CheckShex userscript <https://www.wikidata.org/wiki/User:Teester/CheckShex.js> adds an in-page way of checking if an item fits an Entityschema Did you know? - Newest properties <https://www.wikidata.org/wiki/Special:ListProperties>: - General datatypes: has written for <https://www.wikidata.org/wiki/Property:P6872>, motif represents <https://www.wikidata.org/wiki/Property:P6875>, solar irradiance <https://www.wikidata.org/wiki/Property:P6876>, effective temperature <https://www.wikidata.org/wiki/Property:P6879> - External identifiers: JIS standard <https://www.wikidata.org/wiki/Property:P6857>, Réunionnais du monde ID <https://www.wikidata.org/wiki/Property:P6858>, Corporate Number (South Korea) <https://www.wikidata.org/wiki/Property:P6859>, dbSNP ID <https://www.wikidata.org/wiki/Property:P6861>, digilibIT author ID <https://www.wikidata.org/wiki/Property:P6862>, Digital Prosopography of the Roman Republic ID <https://www.wikidata.org/wiki/Property:P6863>, eBiodiversity ID <https://www.wikidata.org/wiki/Property:P6864>, eurobasket.com coach ID <https://www.wikidata.org/wiki/Property:P6865>, euroleague.net coach ID <https://www.wikidata.org/wiki/Property:P6866>, Gamepedia Wiki ID <https://www.wikidata.org/wiki/Property:P6867>, Hoopla artist ID <https://www.wikidata.org/wiki/Property:P6868>, Hoopla publisher ID <https://www.wikidata.org/wiki/Property:P6869>, Latvian National Encyclopedia Online ID <https://www.wikidata.org/wiki/Property:P6870>, ECI Lok Sabha constituency code <https://www.wikidata.org/wiki/Property:P6871>, IntraText author ID <https://www.wikidata.org/wiki/Property:P6873>, Musixmatch artist ID <https://www.wikidata.org/wiki/Property:P6874>, MAHG ID <https://www.wikidata.org/wiki/Property:P6877>, Amburger database ID <https://www.wikidata.org/wiki/Property:P6878>, ATP tennis tournament edition ID <https://www.wikidata.org/wiki/Property:P6880>, Rugby League Project ID (general) <https://www.wikidata.org/wiki/Property:P6881>, RIAA artist ID <https://www.wikidata.org/wiki/Property:P6882> - New property proposals <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Property_proposal> to review: - General datatypes: extracted from <https://www.wikidata.org/wiki/Wikidata:Property_proposal/extracted_from>, Instagram hashtag <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Instagram_hashtag>, Historical Archives of the European Union ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Historical_Archive…>, number of pins, number of pin positions <https://www.wikidata.org/wiki/Wikidata:Property_proposal/number_of_pins,_nu…>, Representative in legislature <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Representative_in_…>, subscribers <https://www.wikidata.org/wiki/Wikidata:Property_proposal/subscribers> - External identifiers: nchdb asset id <https://www.wikidata.org/wiki/Wikidata:Property_proposal/nchdb_asset_id>, Biblioteca Nacional Aruba ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Biblioteca_Naciona…>, Bangladesh administrative division code (2017-) <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Bangladesh_adminis…>, Goodreads series ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Goodreads_series_ID>, Heritage Gazetteer of Cyprus <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Heritage_Gazetteer…>, Retrosheet ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Retrosheet_ID>, The DJ List artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/The_DJ_List_artist…>, BBC artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/BBC_artist_ID>, The Independent topic ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/The_Independent_to…>, NME artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/NME_artist_ID>, Metro topic ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Metro_topic_ID>, MangaSeek person ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/MangaSeek_person_ID>, Soccerway coach ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Soccerway_coach_ID>, Nederlandse Top 40 artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Nederlandse_Top_40…>, Dutch Charts artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Dutch_Charts_artis…>, hitparade.ch artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/hitparade.ch_artis…>, Aviation Safety Network Wikibase ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Aviation_Safety_Ne…>, Find & Connect ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Find_%26_Connect_ID>, SA Flora ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/SA_Flora_ID>, ATRF ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/ATRF_ID>, geograph <https://www.wikidata.org/wiki/Wikidata:Property_proposal/geograph>, ArtBrokerage artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/ArtBrokerage_artis…>, Indian gallantry awardee ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Indian_gallantry_a…>, Scandipop topic ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Scandipop_topic_ID>, iTunes movie collection ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/iTunes_movie_colle…>, SLNSW unpublished item ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/SLNSW_unpublished_…>, LB.ua dossier <https://www.wikidata.org/wiki/Wikidata:Property_proposal/LB.ua_dossier>, FPBR person ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/FPBR_person_ID>, RBF amateur boxer ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/RBF_amateur_boxer_…>, RBF professional boxer ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/RBF_professional_b…>, Roskomnadzor media license number <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Roskomnadzor_media…>, 2014 Commonwealth Games athlete ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/2014_Commonwealth_…>, Music certifictions ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Music_certifiction…>, El portal de Música artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/El_portal_de_M%C3%…> - Query examples: - Most commonly used qualifiers on depicts (P180) statements on Wikidata, with top 10 classes or items that each one is used on <https://query.wikidata.org/embed.html#SELECT%20%3Ftotal%20%3Fcount%20%3Fdep…> (source <https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=…> ) - Most commonly used properties connecting military people to military organisations <https://query.wikidata.org/#SELECT%20%3Fcount%20%3Fproperty%20%3FpropertyLa…> (source <https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=9681…> ) - Bubblechart of most frequent WMF import sources <https://query.wikidata.org/embed.html#%23%20most%20frequent%20Wikimedia%20i…> - French male first names ending with "ée" <https://w.wiki/5Ew> ( source <https://twitter.com/photos_floues/status/1142012922434084864>) - Newest WikiProjects <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:WikiProjects>: WikiProject Australia <https://www.wikidata.org/wiki/Wikidata:WikiProject_Australia> - Newest database reports: Property completion by country leaderboard <https://www.wikidata.org/wiki/Wikidata:WikiProject_Properties/Reports/Count…> Development - Beginning of wb_terms migration (phab:T221764 <https://phabricator.wikimedia.org/T221764>) - Fixed a bug with phan on Wikibase ([[phab:T226083) - Enable bugfix for wbeditentity setting aliases to empty array ( phab:T223303 <https://phabricator.wikimedia.org/T223303>) - Fix an issue with pipe character on some special pages (phab:T223270 <https://phabricator.wikimedia.org/T223270>) - Fix an issue with removed constraint still displayed on Lexeme ( phab:T223372 <https://phabricator.wikimedia.org/T223372>) - Make edits to EntitySchema pages autopatrolled (phab:T224495 <https://phabricator.wikimedia.org/T224495>) - Work on creating better edit summaries for wbeditentity API endpoint ( phab:T224010 <https://phabricator.wikimedia.org/T224010>) - Deploy the work environment for new mobile termbox - Start setting up the technical environment for Wikidata bridge You can see all open tickets related to Wikidata here <https://phabricator.wikimedia.org/maniphest/query/4RotIcw5oINo/#R>. If you want to help, you can also have a look at the tasks needing a volunteer <https://phabricator.wikimedia.org/project/board/71/query/zfiRgTnZF7zu/?filt…>. Monthly Tasks - Add labels, in your own language(s), for the new properties listed above. - Comment on property proposals: all open proposals <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Overview> - Suggested and open tasks <https://www.wikidata.org/wiki/Wikidata:Contribute/Suggested_and_open_tasks> ! - Contribute to a Showcase item <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Showcase_items> . - Help translate <https://www.wikidata.org/wiki/Special:LanguageStats> or proofread the interface and documentation pages, in your own language! - Help merge identical items <https://www.wikidata.org/wiki/User:Pasleim/projectmerge> across Wikimedia projects. - Help write the next summary! <https://www.wikidata.org/wiki/Wikidata:Status_updates/Next> -- Léa Lacroix Project Manager Community Communication for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

1 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Wikidata June 2019