Hello,
In July, many of you participated in our Wikibase Installation & Updating
surveys (see announcement
<https://lists.wikimedia.org/hyperkitty/list/wikibaseug@lists.wikimedia.org/…>).
We compiled the results – you can have a look on Meta
<https://meta.wikimedia.org/wiki/Wikibase/Wikibase_Installation_%26_Updating…>.
Many thanks to all those who participated in the survey. Your answers will
help us find ways to improve the installation & updating process for users.
If you have any questions or additional feedback, please feel free to let
us know on the discussion page
<https://meta.wikimedia.org/wiki/Talk:Wikibase/Wikibase_Installation_%26_Upd…>
or write to me privately.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Dear all,
I'm posting here because there is an open DevOps position at the Lab where
I work:
https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/job…
We are looking for someone with experience in OSS / MediaWiki / Wikibase
software (ideally), hence I'm posting here. Please feel free to spread the
word if you know anyone who might be interested, and feel free to reach out
to me directly at lozana.rossenova(a)tib.eu if you have any questions or
want to learn more.
Cheers,
Lozana
--
Lozana Rossenova (PhD, London South Bank University)
Digital Archives Designer and Researcher
Hello everyone,
The Wikibase development team is excited to see the emergence of
community-created import tooling such as RaiseWikibase and wikibase-insert,
particularly because Wikibase does not yet come equipped with its own
import mechanism “out of the box”. To better support the community, we
would like to offer some advice to toolmakers and provide some insight into
our planned explorations into making API-based importing better and faster.
We anticipate an inherent issue with tools that directly inject information
into the Wikibase database tables: the schemas that such tools rely on are
subject to change. Normal development processes across Wikimedia can (and
likely will) endanger the long-term health and stability of these tools, an
outcome we would like to avoid as much as possible.
Specifically, we can’t guarantee that the layout or content of the tables
that these tools write to will not change. Although Wikibase does not have
its own public stable interface policy, we work with an eye on Wikidata's
policy
<https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Unstable_Int…>.
We would like to offer the following advice to tool developers (a short
sketch illustrating the first two points follows the list):
1. Wherever possible, use the HTTP APIs.
2. When using them, make sure to batch requests when possible.
3. Understand that reading from or writing directly to the DB tables may
break Wikibase behavior in subtle ways.
4. Understand that tools which read from or write to the DB on today’s
Wikibase may no longer work on tomorrow’s Wikibase.
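To make the first two points concrete, here is a rough sketch of what
API-based item creation can look like, using Python and the requests
library. Each item is created with a single wbeditentity request rather
than separate requests per label or statement. The endpoint and the item
payloads are placeholders, and we assume the session has already logged in
(for example with a bot password via action=login):

# Rough sketch of API-based item creation (Python + requests): each item is
# created with a single wbeditentity call rather than separate calls per
# label or statement. Endpoint and payloads are placeholders; the session is
# assumed to be logged in already, e.g. with a bot password via action=login.
import json
import requests

API = "https://your-wikibase.example/w/api.php"  # placeholder endpoint
session = requests.Session()

def get_csrf_token():
    """Fetch an edit (CSRF) token for the logged-in session."""
    r = session.get(API, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
    })
    return r.json()["query"]["tokens"]["csrftoken"]

def create_item(entity_data, token):
    """Create one item; the whole entity (labels, descriptions, claims)
    goes into one wbeditentity request."""
    r = session.post(API, data={
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(entity_data),
        "token": token,
        "bot": 1,      # only honored if the account has the bot right
        "maxlag": 5,   # back off when the servers are lagged
        "format": "json",
    })
    return r.json()

token = get_csrf_token()
for i in range(1000):  # placeholder payloads
    item = {"labels": {"en": {"language": "en", "value": f"Example item {i}"}}}
    result = create_item(item, token)
    if "error" in result:
        print("edit failed:", result["error"])  # add retry/maxlag handling here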
The Wikibase development roadmap
<https://www.wikidata.org/wiki/Wikidata:Development_plan#Wikibase_ecosystem>
for the year is already tightly booked. However, Wikimedia Germany intends
to dedicate some resources during our next 2021 prototyping effort towards
exploring ways to optimize our API -- specifically, a solution approaching
something like an "import mode" which would bypass unnecessary actions when
inserting a body of previously vetted information. This might, for example,
include ignoring checks on the uniqueness of labels or user permissions. In
addition, we plan to dedicate time toward evaluating OpenRefine’s new Wikibase
reconciliation functionality
<https://docs.openrefine.org/next/manual/wikibase/overview> on behalf of
the community. We will keep you updated on our efforts related to these
topics.
If you are interested, we welcome you to watch the progress of these
related Phabricator tickets as well:
- https://phabricator.wikimedia.org/T287164 -- Improve bulk import via API
- https://phabricator.wikimedia.org/T285987 -- Do not generate full html
parser output at the end of Wikibase edit requests
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
This breaking change is relevant for anyone who consumes Wikidata RDF data
through Special:EntityData (rather than the dumps) without using the “dump”
flavor.
When an Item references other entities (e.g. the statement P31:Q5), the
non-dump RDF output of that Item (i.e. without ?flavor=dump) currently
includes the labels and descriptions of the referenced entities (e.g. P31
and Q5) in all
languages. That bloats the output drastically and causes performance
issues. See Special:EntityData/Q1337.rdf
<https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf> as an example.
We will change this so that for referenced entities, only labels and
descriptions in the request language (set e.g. via ?uselang=) and its
fallback languages are included in the response. For the main entity being
requested, labels, descriptions and aliases are still included in all
languages available, of course.
If you don’t actually need this “stub” data of referenced entities at all,
and are only interested in data about the main entity being requested, we
encourage you to use the “dump” flavor instead (include flavor=dump in the
URL parameters). In that case, this change will not affect you at all,
since the dump flavor includes no stub data, regardless of language.
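For illustration, here is roughly what the two request styles look like
from a script (a Python sketch using the Q1337 example above, with German
as an arbitrary choice of request language; any HTTP client will do):

# Sketch of the two request styles for Special:EntityData, using the Q1337
# example from above (German is an arbitrary choice of request language).
import requests

ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf"

# Non-dump output: after the change, referenced entities ("stubs") only carry
# labels and descriptions in the request language (set via uselang) and its
# fallback languages.
non_dump = requests.get(ENTITY_DATA, params={"uselang": "de"})

# Dump flavor: no stub data for referenced entities at all, so this request
# is not affected by the change.
dump = requests.get(ENTITY_DATA, params={"flavor": "dump"})

print(len(non_dump.text), len(dump.text))  # compare output sizes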
This change is currently available for testing at test.wikidata.org. It
will be deployed on Wikidata on August 23rd. You are welcome to give us
general feedback by leaving a comment in this ticket
<https://phabricator.wikimedia.org/T285795>.
If you have any questions please do not hesitate to ask.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Hi everyone,
We have our next Wikibase Live Session on Thursday, August 26th at
16:00 UTC (18:00 Berlin).
What are you working on around Wikibase? You're welcome to come and share
with the Wikibase community.
*Details about how to participate are below:*
Time: 16:00 UTC (18:00 Berlin), 1 hour, Thursday 26th August 2021
Google Meet: https://meet.google.com/nky-nwdx-tuf
Join by phone:
https://meet.google.com/tel/nky-nwdx-tuf?pin=4267848269474&hs=1
Notes: https://etherpad.wikimedia.org/p/WBUG_2021.08.26
If you have any questions, please do not hesitate to ask.
Talk to you soon!
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Hey all,
Henry (in CC) and I have been looking into the possibility of importing
a dataset on the order of 10-20 million items into Wikibase, with maybe
around 50 million claims. Wikibase would be perfect for our needs,
but we have been struggling quite a lot to load the data.
We are using the Docker version. Initial attempts on a small sample of
10-20 thousand items were not promising, with the load taking a very
long time. We found that RaiseWikibase helped to considerably speed up
the initial load:
https://github.com/UB-Mannheim/RaiseWikibase
but on a small sample of 10-20 thousand items, the secondary indexing
process was taking several hours. This is the building_indexing()
process here (which just calls maintenance scripts):
https://github.com/UB-Mannheim/RaiseWikibase/blob/main/RaiseWikibase/raiser…
This seems to be necessary for labels to appear correctly in the wiki,
and for search to work.
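As far as we can tell, that step amounts to running a handful of PHP
maintenance scripts inside the container, roughly along the following lines
(a sketch only; the exact scripts and flags that building_indexing() runs
may differ):

# Rough sketch of the secondary-indexing step as maintenance-script calls
# inside the container (container name as in our wikibase-docker setup; the
# exact scripts and flags RaiseWikibase runs may differ).
import subprocess

CONTAINER = "wikibase-docker_wikibase_1"

SCRIPTS = [
    # Rebuild the wbt_* secondary term tables (labels/descriptions/aliases).
    "php extensions/Wikibase/repo/maintenance/rebuildItemTerms.php",
    # (Re)create the Elasticsearch indices and push pages into them.
    "php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php",
    "php extensions/CirrusSearch/maintenance/ForceSearchIndex.php",
    # Work through whatever the above left in the job queue.
    "php maintenance/runJobs.php",
]

for cmd in SCRIPTS:
    subprocess.run(["docker", "exec", CONTAINER, "bash", "-c", cmd], check=True)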
Rather than call that method, we have been trying to invoke the
maintenance scripts directly and play with arguments that might help,
such as batch size. However, some of the scripts still take a long time,
even considering the small size of what we are loading. For example:
docker exec wikibase-docker_wikibase_1 bash -c "php
extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --sleep 0.1
--batch-size 10000"
This takes around 2 hours on the small sample (which we could multiply by a
thousand for the full dataset, i.e., roughly 83 days as an estimate).
Investigating the MySQL database, we see that the script seems to be
populating four tables: wbt_item_terms, wbt_term_in_lang, wbt_text, and
wbt_text_in_lang, but these contain on the order of 20,000 tuples when
finished, so it is surprising that the process takes so long. My guess is
that the PHP code is looking up pages per item, generating thousands of
random accesses on the disk, when it would seem better to just stream
tuples/pages contiguously from the table/disk?
Later on, the CirrusSearch indexing is also taking a long time for the
small sample, generating jobs for batches that take a long time to clear.
In our previous experience, Elasticsearch will happily eat millions of
documents in an hour. We are still looking at how batch sizes might help,
but it feels like it is taking much longer than it should.
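(For concreteness, the job-queue side can be inspected and drained with the
stock MediaWiki maintenance scripts, roughly like the sketch below; the
--procs value is only an example.)

# Monitoring and draining the job queue with stock MediaWiki maintenance
# scripts (a sketch; the --procs value is only an example).
import subprocess

CONTAINER = "wikibase-docker_wikibase_1"

def mw(cmd):
    subprocess.run(["docker", "exec", CONTAINER, "bash", "-c", cmd], check=True)

# Show how many jobs of each type are queued (the CirrusSearch write jobs
# show up here).
mw("php maintenance/showJobs.php --group")

# Run the queue with several worker processes instead of waiting for the
# built-in job runner to get through it.
mw("php maintenance/runJobs.php --procs 4")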
Overall, we are wondering whether we are approaching this bulk import in
the right way. It seems that the PHP scripts are not optimised for
performance/scale? Does anyone have experience, tips or pointers on
converting and loading large-ish scale legacy data into Wikibase? Is there
no complete solution (envisaged) for this right now?
Best,
Aidan