@Renat, thank you for creating the Phabricator ticket!
@Jesper, this looks really great! Do you know if this takes care of the secondary indexing for labels, specifically the wbt_item_terms, wbt_term_in_lang, wbt_text, and wbt_text_in_lang tables? I notice that these tables are not mentioned in the code.
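For anyone following along: as far as I understand the term store schema, a label is resolved by chaining those tables together. The sketch below (in Java/JDBC, since that is what Jesper's tool uses) shows the join as I understand it; the column names, the wbt_type join, the connection details, and the item id are all my own assumptions from reading the schema documentation, not something taken from Jesper's code:

    import java.sql.*;

    public class LabelLookupDemo {
        public static void main(String[] args) throws SQLException {
            // Join chain (as I understand it): wbt_item_terms ->
            // wbt_term_in_lang -> wbt_text_in_lang -> wbt_text, with
            // wbt_type distinguishing labels from descriptions/aliases.
            String sql =
                "SELECT wbxl.wbxl_language, wbx.wbx_text "
                + "FROM wbt_item_terms wbit "
                + "JOIN wbt_term_in_lang wbtl ON wbit.wbit_term_in_lang_id = wbtl.wbtl_id "
                + "JOIN wbt_type wby ON wbtl.wbtl_type_id = wby.wby_id "
                + "JOIN wbt_text_in_lang wbxl ON wbtl.wbtl_text_in_lang_id = wbxl.wbxl_id "
                + "JOIN wbt_text wbx ON wbxl.wbxl_text_id = wbx.wbx_id "
                + "WHERE wbit.wbit_item_id = ? AND wby.wby_name = 'label'";
            // Placeholder connection details; assumes the MariaDB Connector/J driver.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mariadb://localhost:3306/wikibase", "wikiuser", "secret");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, 42L);  // numeric part of the item id, e.g. Q42
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + ": " + rs.getString(2));
                    }
                }
            }
        }
    }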
Also, one question: when you say "if you do this without a transaction", do you mean doing it without an explicit transaction, i.e., with a transaction per operation? I assume the default behaviour is that each operation forms its own transaction, incurring the corresponding logging, latching, etc., for each individual insert, update, and so on?
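Just to make the distinction concrete, here is a minimal JDBC sketch of the two behaviours I have in mind; the wbt_text target table, connection URL, and credentials are placeholders, and it again assumes the MariaDB Connector/J driver:

    import java.sql.*;

    public class TransactionDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mariadb://localhost:3306/wikibase", "wikiuser", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO wbt_text (wbx_text) VALUES (?)")) {

                // autoCommit is true by default: each executeUpdate() is
                // its own transaction, with its own logging, latching, etc.
                ps.setString(1, "row in its own implicit transaction");
                ps.executeUpdate();

                // One explicit transaction around many statements:
                conn.setAutoCommit(false);
                for (int i = 0; i < 1000; i++) {
                    ps.setString(1, "row " + i);
                    ps.executeUpdate();
                }
                conn.commit();  // single commit for all 1000 inserts
            }
        }
    }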
@Dennis, agreed that this is part of the issue. Some of the scripts do provide options for batching, which certainly helps significantly, but importing at scale can still generate a lot of transactions / tasks / requests. Some of the scripts for secondary indexing, however, do not appear to support batching at all.
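For comparison, batching at the client level would look roughly like the sketch below; again the table and connection details are placeholder assumptions, and this is not how any particular maintenance script is implemented. One commit per batch bounds both the number of round trips and the per-transaction overhead:

    import java.sql.*;

    public class BatchInsertDemo {
        private static final int BATCH_SIZE = 1000;

        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mariadb://localhost:3306/wikibase", "wikiuser", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO wbt_text (wbx_text) VALUES (?)")) {
                conn.setAutoCommit(false);
                for (int i = 0; i < 1_000_000; i++) {
                    ps.setString(1, "text " + i);
                    ps.addBatch();
                    if ((i + 1) % BATCH_SIZE == 0) {
                        ps.executeBatch();  // one round trip per 1000 rows
                        conn.commit();      // one transaction per batch
                    }
                }
                ps.executeBatch();          // flush any remainder
                conn.commit();
            }
        }
    }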
Best, Aidan
On 2021-07-23 10:39, Jesper Zedlitz wrote:
Does anyone have experience, tips, or pointers on converting and loading largish-scale legacy data into Wikibase? Is there no complete solution (or one envisaged) for this right now?
Even though this topic is a few days old, I would like to add some of my experience. I had the same problem about a year ago and wrote a Java program that inserts millions of items pretty fast. It works with the LTS version; I don't know whether it also works with the current version.
You can find the code here: https://github.com/jze/wikibase-insert
Best wishes, Jesper