On Mon, 4 Feb 2019 15:36:08 +0100, Kévin Bois kevin.bois@biblissima-condorcet.fr wrote:
Hello,
I'm trying to write a pywikibot script that reads and creates items / properties on my Wikibase instance. Following pieces of tutorials and example scripts, I managed to write something that works.
1/ The idea is to read a CSV file and create an item, with its properties, for each line. So I have to loop over thousands of lines and create an item plus multiple associated claims per line, and this takes quite some time (at least 1 hour to create 1000 items). I suspect it's because each line creates a new entity and then new claims, which means multiple API requests per line. Some pseudocode from my script:

To create a new item (assuming repo = site.data_repository()):
    repo.editEntity({}, {}, summary='new item')

To create a new claim (my bot inherits from WikidataBot):
    self.user_add_claim_unless_exists(item, claim)
Is there a better way to optimize this kind of bulk import?
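One way to cut the number of round trips is to send the labels and all claims for a row in a single editEntity payload, instead of one request per claim. Below is a minimal sketch of building such a payload; the property ID 'P1' and the CSV column names (label, custom_id) are illustrative assumptions, not part of the original script.

```python
def row_to_entity_data(row):
    """Build the `data` dict for repo.editEntity() from one CSV row,
    so the item and its claims are created in one API request."""
    return {
        'labels': {'en': {'language': 'en', 'value': row['label']}},
        'claims': [
            {
                'mainsnak': {
                    'snaktype': 'value',
                    'property': 'P1',  # hypothetical custom-ID property
                    'datavalue': {'value': row['custom_id'], 'type': 'string'},
                },
                'type': 'statement',
                'rank': 'normal',
            },
        ],
    }

# Then, with repo = pywikibot.Site(...).data_repository():
#     repo.editEntity({}, row_to_entity_data(row), summary='import row')
```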
--
2/ I have a similar problem when checking whether an item already exists, because first I need to fetch all existing items and check whether they are in my CSV or not. (The CSV does not contain QIDs, but it does contain a "custom" ID I've created and added as a property on each item.)
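If your Wikibase instance has a query service, one approach is to fetch all (item, custom ID) pairs once with SPARQL, then check each CSV row against an in-memory dict instead of making a per-row API call. A sketch, where the endpoint URL and the property P1 are assumptions about your instance:

```python
# SPARQL query fetching every item that carries the custom-ID property.
QUERY = """
SELECT ?item ?customId WHERE {
  ?item wdt:P1 ?customId .
}
"""

def build_lookup(rows):
    """Map custom ID -> QID from query rows shaped like
    {'item': 'http://example.org/entity/Q7', 'customId': 'ID-7'}."""
    return {r['customId']: r['item'].rsplit('/', 1)[-1] for r in rows}

# With pywikibot, something along these lines (hedged, check your setup):
#     from pywikibot.data.sparql import SparqlQuery
#     rows = SparqlQuery(endpoint='https://query.example.org/sparql').select(QUERY)
#     lookup = build_lookup(rows)
#     if row['custom_id'] in lookup:
#         ...  # item already exists, skip creation
```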
--
I hope I was clear enough; any relevant example, idea, or advice would be much appreciated. Bear in mind I'm a beginner with the whole ecosystem, so I'm open to any recommendation. Thanks!
_______________________________________________
pywikibot mailing list
pywikibot@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot
I do not know if this message will be delivered. I hope so.
About the first question, I think you can split the workload among several Python threads.
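The threading idea could be sketched with a thread pool, so several API requests are in flight at once. Here create_item is a stand-in for the real editEntity call, and you would still need to respect the API's rate limits (maxlag, bot flag):

```python
from concurrent.futures import ThreadPoolExecutor

def create_item(row):
    # Placeholder for the real call, e.g.
    # repo.editEntity({}, data, summary='new item')
    return f"created item for {row['custom_id']}"

def bulk_import(rows, workers=4):
    """Process CSV rows concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(create_item, rows))
```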
About the second: could you generate the QID with an injective function from your ID? Then you would just apply the function to your ID and check whether the corresponding QID exists.
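A caveat with this idea: Wikibase assigns QIDs server-side, so a pure id-to-QID function is hard to guarantee. A variant of the same idea, sketched below, is to persist the custom-ID-to-QID mapping as items are created, which gives the same constant-time existence check; the file name is an arbitrary choice.

```python
import json
import os

MAP_FILE = 'id_to_qid.json'

def load_mapping(path=MAP_FILE):
    """Load the custom-ID -> QID mapping, or start empty on first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def record(mapping, custom_id, qid, path=MAP_FILE):
    """Remember a newly created item and persist the mapping to disk."""
    mapping[custom_id] = qid
    with open(path, 'w') as f:
        json.dump(mapping, f)
```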
Pellegrino