Hello,
I'll answer in the body of both mails. Thank you so much for your help!
On 4 Feb 2019 at 17:14, Strainu strainu10@gmail.com wrote:
Not sure about this, but you might consider using low-level API functions directly or even crafting your API calls by hand. That kind of defies the purpose of using pwb, but oh well...
=> I see. I think I'll try figuring it out with pywikibot first, for simplicity's sake. If I can't find a good enough solution with pwb, I may try that.
This sounds like a great job for a SPARQL query (see https://query.wikidata.org for the public endpoint for Wikidata). Is it feasible to add such an interface to your instance?
=> Yes, I'll plug in a SPARQL endpoint soon. I assume that kind of request is fast, so this is definitely something I'll try!
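For what it's worth, once the endpoint is up, a single SPARQL query could map every custom ID to its QID in one round trip instead of thousands of API lookups. A small sketch of building such a query; the property ID (P123) is an assumption, substitute whichever property holds your custom ID:

```python
# Sketch: build a SPARQL query selecting every (item, custom ID) pair.
# The custom-ID property (P123 by default) is an assumption about
# your Wikibase instance.

def build_id_query(custom_id_property="P123"):
    """Return a SPARQL query selecting each item and its custom ID."""
    return (
        "SELECT ?item ?customId WHERE { "
        f"?item wdt:{custom_id_property} ?customId . "
        "}"
    )
```

With pywikibot, the query could then be run through `pywikibot.data.sparql.SparqlQuery` (passing your instance's endpoint URL to the constructor and the query to `select()`), which returns the results as a list of dicts.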
--
On 4 Feb 2019 at 17:59, Pellegrino Prevete pellegrinoprevete@gmail.com wrote:
On Mon, 4 Feb 2019 15:36:08 +0100, Kévin Bois kevin.bois@biblissima-condorcet.fr wrote:
Hello,
I'm trying to write a pywikibot script which reads and creates items / properties on my Wikibase instance. Following bits of tutorials and script examples, I managed to write something that works.
1/ The idea is to read a CSV file and create an item, with its properties, for each line. So I have to loop over thousands of lines, creating an item and multiple associated claims each time, and it takes quite some time (at least 1 hour to create 1000 items). I guess that's because for each line I create a new entity and new claims, which means multiple requests per line. Some pseudo-code from my script:
To create a new item: repo.editEntity({}, {}, summary='new item'), assuming repo = site.data_repository()
To create a new claim: self.user_add_claim_unless_exists(item, claim), assuming my bot inherits from WikidataBot
Is there a better way to optimize that kind of bulk import ?
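One possible optimization: the wbeditentity API behind repo.editEntity accepts labels and claims in the same payload, so each CSV row could become a single API request instead of one request per claim. A sketch of building such a payload; the column names ('label', 'custom_id') and property ID (P123) are assumptions about your CSV layout:

```python
# Sketch: build one editEntity payload per CSV row, so the item, its
# label, and its statement go out in a single wbeditentity request
# instead of one request per claim. Column names and the property ID
# are assumptions -- adapt them to your CSV and instance.

def row_to_entity_data(row, custom_id_property="P123"):
    """Build the `data` argument for repo.editEntity from a CSV row."""
    return {
        "labels": {"en": {"language": "en", "value": row["label"]}},
        "claims": [{
            "mainsnak": {
                "snaktype": "value",
                "property": custom_id_property,
                "datavalue": {"value": row["custom_id"], "type": "string"},
            },
            "type": "statement",
            "rank": "normal",
        }],
    }
```

Then repo.editEntity({}, row_to_entity_data(row), summary='new item') would create the item and its statement together; existing pywikibot.Claim objects can also be serialized with claim.toJSON() and placed under the 'claims' key.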
--
2/ I kind of have the same problem if I want to check whether an item already exists, because first I need to get all existing items and check whether they are in my CSV or not. (The CSV does not contain QIDs, but it does contain a "custom" ID I've created and added as a property to each item.)
--
I hope I was clear enough; any relevant example, idea, or advice would be much appreciated. Bear in mind I'm a beginner with the whole ecosystem, so I'm open to any recommendation. Thanks!
_______________________________________________
pywikibot mailing list
pywikibot@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot
I do not know if this message will be delivered. I hope so.
About the first question, I think you can split the workload among different Python threads.
=> That sounds awesome, I'll look into that
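A minimal sketch of that idea with the standard library, assuming create_item is whatever callable does the API writes for one CSV row; note that the wiki's rate limits and pywikibot's own throttle may still serialize the actual writes, so the gain depends on server-side settings:

```python
# Sketch: spread CSV rows over a small thread pool. create_item is a
# placeholder for the function that performs the API writes for one
# row; pool.map returns the results in input order.
from concurrent.futures import ThreadPoolExecutor

def import_rows(rows, create_item, max_workers=4):
    """Apply create_item to each row concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(create_item, rows))
```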
About the second: could you generate the QID with an injective function from your ID? Then you would just have to apply the function to your ID and check whether the corresponding QID exists.
=> It sounds like what I had in mind, but I'm not sure I understood correctly what you mean. To expand on what I wanted to do: before adding anything with the script, I wanted to build a big mapping (a Python dictionary) from my custom ID to its corresponding QID, something like: id_mapping = {custom_id1: QID1, custom_id2: QID2, ...}. Then I could easily look into that dictionary when needed, before actually adding an item. This is why I'm trying to retrieve all existing items as a first step.
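If the SPARQL route works out, that dictionary could be built directly from the query results. A sketch, assuming each result row has 'item' (a full entity URI) and 'customId' keys, as a query like SELECT ?item ?customId WHERE { ?item wdt:P123 ?customId } would produce:

```python
# Sketch: turn SPARQL results into the custom-ID -> QID dictionary
# described above. Assumes each result row has 'item' (a full entity
# URI ending in the QID) and 'customId' keys.

def build_id_mapping(results):
    """Map each custom ID to its QID (the last URI path segment)."""
    return {
        row["customId"]: row["item"].rsplit("/", 1)[-1]
        for row in results
    }
```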
Pellegrino