Hello,

I'll answer inline in both mails. Thank you so much for your help!



On 4 Feb 2019 at 17:14, Strainu <strainu10@gmail.com> wrote:

Not sure about this, but you might consider using low-level API
functions directly or even crafting your API calls by hand. That kind
of defies the purpose of using pwb, but oh well...
=> I see. I think I'll try figuring it out with pywikibot first, for simplicity's sake. If I can't find a good enough solution with pwb, I may try that.

This sounds like a great job for a SPARQL Query
(see https://query.wikidata.org for the public endpoint for Wikidata).
Is it feasible to add such an interface to your instance?
=> Yes, I'll plug in a SPARQL endpoint soon. I assume that kind of request is fast, so this is definitely something I'll try!
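For reference, the existence check from question 2 below maps naturally onto a single query. A sketch, assuming a hypothetical property P123 holds the custom ID on the Wikibase instance:

```sparql
# Hypothetical: P123 is the property storing the custom ID.
# Returns every item together with its custom ID in one request.
SELECT ?item ?customId WHERE {
  ?item wdt:P123 ?customId .
}
```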

--

On 4 Feb 2019 at 17:59, Pellegrino Prevete <pellegrinoprevete@gmail.com> wrote:

On Mon, 4 Feb 2019 15:36:08 +0100,
Kévin Bois <kevin.bois@biblissima-condorcet.fr> wrote:

Hello,

I'm trying to write a pywikibot script which reads and creates items /
properties on my Wikibase instance. Following pieces of tutorials and
script examples, I managed to write something working.

1/ The idea is to read a CSV file and create an item with its
properties for each line. So I have to loop over thousands of lines
and create an item and multiple associated claims, which takes quite
some time (at least 1 hour to create 1000 items). I guess it's because
for each line I create a new entity and new claims, which means
multiple requests per line. Some pseudo code I use in my script:

To create a new item: repo.editEntity({}, {}, summary='new item'),
assuming repo = site.data_repository()
To create a new claim: self.user_add_claim_unless_exists(item, claim),
assuming my bot inherits from WikidataBot

Is there a better way to optimize that kind of bulk import?
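The per-line flow above can be sketched as follows. The idea is to build the whole payload first, so each CSV line costs a single repo.editEntity() call instead of one request per claim. This is only a sketch: the property IDs are hypothetical, and the claim JSON shown covers string values only; other datatypes (items, quantities, dates) need different datavalue shapes.

```python
import csv
from io import StringIO

def build_item_data(label, claims):
    """Build a wbeditentity-style payload so one repo.editEntity() call
    creates the item together with all of its statements.  'claims'
    maps property IDs to string values (string datatype only here)."""
    return {
        'labels': {'en': {'language': 'en', 'value': label}},
        'claims': [
            {
                'mainsnak': {
                    'snaktype': 'value',
                    'property': pid,
                    'datavalue': {'value': value, 'type': 'string'},
                },
                'type': 'statement',
                'rank': 'normal',
            }
            for pid, value in claims.items()
        ],
    }

# Hypothetical CSV with a label column and two string-valued properties.
rows = csv.DictReader(StringIO("label,P1,P2\nFirst item,abc,def\n"))
for row in rows:
    data = build_item_data(row.pop('label'), row)
    # repo.editEntity({}, data, summary='new item')  # one request per line
```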

--

2/ I kind of have the same problem if I want to check whether an item
already exists, because first I need to get all existing items and
check if they are in my CSV or not. (The CSV does not contain QIDs,
but does contain a "custom" ID I've created and added as a property
to each item.)

--

I hope I was clear enough; any relevant example, idea, or advice would
be much appreciated. Bear in mind I'm a beginner with the whole
ecosystem, so I'm open to any recommendation. Thanks!

I do not know if this message will be delivered. I hope so.

About the first question, I think you can split the workload
among different Python threads.

=> That sounds awesome, I'll look into that

About the second: could you generate the QID with an injective function
from your ID? Then you would just have to execute the function on your
ID and check whether the corresponding QID exists.

=> It sounds like what I had in mind, but I'm not sure I understood correctly what you mean.
To expand on what I wanted to do: before adding anything with the script, I wanted to build a big mapping (a Python dictionary) from my custom ID to its corresponding QID, something like: id_mapping = {custom_id1: QID1, custom_id2: QID2, ...}. Then I could easily look into that dictionary when needed, before actually adding an item. This is why I'm trying to retrieve all existing items as a first step.

Pellegrino

_______________________________________________
pywikibot mailing list
pywikibot@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot