Hi Bináris,
I did not write any of the threaded stuff in wikipedia.py, but I have used it a couple of times. I think what you should do is provide a callable _object_ rather than a plain callback function. You can then iterate through the list of callback objects and look at the errors, if there are any. Here is a sample program I wrote to illustrate the concept:
import wikipedia as pywikibot
from time import sleep

pages = [
    'User:HRoestBot/CallbackTest1',
    'User:HRoestBot/CallbackTest2',
]

class CallbackObject(object):
    def __init__(self):
        self.done = False

    def __call__(self, page, error):
        self.page = page
        self.error = error
        self.done = True

# Queue all pages for asynchronous saving, keeping one callback object per page.
Callbacks = []
for mypage in pages:
    print mypage
    callb = CallbackObject()
    page = pywikibot.Page(pywikibot.getSite(), mypage)
    Callbacks.append(callb)
    page.put_async('some text', callback=callb)

# Wait until all pages are saved on Wikipedia.
while True:
    if all(c.done for c in Callbacks):
        break
    print "Still Waiting"
    sleep(5)

# Now we can look at the errors.
for obj in Callbacks:
    print obj.page, obj.error
    if obj.error is not None:
        pass  # do something to handle errors
The output of such a program might then look like this:
$ python test.py
unicode test: triggers problem #3081100
HRoestBot/CallbackTest1
HRoestBot/CallbackTest2
Sleeping for 4.0 seconds, 2012-02-24 09:32:57
Still Waiting
Still Waiting
Updating page [[HRoestBot/CallbackTest1]] via API
Still Waiting
Sleeping for 19.3 seconds, 2012-02-24 09:33:18
Still Waiting
Updating page [[HRoestBot/CallbackTest2]] via API
Still Waiting
[[de:HRoestBot/CallbackTest1]] An edit conflict has occured.
[[de:HRoestBot/CallbackTest2]] An edit conflict has occured.
hr@hr:~/projects/private/pywikipedia_gitsvn$
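Since both pages hit an edit conflict in that run, here is one purely illustrative way to fill in the "do something to handle errors" part: re-queue each failed page once with a fresh callback object. This is only a sketch, reusing the placeholder text from the sample above, not something from my real bot:

# Illustrative only: retry failed pages once by queuing them again with a
# fresh callback object. 'some text' is just the placeholder from the sample.
Retries = []
for obj in Callbacks:
    if obj.error is not None:
        print "Retrying %s after error: %s" % (obj.page, obj.error)
        retry_cb = CallbackObject()
        Retries.append(retry_cb)
        obj.page.put_async('some text', callback=retry_cb)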
At least that is how I do it; I hope that helps you understand it. You can also use pywikibot.page_put_queue.qsize() and pywikibot.page_put_queue.empty() to check whether the queue is empty, but this can still lead to problems: the page is fetched from the queue and *then* page.put is called on it, so until page.put() finishes, the queue will already report empty even though the bot is still putting the page (see the function async_put()). It therefore seems much safer to me to rely on the callback objects to be sure that all the put calls are done.
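For completeness, this is the kind of queue-based check I mean (a minimal sketch, reusing the imports from the sample above); the comment spells out why it is racy:

# Naive wait loop based only on the queue. This is racy: async_put() takes
# the item off the queue *before* calling page.put(), so empty() can return
# True while a save is still in progress.
while not pywikibot.page_put_queue.empty():
    print "Still Waiting, %d puts queued" % pywikibot.page_put_queue.qsize()
    sleep(5)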
You can also look at the _flush() method in wikipedia.py to see how it determines whether all pages have been put and it is safe to exit.
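I don't have the file in front of me, but as far as I remember the idea is roughly the following (a sketch only, and the exact layout of the queue items may well differ in your copy of wikipedia.py):

# Sketch of the shutdown idea, NOT the actual _flush() code: put the explicit
# end-of-Queue marker (an item whose page is None) so async_put() knows to
# stop, and only then join the put thread. Without that marker, join() blocks
# forever -- which is the freeze you are seeing.
pywikibot.page_put_queue.put((None, [], {}))  # assumed item layout
pywikibot._putthread.join()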
Hannes
On 23 February 2012 21:30, Bináris <wikiposta@gmail.com> wrote:
I made a big effort to understand this stuff with put_async and threading, but here is a point I can't get over.
I read a lot and understood that the way to wait for a thread is join(), and in wikipedia.py join() would have to be called on _putthread, which is the Thread object. Now, wherever I write the line _putthread.join() (I tried put_async, async_put and even replace.py, which I know is not a good solution), it freezes my command window as if the thread never terminated. _putthread.join(time) waits for the given time, but that is only useful for testing, not for real use. Does any script really use this callback at all? Which one? Line 8054 in wikipedia.py says "an explicit end-of-Queue marker is needed", and this is supposed to be a call to async_put with None as the value of page. But I don't see this dummy call to async_put anywhere. Might that be a bug, or do I just misunderstand? (Btw, Python 2.4 should be forgotten.)
Please, I need your help.
-- Bináris
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l