https://bugzilla.wikimedia.org/show_bug.cgi?id=55889
Web browser: ---
Bug ID: 55889
Summary: Thread-safe versions of the Generators
Product: Pywikibot
Version: core (2.0)
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: crangasi2001@yahoo.com
Classification: Unclassified
Mobile Platform: ---
Currently, the generator functions use yield, which is not thread-safe. PWB should offer thread-safe versions, using one of the many interesting suggestions from http://www.dabeaz.com/generators/Generators.pdf (or any other method :P)
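One of the simplest recipes from those slides is to wrap a generator so that every next() call is serialized behind a lock. A minimal sketch of that idea (the class name is made up, this is not an existing PWB API):

    import threading

    class ThreadSafeIterator(object):
        """Serialize next() calls on a wrapped generator/iterator."""

        def __init__(self, iterable):
            self._it = iter(iterable)
            self._lock = threading.Lock()

        def __iter__(self):
            return self

        def __next__(self):  # Python 3
            with self._lock:
                return next(self._it)

        next = __next__  # Python 2 compatibility

Several worker threads could then safely share e.g. ThreadSafeIterator(site.allpages()).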
Merlijn van Deen valhallasw@arctus.nl changed:
CC: added valhallasw@arctus.nl
--- Comment #1 from Merlijn van Deen valhallasw@arctus.nl --- What is the goal you want to achieve by this? Remember that threads in Python are useless for computations, due to the GIL.
--- Comment #2 from Strainu crangasi2001@yahoo.com --- As I understand it, I/O happens outside of the GIL. Since API requests are the most time-consuming part of many of my bots (more precisely, the connection to the servers), being able to issue requests from several threads should improve performance somewhat (as long as the throttling is not too aggressive).
I've noticed that the preloading limit is now only 50 pages, which makes this problem even more pressing when working with many small pages. Threading would probably also help for things like image upload/download.
If it helps, we could run some tests to see whether performance improves for a simple file downloader.
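For example, a rough benchmark along these lines (standard library only, the URLs are placeholders): since each download spends most of its time waiting on the server, and the GIL is released during that wait, running the same fetches from a small thread pool should cut the wall-clock time noticeably.

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    # Placeholder URLs; replace with real file URLs for an actual test.
    URLS = ['https://upload.wikimedia.org/example-%d.jpg' % i for i in range(20)]

    def fetch(url):
        with urllib.request.urlopen(url) as resp:
            return len(resp.read())

    start = time.time()
    sequential = [fetch(u) for u in URLS]                # sequential baseline
    print('sequential: %.1fs' % (time.time() - start))

    start = time.time()
    with ThreadPoolExecutor(max_workers=5) as pool:      # threaded run
        threaded = list(pool.map(fetch, URLS))
    print('threaded:   %.1fs' % (time.time() - start))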
--- Comment #3 from Merlijn van Deen valhallasw@arctus.nl --- I see. There are already some features in place, but we may not be using asynchronous requests at all the points where they could be useful.
First of all, connections should be re-used - this is already a feature in the httplib2 library.
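To illustrate: httplib2 keeps per-host connections alive inside a single Http object, so the saving comes from routing all requests through the same instance (the URL is just an example):

    import httplib2

    h = httplib2.Http()  # one instance -> connections are kept alive per host
    for title in ('Foo', 'Bar', 'Baz'):
        resp, content = h.request(
            'https://en.wikipedia.org/w/api.php?action=query'
            '&titles=%s&format=json' % title)
        print(title, resp.status, len(content))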
The next layer, comms.threadedhttp, supports asynchronous requests ('futures' would be a closer term: basically, you create a request and then wait for a lock to be released). However, I don't think we use this feature anywhere, as it's not exposed in the higher-up layers.
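The pattern is roughly the following (a generic sketch, not the actual threadedhttp code, and the names are invented): the caller enqueues a request object and goes on with other work; a worker thread performs the HTTP call; the caller only blocks when it finally asks for the result.

    import queue
    import threading
    import urllib.request

    class HttpFuture(object):
        def __init__(self, url):
            self.url = url
            self.response = None
            self.done = threading.Event()

        def wait(self):
            self.done.wait()  # block until the worker has filled in .response
            return self.response

    requests_q = queue.Queue()

    def worker():
        while True:
            fut = requests_q.get()
            with urllib.request.urlopen(fut.url) as resp:
                fut.response = resp.read()
            fut.done.set()

    threading.Thread(target=worker, daemon=True).start()

    fut = HttpFuture('https://en.wikipedia.org/w/api.php'
                     '?action=query&meta=siteinfo&format=json')
    requests_q.put(fut)  # the request starts in the background
    # ... other work can happen here ...
    body = fut.wait()    # block only when the result is actually needed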
For saving pages, which (I think) is the most relevant place for async requests, we already have support: requests whose reply does not need further handling can be executed asynchronously; see Page.put_async.
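For reference, a usage sketch (the wiki and page title are hypothetical; the callback signature, page plus exception-or-None, is what I believe the put queue uses, but double-check it):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'User:ExampleBot/Sandbox')

    def report(saved_page, err):
        # Called from the background put queue once the save has been
        # processed; err is None on success, otherwise the exception raised.
        if err is None:
            print('saved', saved_page.title())
        else:
            print('save of %s failed: %s' % (saved_page.title(), err))

    # The save is queued and performed by a background thread, so the
    # main thread can immediately continue with the next page.
    page.put_async('new wikitext', 'demo edit', callback=report)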
For pagegenerators, we might be able to win a bit by requesting the (i+1)th page before returning the i-th page (or, for the PreloadingGenerator, by requesting the (i+1)th batch before all pages from the i-th batch have been returned).
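A possible shape for that (only a sketch, the helper name is made up): a background thread pulls items or batches from the wrapped generator into a small bounded queue, so the next one is already being fetched while the caller is still working on the current one.

    import queue
    import threading

    _END = object()

    def prefetching(generator, buffer_size=1):
        """Yield items from `generator`, fetching ahead in a background thread."""
        buf = queue.Queue(maxsize=buffer_size)

        def producer():
            for item in generator:
                buf.put(item)
            buf.put(_END)

        threading.Thread(target=producer, daemon=True).start()
        while True:
            item = buf.get()
            if item is _END:
                return
            yield item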
--- Comment #4 from Strainu crangasi2001@yahoo.com --- (In reply to comment #3)
> The next layer, comms.threadedhttp, supports asynchronous requests. [...] I don't think we use this feature anywhere, as it's not exposed in the higher-up layers.
I've noticed that while writing the answer to Gerard's questions today :)
> For saving pages, which (I think) is the most relevant place for async requests, we already have support: requests whose reply does not need further handling can be executed asynchronously; see Page.put_async.
I've experimented with put_async, with mixed results. When the upload works, it's mostly OK; however, when one request hits an error (like a 504 from the server), it just keeps retrying again and again, keeping the thread blocked.
Instead, the request should probably be de-queued and processed, and, if a callback has been registered, the callback should be called so that the bot can re-queue the request itself. This, however, could cause trouble if the order of the requests is important: the bot can receive a callback, but AFAIK it cannot remove already-queued requests. Also, what happens if no callback has been registered? Should we simply re-queue the request? I don't have a perfect solution at this time, but it is a point that should be considered.
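Something along these lines, for instance (entirely hypothetical; page is a pywikibot.Page, and it ignores the ordering problem mentioned above):

    def requeue_on_error(saved_page, err):
        # Let the bot decide what to do with a failed save instead of the
        # queue blocking on endless retries.
        if err is not None:
            pywikibot.output('save of %s failed (%s), queueing one retry'
                             % (saved_page.title(), err))
            saved_page.put_async(saved_page.text, 'retrying earlier save')

    page.put_async('new wikitext', 'bot edit', callback=requeue_on_error)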
Another possible issue, which PWB can't really do much about, is that one can get a 504 even when the save actually succeeded, making the re-queueing pointless. I don't have a good solution for that either, but we could consult with the Wikimedia developers.
> For pagegenerators, we might be able to win a bit by requesting the (i+1)th page before returning the i-th page (or, for the PreloadingGenerator, by requesting the (i+1)th batch before all pages from the i-th batch have been returned).
This should be especially useful if it can be controlled by the user. Do you have any ideas on how to do this?
I think there were some good ideas brought up on this bug. Should we start a thread on the mailing list so we can gather more input on this?
Merlijn van Deen valhallasw@arctus.nl changed:
Priority: Unprioritized -> Low
Summary: Thread-safe versions of the Generators -> Improve support for asynchronous requests (saving/preloading pages)
Ricordisamoa ricordisamoa@live.it changed:
Depends on: added 55220
--- Comment #5 from Gerrit Notification Bot gerritadmin@wikimedia.org --- Change 172023 had a related patch set uploaded by John Vandenberg: Asynchronous HTTP requests
https://gerrit.wikimedia.org/r/172023
Gerrit Notification Bot gerritadmin@wikimedia.org changed:
Status: NEW -> PATCH_TO_REVIEW