Hi everyone,
My bot is fully functional. It does a number of things in a loop, sleeping after each item and again at the end of the loop. It is, however, single-threaded. If some of these items should run more often than others, it seems appropriate to break the task up into multiple threads. But there are a few problems:
- Editing Wikipedia requires a login, an edit token, and cookies, which must be shared among the threads if more than one thread is able to edit.
- If the threads do not coordinate their access to Wikipedia, the bot may hit Wikipedia in bursts, which does not play nicely.
- One can get logged out at any time, which requires the thread to log in again.
- The fragile nature of the Internet means that all kinds of spurious errors are bound to occur.
Does anyone have any ideas about how to implement multithreading in a Wikipedia bot in a robust way?
Richard
Hello,
On 2011/3/18, richardcavell@mail.com wrote:
Does anyone have any ideas about how to implement multithreading in a Wikipedia bot in a robust way?
Do you absolutely need to have several threads communicating with Wikipedia?
My first, simple idea would be a single thread that handles all the communication, plus as many worker threads as needed for processing. When a worker finishes its task, it makes an exchange with the master ("communication guru") thread: it hands over the results for the previous payload and gets the next payload. Because only a single thread communicates with Wikipedia, tokens, rate control, authentication, error handling, etc. are simpler to handle.
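A minimal Python sketch of that layout, assuming hypothetical `fetch_page`, `save_page` and `process` functions in place of real Wikipedia API calls: the main thread is the only one that fetches and saves, the workers only transform text, and two queues carry payloads out and results back.

```python
import queue
import threading
import time


# Hypothetical stand-ins for real wiki calls; only the main thread uses them.
def fetch_page(title):
    return f"text of {title}"


def save_page(store, title, text):
    store[title] = text


def process(text):
    """CPU-side work done by the workers; it never touches the wiki."""
    return text.upper()


def worker(payloads, results):
    while True:
        item = payloads.get()
        if item is None:  # sentinel: no more work for this worker
            return
        title, text = item
        results.put((title, process(text)))


def run(titles, n_workers=3, edit_delay=0.0):
    payloads, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(payloads, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    store = {}
    # The main thread alone talks to the wiki: fetch and hand out payloads...
    for title in titles:
        payloads.put((title, fetch_page(title)))
    # ...then take results back and save them one at a time, throttled.
    for _ in titles:
        title, new_text = results.get()
        save_page(store, title, new_text)
        time.sleep(edit_delay)
    for _ in workers:
        payloads.put(None)
    for w in workers:
        w.join()
    return store


print(sorted(run(["Foo", "Bar"]).items()))
# [('Bar', 'TEXT OF BAR'), ('Foo', 'TEXT OF FOO')]
```

In a real bot a worker that raises would leave `results.get()` waiting forever, so workers would need to report errors back through the results queue as well.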
Would that work for you? I think that the "best" design really depends on the kind of tasks you are working on.
Regards,
richardcavell@mail.com wrote:
If it is desirable that some of these items occur more frequently than others, then it seems appropriate to break the task up into multiple threads.
Or to reorder it.
If you have

check_spam(); sleep(); fix_typos(); sleep(); welcome_users(); sleep();

and check_spam() should run more frequently, the easiest way seems to be:

check_spam(); sleep(); fix_typos(); sleep(); check_spam(); sleep(); welcome_users(); sleep();
Virtualize it. Give every thread a scheduling mechanism. When activated, a thread should do nothing but write its job into a queue. Let a master thread work its way through the queue.
-- Johannes Ponader 0162/94 64 94 0
(In reply to Platonides <platonides@gmail.com>, 19.03.2011 22:52)
Wikibots-l mailing list Wikibots-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibots-l
Virtualize it. Give every thread a scheduling mechanism. When activated, a thread should do nothing but write its job into a queue. Let a master thread work its way through the queue.
Yeah, I think that's the way to do it. You have to have at least one thread, and that thread becomes the 'main' thread, which carries the cookies, the edit token and the right to edit Wikipedia. Its Wikipedia access rate can be throttled as necessary. Any other threads are just sensory, and pass their data to a job queue for the main thread.
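A minimal Python sketch of that arrangement, under stated assumptions: `Session`, `LoggedOut`, `sensor` and `main_loop` are hypothetical names, and the fake session "logs out" after every third edit just to exercise the re-login path. The sensory threads only enqueue jobs on their own schedules; the main thread alone holds the login, throttles its edits, and logs in again when needed.

```python
import queue
import threading
import time


class LoggedOut(Exception):
    """Raised by the fake session below when its login has expired."""


class Session:
    """Hypothetical stand-in for a logged-in session (cookies + edit token)."""

    def __init__(self):
        self.token = None
        self.logins = 0
        self.edits = []

    def login(self):
        self.logins += 1
        self.token = f"token-{self.logins}"

    def edit(self, job):
        if self.token is None:
            raise LoggedOut()
        self.edits.append(job)
        if len(self.edits) % 3 == 0:  # simulate the wiki dropping our login
            self.token = None


def sensor(name, period, count, jobs):
    """A 'sensory' thread: on its own schedule it only enqueues jobs."""
    for i in range(count):
        jobs.put(f"{name}-{i}")
        time.sleep(period)


def main_loop(session, jobs, n_jobs, min_interval):
    """The one thread that owns the session and performs every edit."""
    session.login()
    for _ in range(n_jobs):
        job = jobs.get()
        try:
            session.edit(job)
        except LoggedOut:
            session.login()  # log in again, then retry the same job once
            session.edit(job)
        time.sleep(min_interval)  # throttle: at most one edit per interval


def demo():
    jobs = queue.Queue()
    session = Session()
    sensors = [
        threading.Thread(target=sensor, args=("spam", 0.01, 4, jobs)),
        threading.Thread(target=sensor, args=("welcome", 0.03, 2, jobs)),
    ]
    for t in sensors:
        t.start()
    main_loop(session, jobs, n_jobs=6, min_interval=0.005)
    for t in sensors:
        t.join()
    return session


result = demo()
print(result.logins, sorted(result.edits))
# 2 ['spam-0', 'spam-1', 'spam-2', 'spam-3', 'welcome-0', 'welcome-1']
```

The order in which the jobs are edited depends on thread timing, but every job passes through the single session, so the token and the edit rate live in one place.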
Richard
Just to make it clear: I assume we are discussing multitasking, not multithreading, and so you have to make the classic multitasking decisions.
-- Johannes Ponader
(In reply to richardcavell@mail.com, 22.03.2011 15:06)
Forget my previous post. Wikipedia made me smarter ;-)
I'd now simply call it a scheduling problem: how does the main thread decide which tasks to perform, in which order, and how long to sleep between them?
From simple to elegant (and from hacky to easily expandable/reframeable):
• You might hard-code it, as you already proposed,
• you might create a job list (including sleep times) statically and let the bot loop through it,
• or you might code a scheduling mechanism that creates the job list/job order dynamically.
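A minimal Python sketch of the third option, assuming each task is simply given a desired interval in seconds (the task names are borrowed from Platonides' example; `schedule` and `run` are hypothetical helpers): a priority queue keyed on each task's next due time produces the job order on the fly, and one thread sleeps until the next job is due.

```python
import heapq
import time


def schedule(intervals, horizon):
    """Yield (start_time, task_name) pairs, earliest first, up to `horizon`.

    `intervals` maps each task name to how often it should run, in seconds
    (must be > 0). Equal start times are ordered by task name.
    """
    heap = [(0.0, name) for name in intervals]
    heapq.heapify(heap)
    while heap and heap[0][0] < horizon:
        due, name = heapq.heappop(heap)
        yield due, name
        heapq.heappush(heap, (due + intervals[name], name))


def run(tasks, intervals, horizon):
    """Single-threaded runner: sleep until each task is due, then call it."""
    start = time.monotonic()
    for due, name in schedule(intervals, horizon):
        delay = start + due - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        tasks[name]()


plan = schedule({"check_spam": 10, "fix_typos": 30, "welcome_users": 30}, 60)
for due, name in plan:
    print(due, name)
```

Over one minute this plans check_spam six times and the other two tasks twice each; changing a frequency means editing one number rather than reshuffling a hard-coded loop.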
-- Johannes Ponader
(In reply to Johannes Ponader)
Slightly better than hard-coding the frequency as the number of function calls: draw a random number and map it onto the tasks according to the percentage of time each task should get.
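A minimal Python sketch of that idea, assuming the shares are given as relative weights per task (`pick_task` and `run` are hypothetical names): one random number in [0, 1) is mapped onto cumulative shares to choose the next task.

```python
import random
import time


def pick_task(shares, rng=random):
    """Map one random number in [0, 1) onto a task, in proportion to `shares`.

    `shares` maps task names to their desired share of the bot's time
    (any positive numbers; they are normalised here). Must be non-empty.
    """
    names = list(shares)
    total = sum(shares.values())
    r = rng.random()
    cumulative = 0.0
    for name in names:
        cumulative += shares[name] / total
        if r < cumulative:
            return name
    return names[-1]  # only reached through floating-point round-off


def run(tasks, shares, steps, pause, rng=random):
    """Call a randomly chosen task `steps` times, sleeping `pause` between."""
    for _ in range(steps):
        tasks[pick_task(shares, rng)]()
        time.sleep(pause)


shares = {"check_spam": 50, "fix_typos": 30, "welcome_users": 20}
rng = random.Random(0)
counts = {name: 0 for name in shares}
for _ in range(10_000):
    counts[pick_task(shares, rng)] += 1
print(counts)  # roughly half, 30% and 20% of the draws
```

Python's standard `random.choices(names, weights=...)` does the same weighted draw in one call; the explicit loop just shows the mapping. Unlike a fixed schedule, this only matches the shares on average, so a task can occasionally wait a while between runs.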