Hi everyone,
Nightshade was a bit slow, so I typed "top -c". I was amazed to see that almost all the top processes seem to be interwiki-related (interwiki.py). The same seems to be the case on willow. Normally I wouldn't really care; we have the servers, so we should use them. But now the login servers seem to be overloaded. Aren't there a bit too many interwiki bots?
Maarten
In my opinion, the problem is not how many bots are working, but that interwiki.py seems to use excessive amounts of memory. For example, while my interwiki.py was running it was using 982 megabytes, and then it was killed. You may find a verbose log of its work at http://toolserver.org/~nickanc/interwiki.log .
Nickanc
On 15 January 2012 00:32, Nickanc Wikipedia nickanc.wiki@gmail.com wrote:
In my opinion, the problem is not how many bots are working, but that interwiki.py seems to use excessive amounts of memory. For example, while my interwiki.py was running it was using 982 megabytes, and then it was killed. You may find a verbose log of its work at http://toolserver.org/~nickanc/interwiki.log .
No, that is not the problem. Multichill was referring to CPU usage, not memory usage. And although interwiki.py in general uses a large amount of memory, your specific case has a different origin: the use of the ReferringPageGenerator, which results in a memory *leak*.
You may be able to partially mitigate your problem by setting interwiki_contents_on_disk = True, but this will not fix the actual memory leak; it will only release the memory used by the page contents.
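For reference, that mitigation is a one-line setting in the bot's configuration file (a minimal sketch, assuming the compat-era user-config.py and the interwiki_contents_on_disk option named above):

    # user-config.py -- pywikipedia ("compat") configuration
    # Keep fetched page texts on disk instead of in RAM; this only
    # releases the memory used by page contents, not the leak itself.
    interwiki_contents_on_disk = True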
If you'd like to discuss the details of your problem, please mail the pywikipedia mailing list pywikipedia-l@lists.wikimedia.org or visit us on IRC.
Merlijn
There has been some similar discussion in the past on how to deal with interwiki bots:
http://lists.wikimedia.org/pipermail/toolserver-l/2010-November/003660.html
http://lists.wikimedia.org/pipermail/toolserver-l/2010-December/003698.html
http://lists.wikimedia.org/pipermail/toolserver-l/2011-January/003847.html
- Chris
Hello, At Sunday 15 January 2012 17:13:26 DaB. wrote:
Aren't there a bit too many interwiki bots?
Yes, there are, although the problem is not the CPU load but the memory usage. The best solution would of course be for the MediaWiki devs to finally get rid of interwiki links in the article text, but I have the feeling that will not happen soon. The second-best solution would be for the interwiki.py developers to fix their code, but I also have the feeling that will take some time.
So here is my plan to fix the problem on our (the TS) side:
1.) I create an MMP called interwiki-bot (or something like that).
2.) YOU (the TS users) choose (by election, by appointing, by playing "Trip to Jerusalem", I don't care) 5 of you who will become members of that MMP by 15 February. Only rule: 1 of the 5 has to be an active user of a non-Wikipedia project (like Wikisource or Wiktionary or so).
3.) The members of the MMP create a Wikimedia project account (like "ts-interwikibot" or something) and request global bot status by 1 April.
4.) After 2 April, no-one is allowed to run an interwiki bot except the MMP.
Any problems with my plan?
Sincerely, DaB.
As an interwiki bot runner myself, I find this plan a little too constrained. Many of us run interwiki bots with many different configurations, and with this MMP created, some of the configurations we use would no longer be available. I can't think of many examples, but from top -c I can tell that the other bot operators run their bots differently from mine. Personally, I'd rather we wait for the Pywikipedia devs to fix the script, install more memory for interwiki bots, or create another custom login server just for running interwiki bots. Your plan is generally okay; it's just that with only 5 people to run this project, chosen from many, many bot operators, it's quite hard to choose. It's best if people don't run multiple interwiki bots for one project (especially enwiktionary, which has an overload of interwiki bots).
Regards, Hydriz
Perhaps you should look at why people are running with different settings, then standardize. Interwiki bots should all be doing roughly the same job, shouldn't they? So what's with the different settings?
Sometimes these bot operators might just want to run the bot on a few pages, or with some special settings that I don't know of. But still, entirely blocking people from running interwiki bots is quite ridiculous.
Regards, Hydriz
I think that the most important difference is the "home wiki" used for the launch.
2012/1/16 Hydriz Wikipedia admin@wikisorg.tk:
Personally, I'd rather we wait for the Pywikipedia devs to fix the script,
This is not going to happen anytime soon. Considering the state of the code base (two hundred exceptions for three hundred wikis, long functions, and no automated testing, which makes it practically untestable) and the state of the InterLanguage extension ('will be installed soon'), no-one is really willing to invest a lot of time in tracking down memory usage and reducing it.
The only reasonable action we can take to reduce the memory consumption is to let the OS do its job in freeing memory: use one process to track pages that have to be corrected (using the database, if possible) and one process to do the actual fixing (interwiki.py). This should be reasonably easy to implement (i.e. use a pywikibot page generator to generate a list of pages, use a database layer to track interlanguage links, and popen('interwiki.py <page>') if the situation is fixable).
Best, Merlijn
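A rough sketch of the split Merlijn describes, for illustration only: the tracking database, its schema, and the pages_to_check() helper are hypothetical; only the idea of spawning interwiki.py once per page is taken from his mail.

    # coordinator.py: enumerate candidate pages in one process and spawn
    # a fresh interwiki.py per page, so the OS reclaims all memory as
    # soon as each child exits.
    import sqlite3
    import subprocess

    def pages_to_check():
        # Stand-in for a pywikibot page generator: read titles from a
        # local tracking database (hypothetical schema).
        db = sqlite3.connect('langlinks.db')
        try:
            for (title,) in db.execute('SELECT title FROM pending_pages'):
                yield title
        finally:
            db.close()

    for title in pages_to_check():
        # Any memory interwiki.py leaks while fixing this one page is
        # returned to the OS when the child process terminates.
        subprocess.call(['python', 'interwiki.py', '-page:%s' % title])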
Yes, so probably our issues here are the lack of coordination among bot owners and the memory usage. Shouldn't we write some simple script which can automatically free the memory held by old interwiki processes? DaB's idea was okay for me, except that one of the points was that no one else can run the interwiki script anymore, which is ridiculous to me. Maybe the MMP can be used to ensure that there are no overlapping bots? All interwiki bot owners should join this project, check for an available wiki that no one has taken up, and ask for clearance to run their own interwiki bot there.
Regards, Hydriz
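For what it's worth, a "simple script" along the lines Hydriz asks about could only work from the outside, e.g. a watchdog that kills interwiki.py processes whose resident memory grows too large. A hypothetical sketch (the script and its name are assumptions, not an existing Toolserver tool; the threshold mirrors the point at which Nickanc's 982 MB job died):

    # interwiki_watchdog.py: hypothetical watchdog that terminates
    # interwiki.py processes exceeding a resident-memory threshold.
    import os
    import signal
    import subprocess

    LIMIT_KB = 900 * 1024  # ~900 MB, roughly where Nickanc's job died

    # ps prints PID, resident set size (KB) and the full command line;
    # the trailing '=' signs suppress the header row.
    ps = subprocess.Popen(['ps', '-eo', 'pid=,rss=,args='],
                          stdout=subprocess.PIPE)
    for line in ps.stdout:
        pid, rss, args = line.split(None, 2)
        if 'interwiki.py' in args and int(rss) > LIMIT_KB:
            os.kill(int(pid), signal.SIGTERM)
    ps.wait()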
2012/1/16 Hydriz Wikipedia admin@wikisorg.tk:
Maybe the MMP can be used to ensure that there are no overlapping bots? All interwiki bot owners should join this project, check for an available wiki that no one has taken up, and ask for clearance to run their own interwiki bot there.
Even bots on different wikis will have a large overlap. Perhaps we should restrict non-'selected' interwiki bots to running with -back set (for autonomous runs on the main namespace of Wikipedia, because I think that is where most bots are active)?
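With the compat pywikipedia of the time, such a restricted run would presumably be started along these lines (the exact flag combination is an assumption; only -back is taken from the mail above):

    python interwiki.py -autonomous -back -namespace:0 -start:!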
Merlijn van Deen valhallasw@arctus.nl wrote:
The only reasonable action we can take to reduce the memory consumption is to let the OS do its job in freeing memory: [...]
We could also move the pressure: Labs' bot running infrastructure doesn't seem to be /that/ far from opening. If interwiki bots were running there, it would allow the Foundation to judge whether pushing for the deployment of InterLanguage isn't worth it in the end.
Meanwhile I think DaB.'s proposal is very adequate.
Tim
On Tue, Jan 17, 2012 at 3:02 AM, Tim Landscheidt tim@tim-landscheidt.de wrote:
We could also move the pressure: Labs' bot running infrastructure doesn't seem to be /that/ far from opening. If interwiki bots were running there, it would allow the Foundation to judge whether pushing for the deployment of InterLanguage isn't worth it in the end.
Labs isn't a fix-all solution for situations like these. Since the issue is that interwiki.py has memory management problems, among others, I would guess Ryan would be hesitant to have it running on the Labs platform unless those issues were resolved, even though Labs is designed to be more like "virtual containers" than a shared system like the Toolserver.
On 01/16/2012 01:39 PM, K. Peachey wrote:
Labs isn't a fix-all solution for situations like these. Since the issue is that interwiki.py has memory management problems, among others, I would guess Ryan would be hesitant to have it running on the Labs platform unless those issues were resolved, even though Labs is designed to be more like "virtual containers" than a shared system like the Toolserver.
Since "is this something labs could do?" has come up, please feel free to add features and functionality you'd like in Labs at
https://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_wanted
Hello, At Monday 16 January 2012 13:41:28 DaB. wrote:
As an interwiki bot runner myself, I find this plan a little too constrained.
It is a little bit drastic, yes, but it will work. There were some other ideas in the past (see Chris Grant's mail), but they didn't work in the end. The new plan has the following advantages:
- YOU (the TS users) make the rules and decide who should be in the MMP (I never said, BTW, that the people in the MMP should make the rules),
- the roots can contact the group quite easily instead of speaking to dozens of users,
- the Wikimedia project people (Wikipedians, Wikisourcers, etc.) have only 1 contact address too,
- the cases of "bot A removes a link and bot B puts it in again 5 minutes later" will be greatly reduced.
Many of us run interwiki bots with many different configurations, and with this MMP created, some of the configurations we use would no longer be available. I can't think of many examples, but from top -c I can tell that the other bot operators run their bots differently from mine.
Like Hercule said, that would be the home wiki most of the time, and it should be no problem for the MMP people to switch the home wiki now and then (e.g. if they run 5 instances of their bot and change the home wiki every hour, then every project is the home wiki every 3 days). The MMP should also only be for interwiki bots which run permanently; if a user runs a bot because a wiki needs 100 interwiki links changed on a one-time basis, that's no problem.
Personally, I'd rather we wait for the Pywikipedia devs to fix the script, install more memory for interwiki bots, or create another custom login server just for running interwiki bots.
Throwing more hardware at a problem doesn't fix the problem at all, and as Merlijn already wrote, I doubt that the pywikipedia devs will fix the problem soon (they have known about it for years, and don't seem to care that after some time a simple Python script needs more memory than a Java program *including* the virtual machine!).
Your plan is generally okay; it's just that with only 5 people to run this project, chosen from many, many bot operators, it's quite hard to choose. It's best if people don't run multiple interwiki bots for one project (especially enwiktionary, which has an overload of interwiki bots).
In theory, 1 user would be enough to run an interwiki bot (or several instances of it) for all wikis. I increased the number to 5 to make sure that there is always somebody available to control the bots.
Sincerely, DaB.
On 01/16/2012 09:03 AM, Hydriz Wikipedia wrote:
4.) After 2 April, no-one is allowed to run an interwiki bot except the MMP.
Sounds great to me. If I discover that interwiki links between two languages are not updated, I also need a place to report this to the MMP group, rather than starting my own interwiki.py job.
On Sun, Jan 15, 2012 at 5:38 PM, DaB. WP@daniel.baur4.info wrote:
So here is my plan to fix the problem on our (the TS) side:
+1 :)
Sounds great, and I agree with it.
I have only one issue: the Spanish projects (Wikipedia, Wikinews, wiki*) have not approved a global bot policy. Is it a problem for the plan if we don't have this flag status?
Dennis Tobar
On 15/01/2012 13:39, "DaB." WP@daniel.baur4.info wrote:
Hello, At Saturday 21 January 2012 15:23:42 DaB. wrote:
I have only one issue: the Spanish projects (Wikipedia, Wikinews, wiki*) have not approved a global bot policy. Is it a problem for the plan if we don't have this flag status?
The new bot (or bots) run by the MMP has, of course, to respect the local bot rules of each project. So if a project demands local approval first, the MMP has to request that approval before running a bot there.
Sincerely, DaB.
Hello all, in nearly 2 months, most of you will no longer be allowed to run langlink bots (aka interwiki bots) on the Toolserver (see my old mail earlier in this thread for details). To make it official, I added a new rule 9.4 to our rules page a short time ago and am also sending this mail to -announce. The new rule forbids running any langlink bot on the Toolserver, with the following 2 exceptions:
* The bot is run by the MMP interwiki-bot (or however it will be called in the end),
* The bot runs only for a limited time (!= continuous) or for testing.
Until now, no building of the needed MMP has taken place, AFAIK, so I would recommend speeding that up so there will be an MMP by 1 March.
Sincerely, DaB.
Hello, I think this new rule definitely makes sense. A clarification: I sometimes (once in a few months, usually) run interwiki.py on all Wikiquotes with various configurations (sometimes categories, sometimes -same, etc.) which are not taken care of by others. These tasks run for a few days; will they be okay? If not, it's not a problem for me to run them on my machine, as I usually do.
This brings me to another question: not all interwiki bots are the same (although most sadly are); users run different tasks, which can be very long and could be forced to run under the MMP, but all should be represented in it. Therefore I suggest a modification to the timeline, if you need to speed it up:
1) as soon as someone volunteers for the MMP, the MMP is created and the rule implemented (if after April 2);
2) more people can join if they run different long tasks which would not be allowed on regular accounts (perhaps you can have 2 for the standard Wikipedia continuous task), and this incorporates the rule about different projects;
3) each user can run his tasks with his own bot Wikimedia account so that it's clear who's responsible for what, as possibly required by Wikimedia projects' rules (this might be unneeded if only -continuous is forbidden for others?).
Inactivity removal and similar things can be sorted out later.
Nemo