monitoring / control system for bots


Petr Bena

27 Dec 2012

5:13 p.m.

Hi,

Someone once suggested we create a control panel for bots. I think the first step would be to create a page where we could see an overview of all bots we are running on projects. If we create a protocol for querying bot status, we could set up a central monitoring server that would work in one of two ways:

* The server actively queries each bot for its status (on some address and IP)
* Each bot contacts the server and delivers the information to it

I would support the second as it's easier to manage: in the first case we would need to configure the "master server" with a list of bots to query.

The system could simply be a daemon written in any language plus a PHP script. Bots would contact the server through the PHP script (they would just pass information on whether they are running or having trouble, using some POST data). The daemon would periodically flag all bots that hadn't reported for a certain period as having trouble / needing repair.
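A minimal sketch of the server-side bookkeeping described above, written in Python rather than PHP purely for illustration. The class name `BotRegistry`, the status strings, and the 600-second timeout are all assumptions, not part of the proposal; a real deployment would sit behind an HTTP endpoint and persist its state.

```python
import time
from dataclasses import dataclass


@dataclass
class Heartbeat:
    status: str         # what the bot reported, e.g. "running" or "trouble"
    received_at: float  # server-side time of the report, in seconds


class BotRegistry:
    """In-memory stand-in for the central monitoring server (hypothetical)."""

    def __init__(self, timeout_seconds=600.0):
        self.timeout_seconds = timeout_seconds
        self._bots = {}

    def report(self, bot_name, status, now=None):
        """Record a check-in; this is what the POST handler would call."""
        when = time.time() if now is None else now
        self._bots[bot_name] = Heartbeat(status, when)

    def overview(self, now=None):
        """Return {bot_name: state}, flagging bots that have gone silent."""
        current = time.time() if now is None else now
        result = {}
        for name, beat in self._bots.items():
            if current - beat.received_at > self.timeout_seconds:
                result[name] = "needs repair"
            else:
                result[name] = beat.status
        return result
```

The daemon's periodic job would then amount to calling `overview()` and publishing the result; the `now` parameter exists only so the timeout logic can be exercised without waiting.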

Thanks to this we would have an overview of all active bots on all projects and their status. What do you think? Is anyone interested in working on that?

Bináris

27 Dec

5:36 p.m.

Hi Peter,

what is the final purpose of this suggestion? What kind of problem do we have that needs to be solved this way? I think the first step to do anything with bots should be to store the bot activity in tables other than recent changes so as to be able to mark them in page histories. This has been requested for years now.

-- Bináris

Petr Bena

5:42 p.m.

It would be a first step to solve this: https://bugzilla.wikimedia.org/show_bug.cgi?id=34606

In addition, it would make it easier for bot operators to keep track of the status of their services, as well as for the community to find out why a certain service is no longer available. For example, if an archiving bot crashes, people would see why archiving doesn't work.

On Thu, Dec 27, 2012 at 11:36 AM, Bináris wikiposta@gmail.com wrote:

...

Hi Peter,

what is the final purpose of this suggestion? What kind of problem do we have that needs to be solved this way? I think the first step to do anything with bots should be to store the bot activity in tables other than recent changes so as to be able to mark them in page histories. This has been requested for years now.

-- Bináris

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Petr Bena

5:45 p.m.

In addition, we would have a reliable list of bots running on each wiki, far better than what we have now on, for example, the English wiki. If a bot were down for a longer time, people would easily find that out and some developer could take over its task.

Bináris

5:49 p.m.

I see. Sorry for having misspelled your name.

2012/12/27 Petr Bena benapetr@gmail.com

...

In addition, we would have a reliable list of bots running on each wiki, far better than what we have for example on english wiki now.

This would require each wiki's bot policy to make use of this facility compulsory, would it not?

-- Bináris

Petr Bena

5:57 p.m.

To begin with, it definitely doesn't need to be compulsory. Whether the communities will want this function to be reliable in future, if it becomes a standard, is up to them. I don't really like the idea of forcing anyone to do anything, but it should at least be recommended to each bot developer, provided it works and people like it.

Petr Bena

5:58 p.m.

It would be more or less the same as Nagios, just for bots rather than servers.

Tim Landscheidt

28 Dec

12:22 a.m.

Petr Bena benapetr@gmail.com wrote:

...

It would be kind of same as nagios, just for bots, not servers

...

[...]

Could this then be realized just as Nagios plugins so that we do not build a separate infrastructure for it?
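As a rough illustration of what such a plugin could look like: Nagios plugins report through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message on standard output. The sketch below maps the age of a bot's last check-in to that convention. The function names, the `BOT` prefix, and both thresholds are assumptions made for the example.

```python
# Exit codes defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_bot(seconds_since_checkin, warn_after=600, crit_after=1800):
    """Map the age of a bot's last check-in to (exit_code, status_line)."""
    if seconds_since_checkin is None:
        code, detail = UNKNOWN, "no check-in recorded"
    elif seconds_since_checkin >= crit_after:
        code, detail = CRITICAL, f"silent for {seconds_since_checkin}s"
    elif seconds_since_checkin >= warn_after:
        code, detail = WARNING, f"silent for {seconds_since_checkin}s"
    else:
        code, detail = OK, f"last check-in {seconds_since_checkin}s ago"
    return code, f"BOT {LABELS[code]} - {detail}"


def main(argv):
    # Hypothetical invocation: check_bot.py <seconds since last check-in>
    age = int(argv[1]) if len(argv) > 1 else None
    code, line = check_bot(age)
    print(line)
    return code
```

A real plugin script would finish with `sys.exit(main(sys.argv))` so that Nagios sees the exit code; how the age of the last check-in is obtained is left open here.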

Tim

Lars Aronsson

2 Jan

8:38 a.m.

On 12/27/2012 11:13 AM, Petr Bena wrote:

...

Someone once suggested we create a control panel for bots. I think the first step would be to create a page where we could see overview of all bots we are running on projects.

This assumes that "we" are "running bots" on projects. That might be correct for some bots, but not for all. Many users have a bot account that they use for various purposes at various times. Trying to build them into a control panel is just as unlikely to succeed as trying to schedule regular users. Which articles do you plan to edit next Tuesday, and how?

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Matthew Flaschen

9:29 a.m.

On 01/01/2013 08:38 PM, Lars Aronsson wrote:

...

On 12/27/2012 11:13 AM, Petr Bena wrote:

...
Someone once suggested we create a control panel for bots. I think the first step would be to create a page where we could see overview of all bots we are running on projects.

This assumes that "we" are "running bots" on projects. That might be correct for some bots, but not for all. Many users have a bot account that they use for various purposes at various times. Trying to build them into a control panel is just as unlikely to succeed as trying to schedule regular users. Which articles do you plan to edit next Tuesday, and how?

He may have misspoken on the "we" part. However, for wikis with bot approval processes (e.g. https://en.wikipedia.org/wiki/Wikipedia:Bot_Approvals_Group ), there is tracking of what bots work on (due to the potentially disruptive nature of an active bot on a large wiki).

A bot approval group could certainly encourage people to participate in this dashboard. For the bot writer, all it should take is an HTTP POST to the dashboard every few edits to check in (which could be as simple as "350 edits for task XYZ in the last hour", in an appropriate format).
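A minimal sketch of such a check-in from the bot's side, using only the Python standard library. The endpoint URL and the field names (`bot`, `task`, `edits`, `period_minutes`) are hypothetical, since no wire format has been agreed on in this thread.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; the thread does not name one.
DASHBOARD_URL = "https://dashboard.example.org/checkin"


def build_checkin(bot, task, edits, period_minutes=60, url=DASHBOARD_URL):
    """Prepare (but do not send) the POST request for one check-in."""
    fields = {
        "bot": bot,
        "task": task,
        "edits": str(edits),
        "period_minutes": str(period_minutes),
    }
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_checkin(request, timeout=10):
    """Send the request; a monitoring failure should never stop the bot."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers URLError, HTTPError, and socket timeouts.
        return False
```

A bot would call `send_checkin(build_checkin("ExampleBot", "XYZ", 350))` every few edits and ignore a `False` result, so that an unreachable dashboard costs at most one timeout.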

Many bots work on more than one task, so knowing which task(s) they are currently working on in a big-picture view could be quite useful.

Matt Flaschen

Bináris

3:03 p.m.

2013/1/2 Matthew Flaschen mflaschen@wikimedia.org

...

He may have misspoke on the "we" part. However, for wikis with bot approval processes (e.g. https://en.wikipedia.org/wiki/Wikipedia:Bot_Approvals_Group ), there is tracking on what bots work on (due to the potentially disruptive nature of an active bot on a large wiki).

Before generalizing, it would be very useful to survey various wikis. According to the interwikis, very few wikis have an explicit bot policy like enwiki and even fewer have a BAG. The definition of a bot may vary, too: in the bot policy of enwiki only automatic processes are called bots, while in other projects the so-called "assisted edits" may also qualify as botwork. Enwiki != Wikipedia.

...

A bot approval group could certainly encourage people to participate in this dashboard. For the bot writer, all it should take is a HTTP POST to the dashboard every few edits to check in (which could be a simple as "350 edits for task XYZ in the last hour", in appropriate format).

This "all it should take" is not so trivial for everybody, and may require rewriting plenty of running bots.

...

-- Bináris

Matthew Flaschen

3 Jan

12:07 a.m.

On 01/02/2013 03:03 AM, Bináris wrote:

...

Before generalizing it will be very useful to overview various wikis. According to the interwikis, very few wikis have an explicit bot policy like enwiki and even less have BAG. Definition of a bot may vary, too: in the bot policy of enwiki only automatic processes are called bots, while in other projects the so-called "assisted edits" may also qualify as botwork. Enwiki!=Wikipedia.

I explicitly said "for wikis with bot approval processes", and "a large wiki". Nowhere did I say all wikis or all Wikipedia wikis met either of those criteria.

I am aware of AWB (assisted edits) and the like. I don't think they should fall under this idea.

...

...
A bot approval group could certainly encourage people to participate in this dashboard. For the bot writer, all it should take is a HTTP POST to the dashboard every few edits to check in (which could be a simple as "350 edits for task XYZ in the last hour", in appropriate format).

This "all it should take" is not so trivial for everybody, and may require rewriting a plenty of running bots.

It's not trivial, but it's about the same as a single API call (many bots at least use the API for edits, others do multiple API calls), and probably easier than screen scraping.

Matt Flaschen

Lars Aronsson

2 Jan

8:25 p.m.

On 01/02/2013 03:29 AM, Matthew Flaschen wrote:

...

He may have misspoke on the "we" part. However, for wikis with bot approval processes (e.g. https://en.wikipedia.org/wiki/Wikipedia:Bot_Approvals_Group ), there is tracking on what bots work on (due to the potentially disruptive nature of an active bot on a large wiki).

When you apply for bot status, there is typically some requirement to present an idea for the bot, but once the status is granted, that idea can change without having the bot status removed.

LA2-bot has been used by me since 2007 and has 100 edits or more on 26 different projects, covering everything from ISBN number fixes on Russian Wikipedia, to flag icon templates on Danish Wikipedia, to verb forms on English Wiktionary. The only time my bot status was revoked was because of inactivity on the Polish Wikipedia.

http://toolserver.org/~quentinv57/tools/sulinfo.php?username=LA2-bot

I see Pywikipediabot and replace.py as just an alternative browser software for some edits. The very widespread idea that a "bot" is something magic with science-fiction powers, and the messages that this software leaves on Recent Changes, make users insist that I apply for bot status when using it (except on Commons, where it has 5,000 edits without bot status), and so I think up some good idea and apply for bot status, which is almost always granted.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Petr Bena

9:53 p.m.

This is not about bot status (+bot) but about the bot's system status (UP / DOWN, etc.).

Matthew Flaschen

3 Jan

12:11 a.m.

On 01/02/2013 08:25 AM, Lars Aronsson wrote:

...

On 01/02/2013 03:29 AM, Matthew Flaschen wrote:

...
He may have misspoke on the "we" part. However, for wikis with bot approval processes (e.g. https://en.wikipedia.org/wiki/Wikipedia:Bot_Approvals_Group ), there is tracking on what bots work on (due to the potentially disruptive nature of an active bot on a large wiki).

When you apply for bot status, there is typically some requirement to present an idea for the bot, but once the status is granted, that idea can change without having the bot status removed.

Every wiki has a different approach to bots. But for English Wikipedia, that is not how the approval process (https://en.wikipedia.org/wiki/Wikipedia:BOTAPPROVAL) works:

"Small changes, for example to fix problems or improve the operation of a particular task, are unlikely to be an issue, but larger changes should not be implemented without some discussion. Completely new tasks usually require a separate approval request. Bot operators may wish to create a separate bot account for each task."

Matt Flaschen

Federico Leva (Nemo)

1:24 a.m.

Matt, let's be clearer then: what you describe is ok ONLY for en.wiki. ALL the other wikis have a different system. Thanks, Nemo

Matthew Flaschen

1:39 a.m.

On 01/02/2013 01:24 PM, Federico Leva (Nemo) wrote:

...

Matt, let's be clearer then: what you describe is ok ONLY for en.wiki. ALL the other wikis have a different system.

I was replying to Lars, who made the across-the-board statement "once the status is granted, that idea can change without having the bot status removed".

My point was simply that English Wikipedia does not allow bots to do totally new tasks without approval.

Matt Flaschen

Lars Aronsson

4 Jan

11:42 a.m.

On 01/02/2013 06:11 PM, Matthew Flaschen wrote:

...

Every wiki has a different approach to bots. But for English Wikipedia, that is not how the approval process (https://en.wikipedia.org/wiki/Wikipedia:BOTAPPROVAL) works:

"Small changes, for example to fix problems or improve the operation of a particular task, are unlikely to be an issue, but larger changes should not be implemented without some discussion. Completely new tasks usually require a separate approval request. Bot operators may wish to create a separate bot account for each task."

That is what the rules say, but do you have any science to back up that this is also how it works in practice? How many bot accounts are revoked each month because their owners were naughty and used their bots in a different manner from what they applied for? The idea with a bot account, after all, is that nobody bothers to watch your edits in the Recent Changes.

I think you can go forward if you accept that there are some bots that run like machinery, according to the rules, and other bot accounts that are used like a more advanced browser by a creative and spontaneous user.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

[[w:en:User:Madman]]

12:40 p.m.

On Thu, Jan 3, 2013 at 11:42 PM, Lars Aronsson lars@aronsson.se wrote:

...

That is what the rules say, but do you have any science to back up that this is also how it works in practice? How many bot accounts are revoked each month because their owners were naughty and used their bots in a different manner from what they applied for? The idea with a bot account, after all, is that nobody bothers to watch your edits in the Recent Changes.

That *is* how it works in practice. Bots get blocked for running unapproved tasks. Most contributors may not watch bots' edits in the Recent Changes, but they do notice when their articles are edited. Approved tasks aren't typically revoked, as that usually would be punitive and unnecessary, but it does happen; for an example of all approved tasks for a bot being revoked due to inappropriate and unapproved tasks, please see http://en.wikipedia.org/wiki/Wikipedia_talk:Bot_Approvals_Group/Archive_8#Ku....

...

I think you can go forward if you accept that there are some bots that run like a machinery, according to the rules, and other bot accounts that are used like a more advanced browser for a creative and spontaneous user.

Bots are *not* advanced browsers and they're not treated as such by enwiki's bot policy. That's what AWB (hence the name) and gadgets are for. The BAG has granted some broad approvals in the past, but I think you'll find that's pretty rare these days.

-madman

Matma Rex

8:21 p.m.

On Fri, 04 Jan 2013 05:42:45 +0100, Lars Aronsson lars@aronsson.se wrote:

...

On 01/02/2013 06:11 PM, Matthew Flaschen wrote:

...
Every wiki has a different approach to bots. But for English Wikipedia, that is not how the approval process (https://en.wikipedia.org/wiki/Wikipedia:BOTAPPROVAL) works:

"Small changes, for example to fix problems or improve the operation of a particular task, are unlikely to be an issue, but larger changes should not be implemented without some discussion. Completely new tasks usually require a separate approval request. Bot operators may wish to create a separate bot account for each task."

That is what the rules say, but do you have any science to back up that this is also how it works in practice? How many bot accounts are revoked each month because their owners were naughty and used their bots in a different manner from what they applied for? The idea with a bot account, after all, is that nobody bothers to watch your edits in the Recent Changes.

I think you can go forward if you accept that there are some bots that run like a machinery, according to the rules, and other bot accounts that are used like a more advanced browser for a creative and spontaneous user.

You are both assuming that there are no other wikis except for the English Wikipedia.

For example, on pl.wiki, there are basically only two kinds of bots: interwiki-only and multipurpose. As long as you're not breaking anything using the bot and not making any controversial changes, if you've gotten the flag, you can do any task you deem necessary. A bot control in this case simply wouldn't work.

Not to mention that I think *most* of the bots on pl.wiki are run from users' home computers, most often on AWB or a local pywikipedia install, but there are at least three people who use their own libraries, including myself.

And if this is an en.wiki-only matter, this isn't really the right list to discuss it.

Matthew Flaschen

5 Jan

4:03 p.m.

On 01/04/2013 08:21 AM, Matma Rex wrote:

...

You are both assuming that there are no other wikis except for the English Wikipedia.

I'm not assuming that. I explicitly said "Every wiki has a different approach to bots.". I meant it, and I welcome people providing information about other wikis.

...

For example, on pl.wiki, there are basically only two kinds of bots: interwiki-only and multipurpose. As long as you're not breaking anything using the bot and not doing anycontroversial changes, if you've gotten the flag, you can do any task you deem necessary. A bot control in this case simply wouldn't work.

Bots could still tell the dashboard what they're working on, even if they don't need permission to add a new task.

...

And if this is an en.wiki-only matter, this isn't really the right list to discuss it.

It's not. It's an idea that could work for multiple wikis, but we are currently just brainstorming.

Matt Flaschen

Matma Rex

8:52 p.m.

On Sat, 05 Jan 2013 10:03:13 +0100, Matthew Flaschen mflaschen@wikimedia.org wrote:

...

...
For example, on pl.wiki, there are basically only two kinds of bots: interwiki-only and multipurpose. As long as you're not breaking anything using the bot and not doing anycontroversial changes, if you've gotten the flag, you can do any task you deem necessary. A bot control in this case simply wouldn't work.

Bots could still tell the dashboard what they're working on, even if they don't need permission to add a new task.

In this case, when you're saying "bots", you actually mean "users", as for one-time runs it would end up being the user's job. This simply seems impractical.

And if we try to compromise by making the bots automatically report edit summaries somewhere, then what's the improvement over simply looking at recent changes? You could make a summary of the last edits by bots using two lines of code and one API call; no need for a "control system".
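For reference, the one API call mentioned above could be the MediaWiki `list=recentchanges` query restricted to bot-flagged changes with `rcshow=bot`. The sketch below only builds the URL and summarizes an already decoded JSON response; the helper names and the choice of `rcprop` fields are assumptions for the example, and fetching the URL is left out.

```python
from collections import Counter
from urllib.parse import urlencode


def bot_changes_url(api_url, limit=50):
    """Build a MediaWiki API query for the latest bot-flagged changes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcshow": "bot",
        "rcprop": "user|title|timestamp|comment",
        "rclimit": str(limit),
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def edits_per_bot(response):
    """Count changes per account in a decoded API response."""
    changes = response.get("query", {}).get("recentchanges", [])
    return Counter(change["user"] for change in changes)
```

This covers the "what has each bot edited lately" view only; it says nothing about whether a bot process is currently up, which is the other half of the discussion.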

Daniel Schwen

9:48 p.m.

What we rather need is monitoring for the instances. My bots have not been the problem; so far the source of unreliable bot operation has been the underlying infrastructure, be it the moving of home dirs and the read-only fs or overloaded instances.

Petr Bena

9:54 p.m.

We already have that: http://nagios.wmflabs.org/nagios3/

Ryan Lane

7 Jan

5:31 a.m.

On Sat, Jan 5, 2013 at 6:48 AM, Daniel Schwen lists@schwen.de wrote:

...

What we rather need is monitoring for the instances. My bots have not been the problem, so far the source for unreliable bot operation has been the underlying infrastructure. Be it the moving of home dirs and the read-only fs or overloaded instances.

Are you subscribed to the labs-l list? We sent out a warning about the home directory change and about what you'd need to do once we made it. We don't like to reboot people's instances for them, so we left instances running with a read-only home directory until they were ready to reboot.

If there are overloaded instances, create more and have fewer bots running on a single instance. If your bot uses a lot of resources, then it can have its own instance.

- Ryan

wikitech-l@lists.wikimedia.org

24 comments

participants (10)

[[w:en:User:Madman]]
Bináris
Daniel Schwen
Federico Leva (Nemo)
Lars Aronsson
Matma Rex
Matthew Flaschen
Petr Bena
Ryan Lane
Tim Landscheidt