https://bugzilla.wikimedia.org/show_bug.cgi?id=13706
Perhaps a community discussion is necessary on this matter; I hereby initiate it.
When a person tries to edit a page that contains a URL matching the spam autoblocker regex, they are prevented from saving the edit until the spam link is removed. The spam autoblocker was intended to prevent the addition of new spam.
Consider a scenario where a spambot adds spam links to Wikipedia, the URL is later added to the spam blacklist, and a user then tries to edit a page that still contains spam added before the URL was blacklisted. For a human this is not much of a problem to deal with; it is, however, a different story when it comes to bots.
Consider that you are operating a bot that makes non-controversial, routine maintenance edits on a regular basis. The spam autoblocker would prevent such edits. This is particularly problematic if your bot's task is dealing with images renamed or deleted on Commons, or with interwiki links. Interwiki bots and Commons delinking bots often edit hundreds of pages a day on hundreds of wikis. That's a lot of logs. So the suggestion that I should spend perhaps hours per day reading log files for spam on pages in languages I cannot even understand (or necessarily even read the ?'s and %'s) is quite unreasonable. This is a task better dealt with by the locals (humans) of each wiki community rather than by bots performing mindless, routine and non-controversial tasks.
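To make the problem concrete, here is a rough Python sketch of what the check effectively does (this is not the actual MediaWiki/SpamBlacklist code; the blacklist pattern, URLs and page text are invented for illustration):

import re

# Hypothetical blacklist entry; the real blacklist on Meta is a long list of regexes.
BLACKLIST = re.compile(r"https?://[^\s]*spam-example\.com", re.IGNORECASE)

def edit_allowed(new_page_text):
    # The filter checks the entire text being saved, so a page that already
    # contains a now-blacklisted URL rejects every later edit, even one that
    # only fixes an interwiki link or a renamed Commons image.
    return BLACKLIST.search(new_page_text) is None

old_text = "See http://spam-example.com/offer (added before the URL was blacklisted)."
new_text = old_text + "\n[[de:Beispiel]]"   # a routine interwiki edit by a bot

print(edit_allowed(new_text))   # False: the bot's harmless edit is blocked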
There is also the matter of legitimate reasons to include spam links on pages, such as an archived discussion of a spambot attack where example URLs were cited before they made their way into the spam autoblocker.
- White Cat
Forum shopping for this after the lead developer and CTO has said no is not the way to go about it.
From a technical standpoint: I agree with Brion. There are a whole host of reasons why an edit might fail (locked db's, protected pages, or even the server dying), and the bot needs to be designed to deal with that. If your bot crashes, etc. due to an edit failing: well that's your fault as a developer.
-Chad
I beg your pardon? Forum shopping on foundation-l? That seems self-contradictory...
Brion already said that this wouldn't be implemented and discussion was over. You now bring it up on Foundation-l. This is known as forum shopping.
Also known as "asking the other parent."
-Chad
On 4/28/08, Chad innocentkiller@gmail.com wrote:
From a technical standpoint: I agree with Brion. There are a whole host of reasons why an edit might fail (locked db's, protected pages, or even the server dying), and the bot needs to be designed to deal with that. If your bot crashes, etc. due to an edit failing: well that's your fault as a developer.
It would be nice if flagged bots were exempt from the spamfilter. Spam URLs and protected pages are the situations that my bots can't handle -- for everything else, the bot can either wait or try again.
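As a rough sketch of what such an exemption could look like (this is only an illustration, not MediaWiki's actual permission handling; the group name and blacklist pattern are made up):

import re

BLACKLIST = re.compile(r"https?://[^\s]*spam-example\.com", re.IGNORECASE)  # invented pattern

def edit_allowed(new_page_text, editor_groups):
    # The proposed exemption: an account carrying the bot flag (modelled here
    # as membership in a "bot" group) skips the blacklist check entirely;
    # every other editor is still filtered as before.
    if "bot" in editor_groups:
        return True
    return BLACKLIST.search(new_page_text) is None

print(edit_allowed("old spam http://spam-example.com/x plus an interwiki fix", {"bot"}))   # True
print(edit_allowed("new spam http://spam-example.com/y", {"user"}))                        # False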
On Wed, Apr 30, 2008 at 7:37 PM, Mark Wagner carnildo@gmail.com wrote:
It would be nice if flagged bots were exempt from the spamfilter. Spam URLs and protected pages are the situations that my bots can't handle -- for everything else, the bot can either wait or try again.
This is something I can agree with: if a user is trusted enough to receive the bot flag in the first place (or "trusted not to make spam/vandalism/controversial mass edits"), we shouldn't have to worry about spam-filtering them.
--Andrew Whitworth
I am told that the devs aren't keen on making an exception, although they (at least Tim Starling) agree that the current method is rather messed up. They were talking about a more permanent solution instead.
One suggestion was to make the spam autoblocker block the edit only if a new spam link is being introduced, so that spam already on the page is not affected. This comes at the expense of performance, though.
Then there is the matter of the spam autoblocker page on Meta, which has started getting very large. Soon it will not be possible to load the page.
- White Cat
On Sun, May 4, 2008 at 2:19 PM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
One suggestion was to make the spam autoblocker block the edit only if a new spam link is being introduced, so that spam already on the page is not affected. This comes at the expense of performance, though.
No, it doesn't; it would just require an extra regex search (on the old text) and some trivial array processing. Not a substantial difference. If you're talking about getting a feature added, anyway, I think you're on the wrong list.
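In rough terms, the suggested check could look something like the following Python sketch (not MediaWiki's actual code; the blacklist pattern and page texts are invented): the blacklist regex is run over both the old and the new revision, and the edit is rejected only if it introduces a match that was not already there.

import re

BLACKLIST = re.compile(r"https?://[^\s]*spam-example\.com", re.IGNORECASE)  # invented pattern

def edit_allowed(old_text, new_text):
    # Search both revisions and compare the sets of matches: pre-existing
    # spam no longer blocks unrelated edits, only newly added links do.
    old_matches = set(BLACKLIST.findall(old_text))
    new_matches = set(BLACKLIST.findall(new_text))
    return new_matches.issubset(old_matches)

page = "Archived report citing http://spam-example.com/attack"
print(edit_allowed(page, page + "\n[[fr:Exemple]]"))                      # True: nothing new added
print(edit_allowed(page, page + "\nhttp://spam-example.com/new-offer"))   # False: new spam link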
Where am I supposed to propose it? Bugzilla is obviously the wrong address.
- White Cat
On Tue, May 6, 2008 at 2:20 AM, Simetrical Simetrical+wikilist@gmail.com wrote:
No, it doesn't; it would just require an extra regex search (on the old text) and some trivial array processing. Not a substantial difference. If you're talking about getting a feature added, anyway, I think you're on the wrong list.
On Thu, May 8, 2008 at 11:01 AM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
Where am I supposed to propose it? Bugzilla is obviously the wrong address.
It's the correct one, as you should know. That some features you've proposed get rejected doesn't mean related features will also get rejected.