On 18/07/06, Martin Jambon martin_jambon@emailuser.net wrote:
On Tue, 18 Jul 2006, Rotem Liss wrote:
Andy Roberts wrote:
Here's another pattern though -
I suffer roughly weekly from a bot that doesn't add any links; it just edits several existing pages and adds a line consisting of about a dozen random digits, e.g.:
300142760257
But it's different every time. That kind of behaviour, combined with changing IP numbers and delays of a few minutes between edits, seems theoretically impossible to defend against, as well as pointless.
I'm loath to force login to edit, because the number of genuine contributions does drop a little when I resort to that.
If you want to block big numbers without commas, you can use \d{7,}. This should block every number above 999,999 that contains no commas, although I haven't checked it. You can tweak the minimum number of digits to block by editing the number (\d{6,} will block all numbers above 99,999, \d{8,} will block all numbers above 9,999,999, etc.).
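To check what \d{7,} catches, here is a minimal sketch using Python's re module. The thread doesn't say which regex engine the wiki's spam filter uses, so treat this as an approximation; the function name contains_long_number and its min_digits parameter are hypothetical, made up for illustration.

```python
import re

def contains_long_number(text: str, min_digits: int = 7) -> bool:
    """Return True if text contains an unbroken run of at least min_digits digits.

    Hypothetical helper. With the default of 7 it builds the pattern \\d{7,}.
    A comma ends a run, so "1,000,000" is not caught.
    """
    pattern = re.compile(r"\d{%d,}" % min_digits)
    return pattern.search(text) is not None

samples = [
    "300142760257",                     # bot-style line: 12 digits
    "1000000",                          # 7 digits, no commas
    "999999",                           # only 6 digits
    "1,000,000",                        # commas split it into short runs
    "http://example.com/?id=12345678",  # digits inside a URL match too
]
for s in samples:
    print(f"{s!r}: {contains_long_number(s)}")
```

The last sample shows the drawback of the bare pattern. It also fires on long digit runs inside URLs, which may be legitimate links.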
I came up with this pattern:
(?<!\S)\d{12,}(?!\S)
which means "at least 12 digits which are not preceded or followed by non-whitespace characters". The idea is to allow URLs which contain long sequences of digits.
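Martin's pattern can be tried the same way. This is a sketch in Python's re module, which supports the fixed-width lookbehind (?<!\S). Not every regex dialect supports lookbehind, and the thread doesn't say which dialect the wiki uses. The names STANDALONE_NUMBER and is_digit_spam_line are hypothetical.

```python
import re

# Martin's pattern: 12 or more digits with no non-whitespace character
# directly before or after the run. The start and end of the text also
# count as "not non-whitespace".
STANDALONE_NUMBER = re.compile(r"(?<!\S)\d{12,}(?!\S)")

def is_digit_spam_line(text: str) -> bool:
    """Hypothetical helper: True if text contains a standalone 12+ digit number."""
    return STANDALONE_NUMBER.search(text) is not None

samples = [
    "300142760257",                              # bare number on its own
    "Some text\n300142760257\nmore text",        # number on its own line
    "http://example.com/archive/300142760257",   # preceded by "/", allowed
    "30014276025",                               # only 11 digits
    "300142760257.",                             # trailing "." is non-whitespace
]
for s in samples:
    print(f"{s!r}: {is_digit_spam_line(s)}")
```

The lookarounds reject any run that touches another non-space character, so digits embedded in a URL pass. The same rule also lets through a spam number followed directly by punctuation, as the last sample shows.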
Thanks Martin.
I like the way you were able to translate your RegEx and rationale into logical English.