[Mediawiki-l] PHP 5.2.x and ini_set( 'pcre.backtrack_limit', '2M' ); in LocalSettings.php
roger at rogerchrisman.com
Wed Nov 10 01:57:15 UTC 2010
Oops: in my better-formed regex, drop the "|" at the end, like so:
# Matches 2 external links with at most two words (" x.. x.. ") between them:
$wgSpamRegex = "/http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+/i";
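To check what this single pattern catches before putting it in LocalSettings.php, you could run it through preg_match() yourself. Here is a minimal sketch that assumes only stock PHP 5.2 or later. The pattern is wrapped in "/" delimiters with the "i" flag inside the string, as preg_match() requires. The sample strings are made up for illustration:

```php
<?php
// Sketch: try the single two-links pattern against a few invented inputs.
$wgSpamRegex = "/http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+/i";

$samples = array(
    'close'  => 'see http://a.example/x and http://b.example/y',             // 1 word between links
    'far'    => 'see http://a.example/x one two three http://b.example/y',   // 3 words between links
    'single' => 'only http://a.example/x here',                              // just one link
);

foreach ($samples as $name => $text) {
    // preg_match() returns 1 on a match, 0 on no match, false on a PCRE error.
    $hit = preg_match($wgSpamRegex, $text);
    echo $name . ': ' . ($hit === 1 ? 'blocked' : 'allowed') . "\n";
}
```

This should print "close: blocked", "far: allowed", "single: allowed". The first (\S+\s+) repetition uses up the tail of the first URL, so at most two words can sit between the links.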
Keep the "|" only if the pattern is one part of a larger regex of "|"-separated
alternatives, which mine actually is: several lines concatenated like so (with
much of it #commented out):
$wgSpamRegex = "/". # "/" is the opening wrapper
#"s-e-x|zoofilia|sexyongpin|grusskarte|geburtstagskarten|animalsex|".
#"sex-with|dogsex|adultchat|adultlive|camsex|sexcam|livesex|sexchat|".
#"chatsex|onlinesex|adultporn|adultvideo|adultweb.|hardcoresex|hardcoreporn|".
#"teenporn|xxxporn|lesbiansex|livegirl|livenude|livesex|livevideo|camgirl|".
#"spycam|voyeursex|casino-online|online-casino|kontaktlinsen|cheapest-phone|".
#"laser-eye|eye-laser|fuelcellmarket|lasikclinic|cragrats|parishilton|".
#"paris-hilton|paris-tape|2large|fuel-dispenser|fueling-dispenser|huojia|".
#"jinxinghj|telematicsone|telematiksone|a-mortgage|diamondabrasives|".
#"reuterbrook|sex-plugin|sex-zone|lazy-stars|eblja|liuhecai|".
#"buy-viagra|-cialis|-levitra|boy-and-girl-kissing|". # Match spammy words
#"dirare\.com|". # Matches dirare.com, a spammer's domain name
"overflow\s*:\s*auto|". # Matches overflow:auto regardless of whitespace
"height\s*:\s*[0-4]px|". # Matches height:0px to height:4px (most CSS-hidden spam)
"\<\s*a\s*href|". # Matches '<a href' links, forcing wiki syntax
#"(http:\/\/(.|\n)*){14}|". # Matches 14 or more external links (change 14 to taste)
#"http:\/\/\S*\s*\S*\s*\S*\s*\S*\s*\S*\s*\S*http:\/\/\S*|". # Roger -- bad
#"http:\/\/\S*\s*\S*\s*\S*\s*\S*http:\/\/\S*|". # Roger -- NB pcre.backtrack_limit
"http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+|". # Roger -- better!
"display\s*:\s*none". # Matches display:none regardless of whitespace
"/i"; # "/" ends the regular expression; "i" makes it case-insensitive
# "\s" matches whitespace
# "*" is a repeater (zero or more times)
# "\s*" matches zero or more whitespace characters
# "\S*" matches zero or more non-whitespace characters
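The concatenated pattern above could also be assembled from an array. Here is a minimal sketch, assuming PHP 5.2 or later (where preg_last_error() and PREG_BACKTRACK_LIMIT_ERROR were added). The $parts array holds only the active (uncommented) alternatives. The checkSpam() helper is hypothetical and is not a MediaWiki function. The error check matters because preg_match() returns false, not 0, when pcre.backtrack_limit is exceeded, and a plain "if (preg_match(...))" test would treat that as "no spam found":

```php
<?php
// Sketch: build a "|"-joined spam regex from an array and test it safely.
// $parts, checkSpam() and the sample texts are illustrative assumptions.

// Optionally raise the limit, as in this thread's subject (plain integer used here).
ini_set('pcre.backtrack_limit', '2000000');

$parts = array(
    'overflow\s*:\s*auto',
    'height\s*:\s*[0-4]px',
    '\<\s*a\s*href',
    'http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+',
    'display\s*:\s*none',
);
// implode() puts "|" only between parts, so there is no stray trailing "|".
$wgSpamRegex = '/' . implode('|', $parts) . '/i';

function checkSpam($regex, $text) {
    $result = preg_match($regex, $text);
    if ($result === false) {
        // A PCRE failure is not the same as "no match".
        if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
            return 'error: pcre.backtrack_limit exceeded';
        }
        return 'error: PCRE error ' . preg_last_error();
    }
    return $result === 1 ? 'blocked' : 'allowed';
}

echo checkSpam($wgSpamRegex, '<div style="display: none">hidden</div>'), "\n";
echo checkSpam($wgSpamRegex, 'An ordinary edit with one link http://example.org/'), "\n";
```

This should print "blocked" and then "allowed". A single-quoted PHP string is used for each part so that sequences like "\s" and "\/" reach PCRE unchanged.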
Largely copied from
http://www.mediawiki.org/wiki/Manual:$wgSpamRegex#A_Large_Example
Roger