[Mediawiki-l] PHP 5.2.x and ini_set( 'pcre.backtrack_limit', '2M' ); in LocalSettings.php

roger at rogerchrisman.com roger at rogerchrisman.com
Wed Nov 10 01:57:15 UTC 2010


Oops, for my better-formed regex, lose the "|" at the end, like so:

# Matches 2 external links with less than " x.. x.. " between them:
$wgSpamRegex = "/http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+/i";
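
A quick way to check a pattern like this before putting it in
LocalSettings.php is to run it through preg_match() on the command line,
with the "/" delimiters and the "i" flag inside the quotes. This is only
a sketch; the sample strings and the example domains are invented for
illustration and are not from my wiki:

```php
<?php
// Sketch only: sample strings below are made up for illustration.
$wgSpamRegex = "/http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+/i";

$samples = array(
    'two links close together' => "http://a.example/ cheap pills http://b.example/",
    'one link only'            => "See http://a.example/ for details.",
    'links far apart'          => "http://a.example/ one two three four five http://b.example/",
);

foreach ( $samples as $label => $text ) {
    // preg_match() returns 1 on a match, 0 on no match
    $hit = preg_match( $wgSpamRegex, $text );
    echo $label . ': ' . ( $hit === 1 ? 'blocked' : 'allowed' ) . "\n";
}
```

With these samples only the first string should come out as "blocked",
since the other two have either a single link or too many words between
the links.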


Unless you have it as part of a larger regex of "or" ("|") parts, which I
actually do, on several lines concatenated like so (with much
#commented out):

$wgSpamRegex = "/".        # "/" is the opening wrapper
#"s-e-x|zoofilia|sexyongpin|grusskarte|geburtstagskarten|animalsex|".
#"sex-with|dogsex|adultchat|adultlive|camsex|sexcam|livesex|sexchat|".
#"chatsex|onlinesex|adultporn|adultvideo|adultweb.|hardcoresex|hardcoreporn|".
#"teenporn|xxxporn|lesbiansex|livegirl|livenude|livesex|livevideo|camgirl|".
#"spycam|voyeursex|casino-online|online-casino|kontaktlinsen|cheapest-phone|".
#"laser-eye|eye-laser|fuelcellmarket|lasikclinic|cragrats|parishilton|".
#"paris-hilton|paris-tape|2large|fuel-dispenser|fueling-dispenser|huojia|".
#"jinxinghj|telematicsone|telematiksone|a-mortgage|diamondabrasives|".
#"reuterbrook|sex-plugin|sex-zone|lazy-stars|eblja|liuhecai|".
#"buy-viagra|-cialis|-levitra|boy-and-girl-kissing|". # Match spammy words
#"dirare\.com|".           # Matches dirare.com a spammer's domain name
"overflow\s*:\s*auto|".   # Matches overflow:auto regardless of whitespace
"height\s*:\s*[0-4]px|".  # Matches height:0px (most CSS hidden spam)
"\<\s*a\s*href|".         # Matches '<a href' links, forcing wiki syntax
#"(http:\/\/(.|\n)*){14}|".   # Matches x number of external links
#"http:\/\/\S*\s*\S*\s*\S*\s*\S*\s*\S*\s*\S*http:\/\/\S*|". # Roger -- bad
#"http:\/\/\S*\s*\S*\s*\S*\s*\S*http:\/\/\S*|". # Roger -- NB pcre.backtrack_limit
"http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+|". # Roger -- better!
"display\s*:\s*none".     # Matches display:none regardless of whitespace
"/i";                     # "/" ends the regular expression, "i" makes it
                          # case-insensitive
                          # "\s" matches whitespace
                          # "*" is a repeater (zero or more times)
                          # "\s*" means to look for zero or more
                          # whitespace characters
                          # "\S*" means to look for zero or more
                          # non-whitespace characters
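
To see whether a combined pattern like this compiles and whether it is
running into pcre.backtrack_limit, one option is to test it with
preg_match() and then look at preg_last_error(). A rough sketch, using
only the uncommented parts from above and an invented sample string:

```php
<?php
// Sketch only: the active (uncommented) alternatives from the regex above.
$wgSpamRegex = "/" .
    "overflow\s*:\s*auto|" .
    "height\s*:\s*[0-4]px|" .
    "\<\s*a\s*href|" .
    "http:\/\/(\S+\s+){1,3}\S*http:\/\/\S+|" .
    "display\s*:\s*none" .
    "/i";

// Invented sample text with CSS-hidden content.
$text = '<div style="display : none">hidden text</div>';

$result = preg_match( $wgSpamRegex, $text, $matches );

if ( $result === false || preg_last_error() !== PREG_NO_ERROR ) {
    // For example PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is hit
    echo "regex failed, preg_last_error() = " . preg_last_error() . "\n";
} elseif ( $result === 1 ) {
    echo "matched: " . $matches[0] . "\n";
} else {
    echo "no match\n";
}
```

For this sample it should report a match on "display : none". If
preg_last_error() ever returns PREG_BACKTRACK_LIMIT_ERROR on real page
text, that is the case where the looser patterns marked "bad" above are
worth replacing.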


Largely copied from
http://www.mediawiki.org/wiki/Manual:$wgSpamRegex#A_Large_Example

Roger


