On 6/20/07, Raimond Spekking <raimond.spekking@gmail.com> wrote:
> Hmmm, it seems that preg_quote() is doing too much:
> with preg_quote(), which does not work: "/http:\/\/+[a-z0-9_\-.]*(wiener-gasometer\.at/index\.html|dispatch\.opac\.d-nb.de|wikipedia\.org)/Si"
> instead of
> with str_replace(), which already works in SpamBlacklist: "/http:\/\/+[a-z0-9_\-.]*(wiener-gasometer.at\/index.html|dispatch.opac.d-nb.de|wikipedia.org)/Si"
> But I am no regex expert; maybe I missed a parameter or point :-(
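Just glancing at the code and your results, you probably want to preg_quote() the individual URLs before you concatenate them with '|'. Make sure to use preg_quote( $url, '/' ) so that it escapes the delimiter '/' too: without the second argument, preg_quote() leaves '/' alone, so the first slash inside a URL (like the one in wiener-gasometer\.at/index\.html) ends the pattern early. Incidentally, you may want to use a delimiter other than / for URLs, just for prettiness, since the slashes then need no escaping at all.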
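To make the per-URL quoting concrete, here is a minimal PHP sketch. The helper buildAlternation() and the sample URLs are hypothetical, and it assumes the blacklist entries are literal URL fragments rather than regex snippets. The commented outputs assume PHP 5.3 or later, where preg_quote() also escapes '-'.

```php
<?php
// Hypothetical helper: quote each URL on its own, then join with '|'.
// Assumes $urls holds literal URL fragments, not regex snippets.
function buildAlternation( array $urls, string $delimiter ): string {
	$quoted = array();
	foreach ( $urls as $url ) {
		// Escapes regex metacharacters such as '.', plus the delimiter itself.
		$quoted[] = preg_quote( $url, $delimiter );
	}
	// Joining after quoting keeps the '|' separators unescaped.
	return implode( '|', $quoted );
}

$urls = array( 'wiener-gasometer.at/index.html', 'wikipedia.org' );

// '/' delimiter: the slash inside the URL has to be escaped.
echo buildAlternation( $urls, '/' ), "\n";
// prints: wiener\-gasometer\.at\/index\.html|wikipedia\.org

// '!' delimiter: slashes can stay as they are.
$alt = buildAlternation( $urls, '!' );
echo $alt, "\n";
// prints: wiener\-gasometer\.at/index\.html|wikipedia\.org

$regex = '!http://+[-a-z0-9_.]*(' . $alt . ')!Si';
var_dump( preg_match( $regex, 'http://www.wikipedia.org/wiki/Foo' ) ); // int(1)
```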
So I'd change it to something like:
 $regexes = '';
-$regexStart = '/http:\/\/+[a-z0-9_\-.]*(';
-$regexEnd = ')/Si';
+$regexStart = '!http://+[-a-z0-9_.]*(';
+$regexEnd = ')!Si';
 $regexMax = 4096;
 $build = false;
 foreach( $lines as $line ) {
 	// FIXME: not very robust size check, but should work. :)
 	if( $build === false ) {
-		$build = $line;
+		$build = preg_quote($line, '!');
 	} elseif( strlen( $build ) + strlen( $line ) > $regexMax ) {
-		$regexes .= $regexStart .
-			str_replace( '/', '\/', preg_replace('|\\\*/|', '/', $build) ) .
-			$regexEnd;
-		$build = $line;
+		$regexes .= $regexStart . $build . $regexEnd;
+		$build = preg_quote($line, '!');
 	} else {
-		$build .= '|' . $line;
+		$build .= '|' . preg_quote($line, '!');
 	}
 }
 if( $build !== false ) {
-	$regexes .= $regexStart .
-		str_replace( '/', '\/', preg_replace('|\\\*/|', '/', $build) ) .
-		$regexEnd;
+	$regexes .= $regexStart . $build . $regexEnd;
 }
I haven't tested that exact code, though.
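For completeness, here is the batching logic from the diff wrapped in a self-contained PHP function, as a sketch rather than the actual SpamBlacklist code. The names buildBlacklistRegexes() and isBlacklisted() are hypothetical. Unlike the snippet, it collects each batch in an array so that more than one batch yields separately usable patterns, and it passes every entry, including the first, through preg_quote() with '!' as the delimiter. The leading '-' in [-a-z0-9_.] makes the hyphen literal without a backslash.

```php
<?php
// Hypothetical wrapper around the batching idea from the diff.
// Assumes each entry in $lines is a literal URL fragment to be escaped.
function buildBlacklistRegexes( array $lines, int $regexMax = 4096 ): array {
	$regexStart = '!http://+[-a-z0-9_.]*(';  // '!' delimiter: no need to escape '/'
	$regexEnd = ')!Si';
	$regexes = array();
	$build = false;
	foreach ( $lines as $line ) {
		$quoted = preg_quote( $line, '!' );
		if ( $build === false ) {
			// First entry starts the first batch.
			$build = $quoted;
		} elseif ( strlen( $build ) + strlen( $quoted ) > $regexMax ) {
			// Batch would grow too long: close it and start a new one.
			$regexes[] = $regexStart . $build . $regexEnd;
			$build = $quoted;
		} else {
			$build .= '|' . $quoted;
		}
	}
	if ( $build !== false ) {
		$regexes[] = $regexStart . $build . $regexEnd;
	}
	return $regexes;
}

// Hypothetical check: true if $text contains a link matching any batch.
function isBlacklisted( array $regexes, string $text ): bool {
	foreach ( $regexes as $regex ) {
		if ( preg_match( $regex, $text ) ) {
			return true;
		}
	}
	return false;
}

$regexes = buildBlacklistRegexes(
	array( 'wiener-gasometer.at/index.html', 'dispatch.opac.d-nb.de', 'wikipedia.org' )
);
var_dump( count( $regexes ) );                                        // int(1)
var_dump( isBlacklisted( $regexes, 'see http://www.wikipedia.org/' ) ); // bool(true)
var_dump( isBlacklisted( $regexes, 'see http://example.com/' ) );       // bool(false)
```

Because the dots are escaped, an entry like wikipedia.org no longer matches look-alikes such as wikipediaXorg, which the unescaped str_replace() version would have allowed.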