DanTMan wrote:
Simetrical wrote:
There's no reason to complicate the regex with explicit maximum lengths. If additional characters are allowed in unquoted table names, add those to the [a-z0-9_] class. This should avoid all valid joins and similar. (It still fails on comments, though! :P )
"/^\s*(\S+|`[^`]+`)(.\S+|.`[^`]+`)?\s*$/i"
Oh god, make the madness stop! :)
Parsing here sucks for several reasons:
* actual syntax differs between different DB backends
* big ugly regexes are hard to read
* it feels like "magic" trying to treat different strings differently based on content, which is always icky
Generally we try to make natural use of different *data types* for different kinds of input here.
Since we're talking about a case where we want to make an *exception* from the standard behavior -- string table names being for internal processing, leading to prefixing and quoting -- we should explicitly mark it as such.
Long ago, I tossed around the idea of using a 'RawSql' or similar data type to tell the query-building functions that yes, we were sure, we really want to pass some raw SQL here -- we know what we're doing, so please don't escape it for us.
This might look like:
$db->select( 'page', array( 'page_namespace', 'page_title' ), array( 'page_id' => new RawSql( 'RAND()*1000' ) ) );
or whatever.
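To make the idea a bit more concrete, here is a minimal sketch of how such a marker type could work. Everything in it is an assumption for illustration: the RawSql class does not exist yet, buildConditions() is a made-up helper rather than an existing Database method, and addslashes() is only a stand-in for whatever quoting the backend really uses.

```php
<?php
// Hypothetical marker type: wraps a SQL fragment the caller vouches for.
final class RawSql {
    private $sql;

    public function __construct( $sql ) {
        $this->sql = (string)$sql;
    }

    public function getSql() {
        return $this->sql;
    }
}

// Hypothetical helper: turns array( field => value ) into a WHERE clause.
// Plain values get quoted; RawSql values are emitted untouched.
function buildConditions( array $conds ) {
    $parts = array();
    foreach ( $conds as $field => $value ) {
        if ( $value instanceof RawSql ) {
            // The caller explicitly asked for no escaping.
            $parts[] = $field . ' = ' . $value->getSql();
        } else {
            // Stand-in for the backend's real quoting routine.
            $parts[] = $field . " = '" . addslashes( (string)$value ) . "'";
        }
    }
    return implode( ' AND ', $parts );
}

echo buildConditions( array( 'page_id' => new RawSql( 'RAND()*1000' ) ) ), "\n";
echo buildConditions( array( 'page_title' => "O'Reilly" ) ), "\n";
```

The decision is made on the type of the value, not on what the string happens to contain, so there is nothing to guess at and every unescaped fragment is easy to grep for.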
For the case of wild & crazy custom joins, it might be:
$db->select( new RawSql( "$page LEFT JOIN $barfo ON page_id=barfo_page" ), array( 'barfo_key', 'page_namespace', 'page_title' ) );
or whatever.
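The table-name side could be sketched the same way. Again, this is only an illustration under assumptions: resolveTable() and the 'mw_' prefix are made up for the example, RawSql is the hypothetical type described above, and the backtick quoting shown is MySQL-style, so another backend would quote differently.

```php
<?php
// Hypothetical marker type: wraps a SQL fragment the caller vouches for.
final class RawSql {
    private $sql;

    public function __construct( $sql ) {
        $this->sql = (string)$sql;
    }

    public function getSql() {
        return $this->sql;
    }
}

// Hypothetical helper: plain strings are internal table names and get
// prefixed and quoted; RawSql fragments are passed through as-is.
function resolveTable( $table, $prefix ) {
    if ( $table instanceof RawSql ) {
        return $table->getSql();
    }
    // MySQL-style identifier quoting; embedded backticks are doubled.
    return '`' . str_replace( '`', '``', $prefix . $table ) . '`';
}

// Resolve the individual names first, then assemble the custom join.
$page = resolveTable( 'page', 'mw_' );
$barfo = resolveTable( 'barfo', 'mw_' );
$join = new RawSql( "$page LEFT JOIN $barfo ON page_id=barfo_page" );

echo resolveTable( $join, 'mw_' ), "\n";
```

No regex is needed to decide whether a string "looks like" a join: a plain string is always a table name, and anything fancier has to be wrapped explicitly.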
Now, I don't know if this is the best system ever, but I like that it's explicit about the use of unprocessed (and thus potentially unsafe) data, which'll make it easier to spot potential trouble spots when maintaining the code later.
-- brion vibber (brion @ wikimedia.org)