-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
greg@svn.wikimedia.org wrote:
Force inserted bools to be ints, per Tim's suggestion on bug 15148. This should be fine for now as Postgres uses SMALLINTs for BOOLs, to match MySQLs BOOL/TINYINT aliasing and subsequent MW coding assumptions. It's possible this will cause bad effects if inserted values are called in a boolean context when they are going into a non-SMALLINT column, but we can handle those as they come up.
[snip]
function addQuotes( $s ) { if ( is_null( $s ) ) { return 'NULL';
} else if ( is_bool( $s ) ) {
return intval( $s );
It probably wouldn't hurt to do this on general principle in the base class. Some quick background:
PHP's boolean type interpolates to string as:
true -> "1" false -> ""
MySQL's "bool" column type is an alias to "tinyint" (which we use, as "bool" didn't exist in earlier MySQL versions), where you generally store 1 or 0. However if you stick '' in there it gets automatically interpreted as 0... (does this happen in MySQL's strict mode, though?)
PostgreSQL has a "bool" data type also, which seems to accept a variety of values: http://www.postgresql.org/docs/8.1/static/datatype-boolean.html none of which are "". ;)
It's also much whingier about its numeric types, and won't let us insert an empty string, which is what a boolean false interpolates to.
Now, most of the time we explicitly use 1 and 0 when storing boolean values in MediaWiki, like:
'page_is_new' => ($lastRevision === 0) ? 1 : 0, 'page_is_redirect' => $rt !== NULL ? 1 : 0,
But it'd be nice not to break if people forget and use a genuine bool... for that matter it'd be nice to be able to do that as a matter of course. ;)
- -- brion
On Thu, Aug 21, 2008 at 6:13 PM, Brion Vibber brion@wikimedia.org wrote:
Now, most of the time we explicitly use 1 and 0 when storing boolean values in MediaWiki, like:
'page_is_new' => ($lastRevision === 0) ? 1 : 0, 'page_is_redirect' => $rt !== NULL ? 1 : 0,
But it'd be nice not to break if people forget and use a genuine bool... for that matter it'd be nice to be able to do that as a matter of course. ;)
Would it make sense for the database abstraction layer to know our schema, so that it can automatically cast things to bool (or whatever is necessary) before handing them down to the actual database? If we wanted to get hardcore it could also do stuff like handle truncation for limited-length fields, but that's scarier (e.g. if someone manually alters their schema to lengthen a field, they'd have to tell MediaWiki about it).
Of course, if the abstraction layer doesn't know about a field it could fall back to current ignorant behavior.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Aryeh Gregor wrote:
Would it make sense for the database abstraction layer to know our schema, so that it can automatically cast things to bool (or whatever is necessary) before handing them down to the actual database? If we wanted to get hardcore it could also do stuff like handle truncation for limited-length fields, but that's scarier (e.g. if someone manually alters their schema to lengthen a field, they'd have to tell MediaWiki about it).
In my opinion that would be a bit horrifyingly scary. ;) Domain-specific knowledge is better placed in domain-specific classes...
Proper behavior for overlong input can vary. Sometimes we want to reject it outright, since it's invalid -- an overlong page title, username, or group name wouldn't function correctly truncated.
Other times where truncation is appropriate might be edit comments, where we might want a pretty ellipsis... and we might want to truncate part of a comment _inside_ a quoted string, or something fancy like that that's totally unsuitable for the DB layer. :D
Further, we should probably expect that the database may outright reject long fields rather than silently truncating them -- like MySQL in strict mode -- and should ensure that our behavior is sane.
- -- brion
On Thu, Aug 21, 2008 at 7:23 PM, Brion Vibber brion@wikimedia.org wrote:
In my opinion that would be a bit horrifyingly scary. ;) Domain-specific knowledge is better placed in domain-specific classes...
Well, having pgsql and all the other BOOL-supporting DBMSes forced to use integer types just because devs are too lazy to make sure things work properly on all databases is exactly the sort of thing the abstraction layer is meant to prevent. Each database should be able to properly use its own features. Introducing domain-specific knowledge is kind of icky, but I can't think of any other way to do it.
Of course this logic could be properly factored out of Database.php. For instance, we could add another argument to the constructor (or add a mutator, which could allow extensions to join the fun) that would allow you to pass in an array of fieldname => type mappings. Then whatever instantiates the Database object we use could pass that in when it constructs the object.
Proper behavior for overlong input can vary. Sometimes we want to reject it outright, since it's invalid -- an overlong page title, username, or group name wouldn't function correctly truncated.
I suggested truncation because that's what MySQL would normally do, and so it's what we're probably relying on in some cases already. More sensible behavior would be to throw an exception, rather than having it silently sort of work on MySQL but fail elsewhere (including MySQL in strict mode).
Aryeh Gregor schreef:
On Thu, Aug 21, 2008 at 7:23 PM, Brion Vibber brion@wikimedia.org wrote:
In my opinion that would be a bit horrifyingly scary. ;) Domain-specific knowledge is better placed in domain-specific classes...
Well, having pgsql and all the other BOOL-supporting DBMSes forced to use integer types just because devs are too lazy to make sure things work properly on all databases is exactly the sort of thing the abstraction layer is meant to prevent. Each database should be able to properly use its own features. Introducing domain-specific knowledge is kind of icky, but I can't think of any other way to do it.
Of course this logic could be properly factored out of Database.php. For instance, we could add another argument to the constructor (or add a mutator, which could allow extensions to join the fun) that would allow you to pass in an array of fieldname => type mappings. Then whatever instantiates the Database object we use could pass that in when it constructs the object.
Don't all DBMSes interpret 1 as true and 0 as false? Then why don't we just check if something is a bool and convert it to 1 or 0? Problem solved (I think).
Roan Kattouw (Catrope)
On Fri, Aug 22, 2008 at 4:45 AM, Roan Kattouw roan.kattouw@home.nl wrote:
Don't all DBMSes interpret 1 as true and 0 as false? Then why don't we just check if something is a bool and convert it to 1 or 0? Problem solved (I think).
Hmm, yes. That's what Greg just committed, actually, on Brion's advice. So does this mean pgsql can now use BOOLs without having to worry about scary keeping track of types and stuff?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Aryeh Gregor wrote:
On Thu, Aug 21, 2008 at 7:23 PM, Brion Vibber brion@wikimedia.org wrote:
In my opinion that would be a bit horrifyingly scary. ;) Domain-specific knowledge is better placed in domain-specific classes...
Well, having pgsql and all the other BOOL-supporting DBMSes forced to use integer types just because devs are too lazy to make sure things work properly on all databases is exactly the sort of thing the abstraction layer is meant to prevent.
You *do* realize that "1" and "0" are valid values for PostgreSQL's genuine BOOL type, right?
Each database should be able to properly use its own features. Introducing domain-specific knowledge is kind of icky, but I can't think of any other way to do it.
We use type-specific wrappers for formatting timestamp values appropriately, so clearly there is another way to do it. :)
Proper behavior for overlong input can vary. Sometimes we want to reject it outright, since it's invalid -- an overlong page title, username, or group name wouldn't function correctly truncated.
I suggested truncation because that's what MySQL would normally do, and so it's what we're probably relying on in some cases already.
We must not rely on that, since MySQL may be configured in strict mode, which will whine at truncation.
More sensible behavior would be to throw an exception, rather than having it silently sort of work on MySQL but fail elsewhere (including MySQL in strict mode).
Right, that would throw an exception due to the database error.
- -- brion
On Fri, Aug 22, 2008 at 12:22 PM, Brion Vibber brion@wikimedia.org wrote:
You *do* realize that "1" and "0" are valid values for PostgreSQL's genuine BOOL type, right?
Yeah, I'm not very awake right now, I think. So is pgsql going to switch to a proper BOOL type now or what?
We use type-specific wrappers for formatting timestamp values appropriately, so clearly there is another way to do it. :)
Touché. (Although people do periodically forget to use those.)
wikitech-l@lists.wikimedia.org