On Sat, May 10, 2008 at 10:20 PM, nad@svn.wikimedia.org wrote:
In MySQL the standard way of inserting data into rows exhibiting AUTOINCREMENT columns is simply to use a ''NULL'' value which will be ignored. In MSSQL however assigning a ''NULL'' to an ''IDENTITY'' column is not allowed, instead the best way is not to include those items in the list of columns to be updated at all.
To get round this in the MediaWiki MSSQL layer, I've modified the insert wrapper in the ''DatabaseMssql'' class to check if the primary key is used in the insert and remove it if so. It checks this by assuming that the primary key will be of the same name as the table but with ''_id'' on the end, and that it will the first item in the list of columns to update.
It would be best to raise some kind of warning when this condition occurs, preferably in the MySQL code path too, so that developers using MySQL with notices enabled can notice and fix it. Just not specifying the column to begin with works fine with MySQL too, so it should be used if it's more compatible.
MySQL implicitly casts NULL assignments to NOT NULL columns to an empty string or zero value accordingly, but MSSQL raises an error instead. This is a big problem within the MediaWiki environment because the code relies heavily on this implicit NULL casting. I've tried to get round the problem by replacing NULL's with empty strings from update and insert queries, and MSSQL is happy to cast the empty string to a numeric zero if necessary.
How does PostgreSQL handle this? If you know what type the column is, you can cast it in the application logic. Again, it would seem better to implement this checking in the cross-database code and raise warnings if it comes up so it can be fixed on a case-by-case basis. This doesn't work with MySQL strict mode either, which is the default under some circumstances last I heard.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Simetrical wrote:
On Sat, May 10, 2008 at 10:20 PM, nad@svn.wikimedia.org wrote:
In MySQL the standard way of inserting data into rows exhibiting AUTOINCREMENT columns is simply to use a ''NULL'' value which will be ignored. In MSSQL however assigning a ''NULL'' to an ''IDENTITY'' column is not allowed, instead the best way is not to include those items in the list of columns to be updated at all.
To get round this in the MediaWiki MSSQL layer, I've modified the insert wrapper in the ''DatabaseMssql'' class to check if the primary key is used in the insert and remove it if so. It checks this by assuming that the primary key will be of the same name as the table but with ''_id'' on the end, and that it will the first item in the list of columns to update.
It would be best to raise some kind of warning when this condition occurs, preferably in the MySQL code path too, so that developers using MySQL with notices enabled can notice and fix it. Just not specifying the column to begin with works fine with MySQL too, so it should be used if it's more compatible.
The complication here is that for PostgreSQL, we *must* insert a value, since it doesn't have autoincrement columns. This is why we jump through hoops to use $dbw->nextSequenceValue() stuff...
Rather than making assumptions about field names (which sounds like baaaad mojo that can break in many exciting ways), it may be more reliable to use a special value here.
Using a type-safe value (say, returning an object of a particular class from nextSequenceValue()) could allow the database class to drop the column from the query in a safe way.
MySQL implicitly casts NULL assignments to NOT NULL columns to an empty string or zero value accordingly, but MSSQL raises an error instead. This is a big problem within the MediaWiki environment because the code relies heavily on this implicit NULL casting. I've tried to get round the problem by replacing NULL's with empty strings from update and insert queries, and MSSQL is happy to cast the empty string to a numeric zero if necessary.
How does PostgreSQL handle this? If you know what type the column is, you can cast it in the application logic. Again, it would seem better to implement this checking in the cross-database code and raise warnings if it comes up so it can be fixed on a case-by-case basis. This doesn't work with MySQL strict mode either, which is the default under some circumstances last I heard.
This is simply a case where if it's going on, it's a bug in the code. Don't "work around" it in the database class, just fix it! :)
- -- brion
On Mon, May 12, 2008 at 12:33 PM, Brion Vibber brion@wikimedia.org wrote:
Rather than making assumptions about field names (which sounds like baaaad mojo that can break in many exciting ways), it may be more reliable to use a special value here.
Using a type-safe value (say, returning an object of a particular class from nextSequenceValue()) could allow the database class to drop the column from the query in a safe way.
Yeah, River pointed this out to me on IRC, I was going to say. Defining class MSSQLAutoincrement {} and then returning that with DatabaseMSSQL::nextSequenceValue() and checking for it in the insert logic (and dropping that part of the insert) would be the way to go here.
How does PostgreSQL handle this? If you know what type the column is, you can cast it in the application logic. Again, it would seem better to implement this checking in the cross-database code and raise warnings if it comes up so it can be fixed on a case-by-case basis. This doesn't work with MySQL strict mode either, which is the default under some circumstances last I heard.
This is simply a case where if it's going on, it's a bug in the code. Don't "work around" it in the database class, just fix it! :)
Yup.
If the queries which put NULL into AUTOINCREMENT NOT NULL columns are changed to insert a zero instead, then MSSQL can use its IDENTITY(1,1) without requiring nextSequenceValue().
Simetrical wrote:
On Mon, May 12, 2008 at 12:33 PM, Brion Vibber brion@wikimedia.org wrote:
Rather than making assumptions about field names (which sounds like baaaad mojo that can break in many exciting ways), it may be more reliable to use a special value here.
Using a type-safe value (say, returning an object of a particular class from nextSequenceValue()) could allow the database class to drop the column from the query in a safe way.
Yeah, River pointed this out to me on IRC, I was going to say. Defining class MSSQLAutoincrement {} and then returning that with DatabaseMSSQL::nextSequenceValue() and checking for it in the insert logic (and dropping that part of the insert) would be the way to go here.
How does PostgreSQL handle this? If you know what type the column is, you can cast it in the application logic. Again, it would seem better to implement this checking in the cross-database code and raise warnings if it comes up so it can be fixed on a case-by-case basis. This doesn't work with MySQL strict mode either, which is the default under some circumstances last I heard.
This is simply a case where if it's going on, it's a bug in the code. Don't "work around" it in the database class, just fix it! :)
Yup.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Mon, May 12, 2008 at 7:47 PM, Aran aran@organicdesign.co.nz wrote:
If the queries which put NULL into AUTOINCREMENT NOT NULL columns are changed to insert a zero instead, then MSSQL can use its IDENTITY(1,1) without requiring nextSequenceValue().
And MySQL will insert a literal zero, which will on the first attempt insert a bogus ID, and on subsequent attempts either fail due to duplicate ID or overwrite the existing entry (depending on INSERT vs. REPLACE). In other words, everything would explode. :) nextSequenceValue() is needed here, databases just aren't consistent enough.
wikitech-l@lists.wikimedia.org