Just some basic comments, I'm sure Brion has more.
leonsp@svn.wikimedia.org schreef:
Revision: 45755 Author: leonsp Date: 2009-01-14 22:20:15 +0000 (Wed, 14 Jan 2009)
Log Message:
(bug 17028) Added support for IBM DB2 database. config/index.php has new interface elements that only show up if PHP has ibm_db2 module enabled. AutoLoader knows about the new DB2 classes. GlobalFunctions has a new constant for DB2 time format. Revision class fixed slightly. Also includes new PHP files containing the Database and Search API implementations for IBM DB2.
[...] Modified: trunk/phase3/includes/Revision.php =================================================================== --- trunk/phase3/includes/Revision.php 2009-01-14 22:15:50 UTC (rev 45754) +++ trunk/phase3/includes/Revision.php 2009-01-14 22:20:15 UTC (rev 45755) @@ -961,6 +961,10 @@ */ static function getTimestampFromId( $title, $id ) { $dbr = wfGetDB( DB_SLAVE );
// Casting fix for DB2
if ($id == '') {
$id = 0;
}
You should probably use intval($id) here.
[...]
Added: trunk/phase3/includes/SearchIBM_DB2.php
--- trunk/phase3/includes/SearchIBM_DB2.php (rev 0) +++ trunk/phase3/includes/SearchIBM_DB2.php 2009-01-14 22:20:15 UTC (rev 45755) @@ -0,0 +1,247 @@ +<?php +# Copyright (C) 2004 Brion Vibber brion@pobox.com
If you wrote this file, you should attribute yourself.
Added: trunk/phase3/includes/db/DatabaseIbm_db2.php
--- trunk/phase3/includes/db/DatabaseIbm_db2.php (rev 0) +++ trunk/phase3/includes/db/DatabaseIbm_db2.php 2009-01-14 22:20:15 UTC (rev 45755)
+/**
- Utility class for generating blank objects
- Intended as an equivalent to {} in Javascript
- @ingroup Database
- */
+class BlankObject { +}
Just use $obj = new stdClass; here.
+/**
- This represents a column in a DB2 database
- @ingroup Database
- */
+class IBM_DB2Field {
- private $name, $tablename, $type, $nullable, $max_length;
- /**
* Builder method for the class
* @param Object $db Database interface
* @param string $table table name
* @param string $field column name
* @return IBM_DB2Field
*/
- static function fromText($db, $table, $field) {
[...]
- }
- /**
* Get column name
* @return string column name
*/
- function name() { return $this->name; }
- /**
* Get table name
* @return string table name
*/
- function tableName() { return $this->tablename; }
- /**
* Get column type
* @return string column type
*/
- function type() { return $this->type; }
- /**
* Can column be null?
* @return bool true or false
*/
- function nullable() { return $this->nullable; }
- /**
* How much can you fit in the column per row?
* @return int length
*/
- function maxLength() { return $this->max_length; }
+}
Why do you need this? The other Database backends don't have it.
+/**
- Wrapper around binary large objects
- @ingroup Database
- */
+class IBM_DB2Blob {
- private $mData;
- function __construct($data) {
$this->mData = $data;
- }
- function getData() {
return $this->mData;
- }
+}
Why do you need these wrapper objects?
[...]
- public function is_numeric_type( $type ) {
switch (strtoupper($type)) {
case 'SMALLINT':
case 'INTEGER':
case 'INT':
case 'BIGINT':
case 'DECIMAL':
case 'REAL':
case 'DOUBLE':
case 'DECFLOAT':
return true;
}
return false;
- }
Indentation looks wrong here.
- /**
* Construct a LIMIT query with optional offset
* This is used for query pages
* $sql string SQL query we will append the limit too
* $limit integer the SQL limit
* $offset integer the SQL offset (default false)
*/
- public function limitResult($sql, $limit, $offset=false) {
if( !is_numeric($limit) ) {
throw new DBUnexpectedError( $this, "Invalid non-numeric limit passed to limitResult()\n" );
}
if( $offset ) {
wfDebug("Offset parameter not supported in limitResult()\n");
}
// TODO implement proper offset handling
// idea: get all the rows between 0 and offset, advance cursor to offset
return "$sql FETCH FIRST $limit ROWS ONLY ";
- }
So DB2 renames LIMIT $n to something else and doesn't even implement offset handling, even though both are in the SQL specification?
- /**
* USE INDEX clause
* DB2 doesn't have them and returns ""
* @param sting $index
*/
- public function useIndexClause( $index ) {
return "";
- }
What do you mean DB2 "doesn't have them"? FORCE INDEX isn't supported in DB2? Then unless its index choosing algorithm is extremely good, it won't be able to run certain queries with satisfactory efficiency.
- public function select( $table, $vars, $conds='', $fname = 'DatabaseIbm_db2::select', $options = array(), $join_conds = array() )
- {
$res = parent::select( $table, $vars, $conds, $fname, $options, $join_conds );
// We must adjust for offset
if ( isset( $options['LIMIT'] ) ) {
if ( isset ($options['OFFSET'] ) ) {
$limit = $options['LIMIT'];
$offset = $options['OFFSET'];
}
}
This only sets $limit if both $options['LIMIT'] and $options['OFFSET'] are set, which I'm pretty sure is not what you want.
// DB2 does not have a proper num_rows() function yet, so we must emulate it
// DB2 9.5.3/9.5.4 and the corresponding ibm_db2 driver will introduce a working one
// Yay!
You probably want to detect the version and use num_rows() if it's available.
// we want the count
$vars2 = array('count(*) as num_rows');
// respecting just the limit option
$options2 = array();
if ( isset( $options['LIMIT'] ) ) $options2['LIMIT'] = $options['LIMIT'];
Can't you just rewrite LIMIT n to FETCH FIRST n ROWS ONLY here?
Added: trunk/phase3/maintenance/ibm_db2/README
--- trunk/phase3/maintenance/ibm_db2/README (rev 0) +++ trunk/phase3/maintenance/ibm_db2/README 2009-01-14 22:20:15 UTC (rev 45755) @@ -0,0 +1,41 @@ +== Syntax differences between other databases and IBM DB2 == +{| border cellspacing=0 cellpadding=4 +!MySQL!!IBM DB2 +|-
+|SELECT 1 FROM $table LIMIT 1 +|SELECT COUNT(*) FROM SYSIBM.SYSTABLES ST +WHERE ST.NAME = '$table' AND ST.CREATOR = '$schema' [...]
This is probably better off as plain text than as wikitext.
Added: trunk/phase3/maintenance/ibm_db2/tables.sql
--- trunk/phase3/maintenance/ibm_db2/tables.sql (rev 0) +++ trunk/phase3/maintenance/ibm_db2/tables.sql 2009-01-14 22:20:15 UTC (rev 45755) @@ -0,0 +1,604 @@ +-- DB2
+-- SQL to create the initial tables for the MediaWiki database. +-- This is read and executed by the install script; you should +-- not have to run it by itself unless doing a manual install. +-- This is the IBM DB2 version. +-- For information about each table, please see the notes in maintenance/tables.sql +-- Please make sure all dollar-quoting uses $mw$ at the start of the line +-- TODO: Change CHAR/SMALLINT to BOOL (still used in a non-bool fashion in PHP code)
+CREATE SEQUENCE user_user_id_seq AS INTEGER START WITH 0 INCREMENT BY 1; +CREATE TABLE mwuser ( -- replace reserved word 'user'
- user_id INTEGER NOT NULL PRIMARY KEY, -- DEFAULT nextval('user_user_id_seq'),
- user_name VARCHAR(255) NOT NULL UNIQUE,
- user_real_name VARCHAR(255),
- user_password clob(1K),
- user_newpassword clob(1K),
- user_newpass_time TIMESTAMP,
- user_token VARCHAR(255),
- user_email VARCHAR(255),
- user_email_token VARCHAR(255),
- user_email_token_expires TIMESTAMP,
- user_email_authenticated TIMESTAMP,
- user_options CLOB(64K),
- user_touched TIMESTAMP,
- user_registration TIMESTAMP,
- user_editcount INTEGER
+); +CREATE INDEX user_email_token_idx ON mwuser (user_email_token);
You shouldn't rename indices like that, because index names are used in FORCE INDEX clauses (oh wait, but they weren't supported, right?)
+-- should be replaced with OmniFind, Contains(), etc +CREATE TABLE searchindex (
- si_page int NOT NULL,
- si_title varchar(255) NOT NULL default '',
- si_text clob NOT NULL
+);
Don't you need some index on this table to enable efficient searching?
Again, this is only a very shallow look from my part, Brion will probably have more interesting stuff to say.
Roan Kattouw (Catrope)
On Wed, Jan 14, 2009 at 6:10 PM, Roan Kattouw roan.kattouw@home.nl wrote:
So DB2 renames LIMIT $n to something else and doesn't even implement offset handling, even though both are in the SQL specification?
LIMIT/OFFSET are completely nonstandard. See, e.g., this Google result confirms this:
http://troels.arvin.dk/db/rdbms/#select-limit
FETCH FIRST n ROWS ONLY is the standard way, not LIMIT. A lot of stuff that you take for granted in MySQL is completely nonstandard -- REPLACE INTO... and INSERT IGNORE INTO... are two other major examples.
As a general rule, SQL seems to be such a terrible, slowly developed, impractical standard that a typical app doesn't implement most of what it says to do and implements a metric ton of things that it doesn't mention. Which is why we need such a complicated abstraction layer in the first place. Ideally we could just write in standard SQL with the occasional minor workaround, the way we write standard HTML and CSS and so on, but real life doesn't seem to work that way in this case.
What do you mean DB2 "doesn't have them"? FORCE INDEX isn't supported in DB2? Then unless its index choosing algorithm is extremely good, it won't be able to run certain queries with satisfactory efficiency.
My impression is that MySQL is completely stupid when it comes to choosing indexes. Most other serious databases are much smarter and don't need babysitting. Our pgsql wrapper also doesn't implement this. I don't think anything does other than DatabaseMysql.
Index behavior is very database-dependent -- even if other DBs did implement forcing a particular index, they probably wouldn't need to force the same indexes in the same cases. They might not even *have* the same indexes. I know our pgsql schema doesn't have the same indexes as MySQL in most cases: it often can make do with fewer, because of its ability to do things like take intersections of multiple indexes efficiently, or even retrieve by one index and order by another.
You shouldn't rename indices like that, because index names are used in FORCE INDEX clauses (oh wait, but they weren't supported, right?)
Right. pgsql also renames indexes (because as noted, it doesn't use the same ones). I'd imagine the same is true for the other DBs we support or semi-support.
Aryeh Gregor schreef:
On Wed, Jan 14, 2009 at 6:10 PM, Roan Kattouw roan.kattouw@home.nl wrote:
So DB2 renames LIMIT $n to something else and doesn't even implement offset handling, even though both are in the SQL specification?
LIMIT/OFFSET are completely nonstandard. See, e.g., this Google result confirms this:
http://troels.arvin.dk/db/rdbms/#select-limit
FETCH FIRST n ROWS ONLY is the standard way, not LIMIT. A lot of stuff that you take for granted in MySQL is completely nonstandard -- REPLACE INTO... and INSERT IGNORE INTO... are two other major examples.
I knew about those, yes, but I didn't know LIMIT/OFFSET was non-standard, even though it seems to be the more widely used variant. Of course if offset handling isn't implemented at all, that's something to worry about.
As a general rule, SQL seems to be such a terrible, slowly developed, impractical standard that a typical app doesn't implement most of what it says to do and implements a metric ton of things that it doesn't mention. Which is why we need such a complicated abstraction layer in the first place. Ideally we could just write in standard SQL with the occasional minor workaround, the way we write standard HTML and CSS and so on, but real life doesn't seem to work that way in this case.
What do you mean DB2 "doesn't have them"? FORCE INDEX isn't supported in DB2? Then unless its index choosing algorithm is extremely good, it won't be able to run certain queries with satisfactory efficiency.
My impression is that MySQL is completely stupid when it comes to choosing indexes.
It's pretty stupid at times, yes.
Most other serious databases are much smarter and don't need babysitting. Our pgsql wrapper also doesn't implement this. I don't think anything does other than DatabaseMysql.
Index behavior is very database-dependent -- even if other DBs did implement forcing a particular index, they probably wouldn't need to force the same indexes in the same cases. They might not even *have* the same indexes. I know our pgsql schema doesn't have the same indexes as MySQL in most cases: it often can make do with fewer, because of its ability to do things like take intersections of multiple indexes efficiently, or even retrieve by one index and order by another.
You shouldn't rename indices like that, because index names are used in FORCE INDEX clauses (oh wait, but they weren't supported, right?)
Right. pgsql also renames indexes (because as noted, it doesn't use the same ones). I'd imagine the same is true for the other DBs we support or semi-support.
Ah, didn't know pgsql did those things too. I guess it's alright then, as long as performance stays acceptable.
Roan Kattouw (Catrope)
On Wed, Jan 14, 2009 at 6:27 PM, Roan Kattouw roan.kattouw@home.nl wrote:
I knew about those, yes, but I didn't know LIMIT/OFFSET was non-standard, even though it seems to be the more widely used variant. Of course if offset handling isn't implemented at all, that's something to worry about.
You can always emulate it by rewriting
SELECT * FROM foo ORDER BY bar LIMIT m, n
as
SELECT * FROM (SELECT * FROM foo ORDER BY bar LIMIT m+n) ORDER BY bar DESC LIMIT n
or whatever (replacing LIMIT as appropriate). Other major DBMSes tend to implement subqueries more efficiently than MySQL too, AFAIK.
Ah, didn't know pgsql did those things too. I guess it's alright then, as long as performance stays acceptable.
Well, on small sites database performance isn't a big issue. On large sites, well, hopefully they'll have a DBA who can optimize it and give us patches. :)
On Wed, Jan 14, 2009 at 6:27 PM, Roan Kattouw roan.kattouw@home.nl wrote: [snip]
I knew about those, yes, but I didn't know LIMIT/OFFSET was non-standard, even though it seems to be the more widely used variant. Of course if offset handling isn't implemented at all, that's something to worry about.
[/snip]
MSSQL does something different too. The following bit of MySQL:
SELECT * FROM tbl LIMIT 5;
Would become the following in MSSQL:
SELECT TOP 5 FROM tbl;
Was definitely a headache on more than one occasion, being so used to typing limit...
-Chad
Roan Kattouw wrote:
Just some basic comments, I'm sure Brion has more.
You should probably send them to CodeReview these days.
+/**
- This represents a column in a DB2 database
- @ingroup Database
- */
+class IBM_DB2Field {
[...]
Why do you need this? The other Database backends don't have it.
Actually SQLite is going to have one after I commit my working copy. And MySQL has had one for a while.
- /**
* USE INDEX clause
* DB2 doesn't have them and returns ""
* @param sting $index
*/
- public function useIndexClause( $index ) {
return "";
- }
What do you mean DB2 "doesn't have them"? FORCE INDEX isn't supported in DB2? Then unless its index choosing algorithm is extremely good, it won't be able to run certain queries with satisfactory efficiency.
SQLite doesn't have them either. It's not an issue.
+-- should be replaced with OmniFind, Contains(), etc +CREATE TABLE searchindex (
- si_page int NOT NULL,
- si_title varchar(255) NOT NULL default '',
- si_text clob NOT NULL
+);
Don't you need some index on this table to enable efficient searching?
SQLite doesn't have a built-in full-text search engine either, and PostgreSQL only has one starting in 8.3. The schema I'll use for SQLite is pretty much the same. It keeps the scripts happy. They'll use SearchEngineDummy by default and so the table won't be populated. Lucene can be used instead.
-- Tim Starling
Tim Starling schreef:
Roan Kattouw wrote:
Just some basic comments, I'm sure Brion has more.
You should probably send them to CodeReview these days.
Yeah, I know, but since I wanted to reply to code and the reply was quite sizable, I thought I'd do it the old-fashioned way.
SQLite doesn't have a built-in full-text search engine either, and PostgreSQL only has one starting in 8.3. The schema I'll use for SQLite is pretty much the same. It keeps the scripts happy. They'll use SearchEngineDummy by default and so the table won't be populated. Lucene can be used instead.
Ah, so it's a dummy table. That explains.
Roan Kattouw (Catrope)
wikitech-l@lists.wikimedia.org