Hello, I have recently been looking into the possibility of improving database support.
From searching the mailing list archives, I have found that Database.php
does things relating to generating queries, and not just abstraction of the database (http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/14318).
To improve database support, I would like to suggest a database abstraction layer such as ADOdb.
Are there any disadvantages that would result from using a database abstraction layer such as ADOdb? Or are there advantages to the current methods of accessing databases?
I am interested in contributing to the project by doing some work towards improving support for a range of databases.
Regards, Karun
Hello,
Are there any disadvantages that would result from using a database abstraction layer such as ADOdb? Or are there advantages to the current methods of accessing databases?
Our layer is abstracting queries, not only the database API. There are some differences between the two approaches.
Cheers, Domas
On Thu, 14 May 2009 15:18 +0300, "Domas Mituzas" midom.lists@gmail.com wrote:
Hello,
Are there any disadvantages that would result from using a database abstraction layer such as ADOdb? Or are there advantages to the current methods of accessing databases?
Our layer is abstracting queries, not only the database API. There are some differences between the two approaches.
Cheers, Domas
Abstracting queries does appear to be a better option than just abstracting the database API, since queries could then use features or syntax specific to each database server.
From looking further into ADOdb, it appears that the performance hit
from using it could be significant for projects like Wikipedia (14% or so, from the statistics on the ADOdb site).
So abstracting the database API using something like ADOdb may not be a good option after all for making MediaWiki work better with databases other than MySQL. There do not seem to be any projects of a size similar to those that use MediaWiki which use ADOdb and support a range of databases.
Regards, Karun
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Fri, May 15, 2009 at 6:48 AM, Karun Dambiec karun@fastmail.fm wrote:
From looking further into ADOdb, it appears that the performance hit from using it could be significant for projects like Wikipedia (14% or so, from the statistics on the ADOdb site).
14% over what? For trivial queries that take 0.1ms to execute anyway? Not for long complicated ones, I bet. And I'm awfully sure it's not 14% for all of MediaWiki, including the parser and all. I doubt it has more overhead than our current solution -- probably much less, since it does much less. But the overhead is going to be negligible if you run a sane number of queries per request, unless this has way more features than I think.
On Fri, 15 May 2009 09:45 -0400, "Aryeh Gregor" Simetrical+wikilist@gmail.com wrote:
On Fri, May 15, 2009 at 6:48 AM, Karun Dambiec karun@fastmail.fm wrote:
From looking further into ADOdb, it appears that the performance hit from using it could be significant for projects like Wikipedia (14% or so, from the statistics on the ADOdb site).
14% over what? For trivial queries that take 0.1ms to execute anyway? Not for long complicated ones, I bet. And I'm awfully sure it's not 14% for all of MediaWiki, including the parser and all. I doubt it has more overhead than our current solution -- probably much less, since it does much less. But the overhead is going to be negligible if you run a sane number of queries per request, unless this has way more features than I think.
Some of ADOdb's own benchmarks reveal that it is not necessarily that fast. If no accelerator is used, there is a significant speed decrease compared to using MySQL directly. I assume the same would likely apply to other databases.
Benchmark 2
The first benchmark is a synthetic one that does not measure live performance. This benchmark tries to be more realistic, measuring HTTP requests/second. In this test, we select and display 82 rows from the products table once per page request.
...
Two runs were taken and averaged for each test. Higher values are better. All measurements in pages/second.
           With Accelerator   No Accelerator
MySQL      83.53              81.35
ADOdb      61.19              21.33
PEAR DB    52.85              25.26
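For concreteness, the slowdowns implied by those numbers can be computed directly (a quick sketch in Python; the figures are copied from the table above, and the percentages are relative to calling MySQL directly in the same column):

```python
# Pages/second from the ADOdb "Benchmark 2" table above.
# Columns: (with accelerator, no accelerator).
results = {
    "MySQL":   (83.53, 81.35),
    "ADOdb":   (61.19, 21.33),
    "PEAR DB": (52.85, 25.26),
}

def slowdown(name):
    """Percent slower than direct MySQL access, per column."""
    with_acc, no_acc = results[name]
    base_with, base_no = results["MySQL"]
    return (round(100 * (1 - with_acc / base_with)),
            round(100 * (1 - no_acc / base_no)))

for name in ("ADOdb", "PEAR DB"):
    w, n = slowdown(name)
    print(f"{name}: {w}% slower with an accelerator, {n}% without")
```

Under this particular benchmark, ADOdb comes out roughly 27% slower with an accelerator and about 74% slower without one, so the 14% figure quoted earlier does not match this test.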
On Fri, May 15, 2009 at 5:48 PM, Karun Dambiec karun@fastmail.fm wrote:
The first benchmark is a synthetic one that does not measure live performance. This benchmark tries to be more realistic, measuring HTTP requests/second. In this test, we select and display 82 rows from the products table once per page request.
And do absolutely nothing else. That's not realistic. This makes no difference to a large app like MediaWiki. But it's a pointless discussion, since ADOdb doesn't do anything useful for us anyway.
On Thu, May 14, 2009 at 8:10 AM, Karun Dambiec karun@fastmail.fm wrote:
From searching the mailing list archives, I have found that Database.php does things relating to generating queries, and not just abstraction of the database (http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/14318).
Yep. Not much point in abstracting the database function calls if your SQL is a syntax error on half the databases. :)
To improve database support, I would like to suggest a database abstraction layer such as ADODB.
Are there any disadvantages that would result from using a database abstraction layer such as ADOdb? Or are there advantages to the current methods of accessing databases?
Well, here's an example. We just had a case where the code generated a query like this:
(SELECT ...) UNION (SELECT ...) UNION (SELECT ...) ORDER BY ...
It turns out that this breaks in Oracle. The needed syntax there is apparently:
SELECT * FROM ((SELECT ...) UNION (SELECT ...) UNION (SELECT ...)) ORDER BY ...
So this was changed http://www.mediawiki.org/wiki/Special:Code/MediaWiki/50478. The only problem is, the changed syntax uses a subquery, which doesn't work in MySQL 4, and it had to be reverted http://www.mediawiki.org/wiki/Special:Code/MediaWiki/50483. *No* raw SQL is going to get you something that works both on MySQL 4.0 and Oracle here (AFAIK). You need an abstraction layer that not only passes the queries to the databases, but also generates SQL that will actually *work* on those databases. The solution here would probably be to add a function to the abstraction layer that looks like
return '(' . implode( ') UNION (', $queries ) . ')';
by default, and
return 'SELECT * FROM ((' . implode( ') UNION (', $queries ) . '))';
for Oracle. How would this be achieved by just using ADOdb, without query-generation logic? Keep in mind that we can't be purists here about standard SQL or whatever -- the code must work on MySQL 4.0, and it must be *efficient* on MySQL 4.0, at any cost, since that's what Wikipedia runs.
It's all very well for ADOdb to claim to support zillions of DBs, but is there any large web application that actually *works* on all those DBs, just by using ADOdb and without lots of other DB-specific logic? I strongly suspect not.
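Sketched out, the per-backend override idea above might look like this (a minimal Python illustration of the PHP snippets; the class and method names are hypothetical, not actual MediaWiki API):

```python
class Database:
    """Base abstraction layer: default SQL generation."""

    def union_queries(self, queries):
        # Default form: no subquery, so it stays MySQL 4.0-safe.
        return "(" + ") UNION (".join(queries) + ")"


class DatabaseOracle(Database):
    def union_queries(self, queries):
        # Oracle needs the UNION wrapped in a subquery so a trailing
        # ORDER BY applies to the combined result.
        return "SELECT * FROM ((" + ") UNION (".join(queries) + "))"


def build_query(db, queries, order_by):
    # Callers stay backend-agnostic; the dialect choice lives in the class.
    return db.union_queries(queries) + " ORDER BY " + order_by
```

With this shape, the special-page code calls `build_query()` the same way on every backend, and only the Oracle subclass carries the dialect-specific wrapping.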
On Thu, 14 May 2009 09:50 -0400, "Aryeh Gregor" Simetrical+wikilist@gmail.com wrote:
Yep. Not much point in abstracting the database function calls if your SQL is a syntax error on half the databases. :)
In terms of performance, it could possibly be better to write queries for each database type.
Well, here's an example. We just had a case where the code generated a query like this:
(SELECT ...) UNION (SELECT ...) UNION (SELECT ...) ORDER BY ...
It turns out that this breaks in Oracle. The needed syntax there is apparently:
SELECT * FROM ((SELECT ...) UNION (SELECT ...) UNION (SELECT ...)) ORDER BY ...
So this was changed http://www.mediawiki.org/wiki/Special:Code/MediaWiki/50478. The only problem is, the changed syntax uses a subquery, which doesn't work in MySQL 4, and it had to be reverted http://www.mediawiki.org/wiki/Special:Code/MediaWiki/50483. *No* raw SQL is going to get you something that works both on MySQL 4.0 and Oracle here (AFAIK). You need an abstraction layer that not only passes the queries to the databases, but also generates SQL that will actually *work* on those databases. The solution here would probably be to add a function to the abstraction layer that looks like
return '(' . implode( ') UNION (', $queries ) . ')';
by default, and
return 'SELECT * FROM ((' . implode( ') UNION (', $queries ) . '))';
An option could be to abstract the queries further for each database type, so that the same query need not be used for MySQL, PostgreSQL and Oracle. That way the optimal query for each database type could be used to retrieve the information.
for Oracle. How would this be achieved by just using ADOdb, without query-generation logic? Keep in mind that we can't be purists here about standard SQL or whatever -- the code must work on MySQL 4.0, and it must be *efficient* on MySQL 4.0, at any cost, since that's what Wikipedia runs.
A potential problem with using something like ADOdb is the overhead it adds, although according to their website it is smaller than that of other database abstraction layers.
It's all very well for ADOdb to claim to support zillions of DBs, but is there any large web application that actually *works* on all those DBs, just by using ADOdb and without lots of other DB-specific logic? I strongly suspect not.
There are some web applications that use ADOdb, such as Mambo and Joomla, but I have never tried installing them with a database other than the ones their websites state are supported, so I have not been able to verify that they would work on all the databases ADOdb supports.
On Thu, May 14, 2009 at 5:35 PM, Karun Dambiec karun@fastmail.fm wrote:
In terms of performance, it could possibly be better to write queries for each database type. ... An option could be to abstract the queries further for each database type, so that the same query need not be used for MySQL, PostgreSQL and Oracle. That way the optimal query for each database type could be used to retrieve the information.
In most cases this is unnecessary. If cases arise where it's necessary, they can be dealt with then. It's entirely possible to check the class of the database object and run different queries based on that, if desired. We rarely have to do it.
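The class-check escape hatch described above might look something like this (a Python sketch; the class names, table, and SQL are hypothetical illustrations, not MediaWiki code):

```python
class Database:
    """Stand-in for the generic database object."""


class DatabaseOracle(Database):
    """Stand-in for the Oracle-specific subclass."""


def recent_changes_sql(db, limit=50):
    # Only special-case a backend when the generic SQL genuinely fails
    # or performs badly there.
    if isinstance(db, DatabaseOracle):
        # Classic Oracle row limiting: wrap the ordered query and
        # filter on ROWNUM (no LIMIT clause in this Oracle era).
        return ("SELECT * FROM (SELECT * FROM recentchanges "
                "ORDER BY rc_timestamp DESC) WHERE ROWNUM <= %d" % limit)
    return ("SELECT * FROM recentchanges "
            "ORDER BY rc_timestamp DESC LIMIT %d" % limit)
```

The point being made above is that this branching is legitimate at the abstraction-layer level but rarely needs to appear at the call site.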
On Thu, May 14, 2009 at 6:50 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
... Keep in mind that we can't be purists here about standard SQL or whatever -- the code must work on MySQL 4.0, and it must be *efficient* on MySQL 4.0, at any cost, since that's what Wikipedia runs.
Still?
MySQL 5.x has been out for a very long time now: 3.5 years of production stable release for 5.0.
Domas, I assume you're still on this list - can you give us some background why we're not on a closer to current release MySQL within the WMF environments?
Or is this a Brion question?
Hi!
MySQL 5.x has been out for a very long time now: 3.5 years of production stable release for 5.0.
Oh! Time flies fast.
Domas, I assume you're still on this list - can you give us some background why we're not on a closer to current release MySQL within the WMF environments?
What would it bring to us? I'm testing 5.4, but need a few features to be hand-merged into it to make it more usable for us :)
BR, Domas
2009/5/15 George Herbert george.herbert@gmail.com:
Domas, I assume you're still on this list - can you give us some background why we're not on a closer to current release MySQL within the WMF environments?
Upgrading for the sake of upgrading is always a bad idea. The question should also be "why should we upgrade?" not "why shouldn't we upgrade?".
On Fri, May 15, 2009 at 5:01 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
2009/5/15 George Herbert george.herbert@gmail.com:
Domas, I assume you're still on this list - can you give us some background why we're not on a closer to current release MySQL within the WMF environments?
Upgrading for the sake of upgrading is always a bad idea. The question should also be "why should we upgrade?" not "why shouldn't we upgrade?".
Eventually, supportability and bugfixes for newer versions surpass those for older versions.
I am not one for making changes just because, but having done long-term system architecture and administration in industry, refreshing things every year or two (be they Solaris, Linux, Oracle, or other tools) is a really good practice.
Among other things, if you wait too long between refreshes, you run the risk that it's too hard to roll the next upgrade, because of lack of experience and preparation with upgrading.
Upgrades should be regular and expected. Frequent is probably a mistake, barring active bugs, but trying to freeze anything in time works poorly over 5 year timespans. You eventually end up with hardware that's obsolete to the point of unreliability, software that's obsolete to the point of unreliability, etc.
I already run MediaWiki on ADOdb (see bug 9767 over at Bugzilla) and concur with what I see as the intent of the start of this thread. As I began to move from 1.14 to 1.15, I noted a problem with Special:RecentChanges and Special:RecentChangesLinked of the kind characterized in that message (i.e., queries formed by the union of two or more subqueries). Those compound queries are being put together at the level of the special page, and are constructed from arrays of subqueries.

I would much rather see a function added at the Database class level to form compound queries from subqueries, since the alternative is to introduce conditional code at the special page level, selected on the basis of the SQL syntax of the database engine -- and that approach has been justifiably opposed by Brion and others in the past. I would then override that function in my derived DatabaseADODB class, as I already do with several other functions.

Further, my current approach to this particular issue produces different results for SQL Server 2000 and SQL Server 2005. For SQL Server 2005, the approach uses CTEs (Common Table Expressions), but that approach is not available in SQL Server 2000, so I must resort to either using tables in tempdb and accumulating results, or simply approximating the MySQL approach -- which is what I actually do. The problem with the UNIONs in SQL Server 2000 is that you can't limit the results of the component queries in addition to the results of the compound query, as you can with CTEs in SQL Server 2005.
On Fri, May 15, 2009 at 7:20 PM, George Herbert george.herbert@gmail.com wrote:
On Fri, May 15, 2009 at 5:01 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
2009/5/15 George Herbert george.herbert@gmail.com:
Domas, I assume you're still on this list - can you give us some background why we're not on a closer to current release MySQL within the WMF environments?
Upgrading for the sake of upgrading is always a bad idea. The question should also be "why should we upgrade?" not "why shouldn't we upgrade?".
Eventually, supportability and bugfixes for newer versions surpass those for older versions.
I am not one for making changes just because, but having done long-term system architecture and administration in industry, refreshing things every year or two (be they Solaris, Linux, Oracle, or other tools) is a really good practice.
Among other things, if you wait too long between refreshes, you run the risk that it's too hard to roll the next upgrade, because of lack of experience and preparation with upgrading.
Upgrades should be regular and expected. Frequent is probably a mistake, barring active bugs, but trying to freeze anything in time works poorly over 5 year timespans. You eventually end up with hardware that's obsolete to the point of unreliability, software that's obsolete to the point of unreliability, etc.
-- -george william herbert george.herbert@gmail.com
Why was this approach opposed? I am working on Oracle abstraction at the moment and I was planning to implement such a function.
DJ Bauch wrote:
I would much rather see a function added at the Database class level to form compound queries from subqueries, since the alternative is to introduce conditional code at the special page level, selected on the basis of the SQL syntax of the database engine -- and that approach has been justifiably opposed by Brion and others in the past. I would then override that function in my derived DatabaseADODB class, as I already do with several other functions.
2009/5/16 Freako F. Freakolowsky freak@drajv.si:
Why was this approach opposed? I am working on Oracle abstraction at the moment and i was planning to implement such a function.
DJ Bauch wrote:
I would much rather see a function added at the Database class level to form compound queries from subqueries, since the alternative is to introduce conditional code at the special page level, selected on the basis of the SQL syntax of the database engine -- and that approach has been justifiably opposed by Brion and others in the past. I would then override that function in my derived DatabaseADODB class, as I already do with several other functions.
What I think DJ Bauch meant was that conditional code in special pages deciding which SQL to use based on $wgDBtype is a bad idea and has been opposed. Adding a function to the Database class to form compound queries has, to my knowledge, not been opposed.
Roan Kattouw (Catrope)
George,
Eventually, supportability and bugfixes for newer versions surpass those for older versions.
True, though we don't hit bugs much in 4.0; it is somewhat rock solid for us. Eventually not going for 5.4 will become a bad situation (once InnoDB plugin 1.0.3 features get merged in; e.g. fast index management is awesome :)
Among other things, if you wait too long between refreshes, you run the risk that it's too hard to roll the next upgrade, because of lack of experience and preparation with upgrading.
Well, there's that with proprietary/legacy internal projects. We actually run an open-source project which works both on 4.0 and on 6.0. I did testing with 5.0, 5.1 and 5.4, so we more or less know what steps are needed to upgrade our farm from the application side. We also know what is needed from the DBA side - and that is a bit of work.
Upgrades should be regular and expected. Frequent is probably a mistake, barring active bugs, but trying to freeze anything in time works poorly over 5 year timespans. You eventually end up with hardware that's obsolete to the point of unreliability, software that's obsolete to the point of unreliability, etc.
Think of our 4.0.40 as a less than two-year-old product. We have features people don't have in 5.0 ;-) It is not that bad then! Anyway, software, especially infrastructure software, gets obsolete much later, unless really interesting developments happen.
Cheers, Domas
On Sat, May 16, 2009 at 2:21 AM, Domas Mituzas midom.lists@gmail.com wrote:
George,
Eventually, supportability and bugfixes for newer versions surpass those for older versions.
True, though we don't hit bugs much in 4.0; it is somewhat rock solid for us. Eventually not going for 5.4 will become a bad situation (once InnoDB plugin 1.0.3 features get merged in; e.g. fast index management is awesome :)
Among other things, if you wait too long between refreshes, you run the risk that it's too hard to roll the next upgrade, because of lack of experience and preparation with upgrading.
Well, there's that with proprietary/legacy internal projects. We actually run an open-source project which works both on 4.0 and on 6.0. I did testing with 5.0, 5.1 and 5.4, so we more or less know what steps are needed to upgrade our farm from the application side. We also know what is needed from the DBA side - and that is a bit of work.
Upgrades should be regular and expected. Frequent is probably a mistake, barring active bugs, but trying to freeze anything in time works poorly over 5 year timespans. You eventually end up with hardware that's obsolete to the point of unreliability, software that's obsolete to the point of unreliability, etc.
Think of our 4.0.40 as a less than two-year-old product. We have features people don't have in 5.0 ;-) It is not that bad then! Anyway, software, especially infrastructure software, gets obsolete much later, unless really interesting developments happen.
Cheers, Domas
I appreciate what you're saying, but I've been the person who had to clean up several large environments (much larger than WMF) after they let software get too old, and it was very much not pretty.
If you're not keeping currentish and testing new stuff, you miss that you can't buy hardware to support that OS version anymore. ("What do you mean, this Linux kernel can't deal with hyperthreading CPUs and won't boot?")
If you're not keeping currentish and testing new stuff, you miss that the easy upgrade to the next version doesn't work anymore. (Been there, done that with Mediawiki... and Oracle, and RHEL6, and FreeBSD, and Solaris, and Websphere and Weblogic and Java and Apache and ...)
If you're not keeping currentish and testing new stuff, you're missing security patches which nobody remembers to backport more than one rev of the OS / DB / apps (in this case, WMF is developing the big app and Domas' connection to the DB helps avoid that, but most places have none of these covered).
Having spent several years of my life at various companies pulling them forwards onto newer supportable OSes and hardware and app versions the hard way, I strongly recommend staying more current on EVERYTHING...
It's industry best practice. Not everyone is doing it, obviously, but it is a best practice.
Think of this as a subtle trap, that is slowly engulfing you. Getting out of it is easy at first, and eventually becomes unbearably painful around year 4 or 5.
On 5/16/09 2:43 PM, George Herbert wrote: [snip]
I appreciate what you're saying, but I've been the person who had to clean up several large environments (much larger than WMF) after they let software get too old, and it was very much not pretty.
If you're not keeping currentish and testing new stuff,
[snip]
As a matter of fact, we are keeping currentish and testing new stuff.
Just about everyone's development systems and most third-party MediaWiki installations run on MySQL 5.x, and Domas (a Sun/MySQL support engineer) has been testing MySQL 5.x in limited runs within the Wikimedia cluster over the last year or so. MediaWiki's functionality on MySQL 5.x is well known.
In the meantime, our primary production machines are running time-proven MySQL 4.0.x with up-to-date stability and performance patches maintained by Google -- fixes which have not yet all been merged back into mainline 5.x.
-- brion vibber (brion @ wikimedia.org)
Hi!
In the meantime, our primary production machines are running time-proven MySQL 4.0.x with up-to-date stability and performance patches maintained by Google -- fixes which have not yet all been merged back into mainline 5.x.
Google have switched to their own 'forever' version for now, based on 5.0.37 iirc :) It is just us!
Cheers, Domas
Google have switched to their own 'forever' version for now, based on 5.0.37 iirc :) It is just us!
Why does Google branch from such an old version? Aren't the fixes being backported to the latest builds of 5.0.x? (Quite some time has passed since 5.0.37.) As for MySQL 6.0, maybe it's worth a dedicated Database.php class for testing purposes.
Dmitriy
Hi!
Why does Google branch from such an old version? Aren't the fixes being backported to the latest builds of 5.0.x? (Quite some time has passed since 5.0.37.)
Hehe, it was actually Google people who coined 'four oh forever' (a nickname for our 4.0.40 build ;-), and now they refer to their build as 'five oh forever' (probably rounding 37 to 40 ;-). Once you run software and you do quite a bit of engineering on top, you are not rushing to upgrade all the time :-) Oh, and they have quite a team of people working on improvements.
As for MySQL 6.0, maybe it's worth a dedicated Database.php class for testing purposes.
Why? Current Database class works just fine.
Cheers, Domas
On Mon, May 18, 2009 at 9:13 PM, Dmitriy Sintsov questpc@rambler.ru wrote:
Why does Google branch from such an old version? Aren't the fixes being backported to the latest builds of 5.0.x? (Quite some time has passed since 5.0.37.)
Presumably Google doesn't get all its patches included in mainline MySQL quickly. This is probably because of some combination of 1) they break stuff Google doesn't care about but MySQL does and this would have to be fixed to upstream them, 2) effort would be required to get them adjusted to the taste of upstream and this doesn't directly benefit Google, 3) upstream is too slow to review large third-party patch sets (don't know if this is true of MySQL), 4) upstream is too slow to include new features in production releases (I've heard this is true of MySQL). So they have a bunch of big complicated patches and they don't want to port them to new versions too often. A common story with open-source software.
Hoi, There are several parts to that question. Typically you upgrade for the benefit of new functionality, security fixes and performance improvements. These are all sound reasons why we should upgrade. Reasons not to upgrade are that the prerequisites for the upgrade are not in place, the logistics of it, and also the need to retrofit customisations made to the current product.
There are powerful reasons why we should upgrade. The most powerful, to me, is the much improved support for UTF-8 and the associated collation. I am of the opinion that the current version of MySQL is holding us back and prevents us from realising much needed functionality. Functionality that will particularly benefit the "other" languages. Functionality that is currently emulated by a hack that is broken by design. Thanks, GerardM
2009/5/16 Thomas Dalton thomas.dalton@gmail.com
2009/5/15 George Herbert george.herbert@gmail.com:
Domas, I assume you're still on this list - can you give us some background why we're not on a closer to current release MySQL within the WMF environments?
Upgrading for the sake of upgrading is always a bad idea. The question should also be "why should we upgrade?" not "why shouldn't we upgrade?".