Hello,
I work on the DB2 database server at IBM. In my personal time, I've written a patch to add DB2 support to MediaWiki. It supports most of the MediaWiki Database API with some gaps in searching and admin. I'm involved in launching a site based on the patch and moving another wiki onto it, so the gaps should be plugged in due time.
I don't have commit access. Could someone help get this committed?
The patch is against r43499 of the trunk: http://lpetr.org/ibm_db2_patch/
The diff defines the changes to existing files: http://lpetr.org/ibm_db2_patch/add_ibm_db2.diff
Changes:
- Improved database agnosticism in Special pages (good for DB2, MSSQL, Oracle)
- DB2 interface options in the config index.php
- Some new constants like TS_DB2 for the DB2 timestamp format
There are also three new files:
http://lpetr.org/ibm_db2_patch/includes/db/DatabaseIbm_db2.php
http://lpetr.org/ibm_db2_patch/includes/SearchIBM_DB2.php
http://lpetr.org/ibm_db2_patch/maintenance/ibm_db2/tables.sql
They are named to match the existing database files. Capitalization is a bit inconsistent for compatibility reasons: 'ibm_db2' is the name of the required PHP extension, while the config/index.php algorithm requires the Ibm_db2 spelling for the Database module, and so on.
IBM DB2 is available for free download at http://tinyurl.com/6lg5pa
My public SSH key is at http://lpetr.org/personal/leo-public.pub
Regards,
Leons Petrazickis http://lpetr.org/blog/
Leons Petrazickis wrote:
Hello,
I work on the DB2 database server at IBM. In my personal time, I've written a patch to add DB2 support to MediaWiki. It supports most of the MediaWiki Database API with some gaps in searching and admin. I'm involved in launching a site based on the patch and moving another wiki onto it, so the gaps should be plugged in due time.
I don't have commit access. Could someone help get this committed?
Will you continue to maintain it? I wouldn't like to see it committed once and then slowly rot. I think you should get commit access, commit it yourself, and then maintain it into the future.
Maintenance of non-MySQL DBMS code has been a problem in the past; we really need someone to refactor the common code to Database, move MySQL-specific code to DatabaseMysql, and generalise the installer and updater. If you're looking for something to do.
I'm doing some work on the installer frontend, but a DBMS-independent schema update system (like updaters.inc) would be orthogonal to that.
-- Tim Starling
Not that I don't have many other things to do with my time, but I think the refactoring is a good idea, and that someone bringing the Oracle support up to speed and sustaining it would be useful too.
Of course, I said this a couple of years ago, and haven't had time to code anything...
-george
On Tue, Nov 18, 2008 at 12:22 AM, Tim Starling tstarling@wikimedia.org wrote:
Leons Petrazickis wrote:
Hello,
I work on the DB2 database server at IBM. In my personal time, I've written a patch to add DB2 support to MediaWiki. It supports most of the MediaWiki Database API with some gaps in searching and admin. I'm involved in launching a site based on the patch and moving another wiki onto it, so the gaps should be plugged in due time.
I don't have commit access. Could someone help get this committed?
Will you continue to maintain it? I wouldn't like to see it committed once and then slowly rot. I think you should get commit access, commit it yourself, and then maintain it into the future.
Maintenance of non-MySQL DBMS code has been a problem in the past; we really need someone to refactor the common code to Database, move MySQL-specific code to DatabaseMysql, and generalise the installer and updater. If you're looking for something to do.
I'm doing some work on the installer frontend, but a DBMS-independent schema update system (like updaters.inc) would be orthogonal to that.
-- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Tue, Nov 18, 2008 at 3:22 AM, Tim Starling tstarling@wikimedia.org wrote:
Leons Petrazickis wrote:
Hello,
I work on the DB2 database server at IBM. In my personal time, I've written a patch to add DB2 support to MediaWiki. It supports most of the MediaWiki Database API with some gaps in searching and admin. I'm involved in launching a site based on the patch and moving another wiki onto it, so the gaps should be plugged in due time.
I don't have commit access. Could someone help get this committed?
Will you continue to maintain it? I wouldn't like to see it committed once and then slowly rot. I think you should get commit access, commit it yourself, and then maintain it into the future.
I do have the time to maintain DB2 support for the foreseeable future. There are a couple of sites I'm involved with that would use it, so the long-term interest is there.
What do I need to do to get commit access? My public key is: http://lpetr.org/personal/leo-public.pub
Maintenance of non-MySQL DBMS code has been a problem in the past; we really need someone to refactor the common code to Database, move MySQL-specific code to DatabaseMysql, and generalise the installer and updater. If you're looking for something to do.
That's a fair bit of work. Moving the code wouldn't solve the maintenance issue by itself, though.
Would you make Database an abstract class? That would get us this:
- DB-agnostic methods could still be defined in Database
- DB-specific methods would have to be defined in any child class
- there would be a clear error if a required DB-specific method was not implemented
This would be a definite improvement over the current situation where, say, the Oracle layer could have a missing method, escalate to a MySQL-specific one in Database, and fail with varying degrees of mystery.
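To sketch what an abstract base class would buy here (Python used for brevity; the class and method names are illustrative only, not the actual MediaWiki code):

```python
from abc import ABC, abstractmethod

class Database(ABC):
    # DB-agnostic logic stays in the base class
    def select_field(self, table, field):
        return f"SELECT {field} FROM {table}"

    # every backend is forced to supply its own implementation
    @abstractmethod
    def timestamp(self, ts):
        ...

class DatabaseMysql(Database):
    def timestamp(self, ts):
        return ts  # MySQL stores the 14-digit format directly

class DatabaseOracle(Database):
    pass  # timestamp() accidentally left out

print(DatabaseMysql().timestamp("20081118000000"))
try:
    DatabaseOracle()
except TypeError as e:
    print("clear error at instantiation:", e)
```

The missing method surfaces as a loud error the moment the backend is constructed, instead of silently falling through to MySQL behaviour at query time.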
I'm doing some work on the installer frontend, but a DBMS-independent schema update system (like updaters.inc) would be orthogonal to that.
Ah, that would be quite useful. If done right, it could also be used for installation in place of the DBMS-specific tables.sql files.
What I am envisioning is an abstraction over Data Definition Language -- create table, alter table, rename column stuff -- that's similar in architecture to the Database classes. There'd be a root class that defines a DB-agnostic interface, and DB-specific child classes that implement vendor-specific things.
This would solve the maintenance headache of having several different database schema definitions -- a separate tables.sql to maintain for MySQL, Postgres, SQLite, etc.
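A rough sketch of that architecture (Python for brevity; hypothetical class names): the root class defines the DB-agnostic DDL operations, and vendor children override only what differs. MySQL of this era, for instance, has no RENAME COLUMN and needs CHANGE plus the full column type.

```python
class Schema:
    # DB-agnostic default; children override where vendor syntax differs
    def rename_column(self, table, old, new, coltype):
        return f"ALTER TABLE {table} RENAME COLUMN {old} TO {new}"

class PostgresSchema(Schema):
    pass  # the standard syntax works

class MysqlSchema(Schema):
    # MySQL's CHANGE requires restating the column definition
    def rename_column(self, table, old, new, coltype):
        return f"ALTER TABLE {table} CHANGE {old} {new} {coltype}"

print(PostgresSchema().rename_column("page", "title", "page_title", "varchar(255)"))
print(MysqlSchema().rename_column("page", "title", "page_title", "varchar(255)"))
```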
A rework of updaters.inc would be a good start.
-- Leons Petrazickis
http://lpetr.org/blog/
Leons Petrazickis wrote:
On Tue, Nov 18, 2008 at 3:22 AM, Tim Starling tstarling@wikimedia.org wrote:
Leons Petrazickis wrote:
Hello,
I work on the DB2 database server at IBM. In my personal time, I've written a patch to add DB2 support to MediaWiki. It supports most of the MediaWiki Database API with some gaps in searching and admin. I'm involved in launching a site based on the patch and moving another wiki onto it, so the gaps should be plugged in due time.
I don't have commit access. Could someone help get this committed?
Will you continue to maintain it? I wouldn't like to see it committed once and then slowly rot. I think you should get commit access, commit it yourself, and then maintain it into the future.
I do have the time to maintain DB2 support for the foreseeable future. There are a couple of sites I'm involved with that would use it, so the long-term interest is there.
What do I need to do to get commit access? My public key is: http://lpetr.org/personal/leo-public.pub
Excellent. Specify your preferred username.
Maintenance of non-MySQL DBMS code has been a problem in the past; we really need someone to refactor the common code to Database, move MySQL-specific code to DatabaseMysql, and generalise the installer and updater. If you're looking for something to do.
That's a fair bit of work. Moving the code wouldn't solve the maintenance issue by itself, though.
Would you make Database an abstract class? That would get us this:
- DB-agnostic methods could still be defined in Database
- DB-specific methods would have to be defined in any child class
- there would be a clear error if a required DB-specific method was not implemented
Yes, sounds good.
This would be a definite improvement over the current situation where, say, the Oracle layer could have a missing method, escalate to a MySQL-specific one in Database, and fail with varying degrees of mystery.
Indeed.
I'm doing some work on the installer frontend, but a DBMS-independent schema update system (like updaters.inc) would be orthogonal to that.
Ah, that would be quite useful. If done right, it could also be used for installation in place of the DBMS-specific tables.sql files.
What I am envisioning is an abstraction over Data Definition Language -- create table, alter table, rename column stuff -- that's similar in architecture to the Database classes. There'd be a root class that defines a DB-agnostic interface, and DB-specific child classes that implement vendor-specific things.
This would solve the maintenance headache of having several different database schema definitions -- a separate tables.sql to maintain for MySQL, Postgres, SQLite, etc.
Yes. We already have some degree of abstraction in our SQL files in the form of comments like /*$wgDBTableOptions*/. I'm not sure what the best way to do it is, but here are my ideas:
1. Extend that comments system
2. Introduce an SQL-like language that can be translated to real SQL
3. Introduce an entirely new data definition language, say XML-based
The idea of 1 is to maintain backwards compatibility for scripts that want to feed SQL files directly into a MySQL database. But it's the least flexible because it would get really ugly really fast as you add more features.
2 is nice in that the schema files would be easy to read and write. Scripts that previously sourced SQL can now just pipe through the translator instead. We could invent data types like "mwtimestamp" which translates to "binary(14)" for MySQL and "timestamptz" for PostgreSQL. The drawback is that the parser could be complex, especially if it has to deal with features like foreign key constraints and trigger functions (both of which are used in the current PostgreSQL schema).
3 could be very simple to implement, since you could just have DBMS-dependent cases written out in full. But the resulting files might be difficult to update.
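The translator in option 2 could start as little more than a type-mapping pass. A toy sketch (Python for brevity; the mwtimestamp mappings follow the examples above, and a real translator would need an actual parser):

```python
TYPE_MAP = {
    "mysql":      {"mwtimestamp": "binary(14)"},
    "postgresql": {"mwtimestamp": "timestamptz"},
}

def translate(ddl, dbms):
    # naive token substitution; a real translator would have to parse
    # the input to cope with constraints, triggers, and quoting
    for abstract_type, concrete_type in TYPE_MAP[dbms].items():
        ddl = ddl.replace(abstract_type, concrete_type)
    return ddl

src = "CREATE TABLE revision (rev_timestamp mwtimestamp NOT NULL)"
print(translate(src, "mysql"))
print(translate(src, "postgresql"))
```

Scripts that previously sourced tables.sql directly would pipe it through translate() for their backend instead.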
It would probably be worthwhile to do a survey of existing open-source solutions.
-- Tim Starling
Hello.
I'm a new poster to this list, and I find this topic fascinating.
Tim Starling escribió:
Yes. We already have some degree of abstraction in our SQL files in the form of comments like /*$wgDBTableOptions*/. I'm not sure what the best way to do it is, but here are my ideas:
- Extend that comments system
- Introduce an SQL-like language that can be translated to real SQL
- Introduce an entirely new data definition language, say XML-based
I believe a better solution is to design a domain-specific language, an idea not very different from your first one. This DSL would model the interaction between the application and the DB as it is now, and would be designed to evolve. That's it.
Your other ideas aim to more general-purpose solutions. I believe those would be harder to develop, more complex, and not as useful.
The idea of 1 is to maintain backwards compatibility for scripts that want to feed SQL files directly into a MySQL database. But it's the least flexible because it would get really ugly really fast as you add more features.
If the DSL is well designed, new features would potentially require new domain-specific language elements to be added, and none to be removed. If it is badly designed, however, the end result is usually like you describe. But that outcome is not inescapable.
If you like the idea, I can develop a throwaway testbed for an embryonic DSL, to clearly see if it is suitable.
Best regards.
Jesús Quiroga wrote:
Tim Starling escribió:
Yes. We already have some degree of abstraction in our SQL files in the form of comments like /*$wgDBTableOptions*/. I'm not sure what the best way to do it is, but here are my ideas:
- Extend that comments system
- Introduce an SQL-like language that can be translated to real SQL
- Introduce an entirely new data definition language, say XML-based
I believe a better solution is to design a domain-specific language, an idea not very different from your first one. This DSL would model the interaction between the application and the DB as it is now, and would be designed to evolve. That's it.
No, that would be a very bad solution. It would require everyone who works on MediaWiki to learn that domain-specific language, which would result in fewer people being able to work on MediaWiki.
On Wed, Dec 24, 2008 at 3:17 AM, Nikola Smolenski smolensk@eunet.yu wrote:
No, that would be a very bad solution. It would require everyone who works on MediaWiki to learn that domain-specific language, which would result in fewer people being able to work on MediaWiki.
This is not inherently different from people having to learn what MediaWiki classes and so on do. If it's a well-designed language, it will be as easy for people to pick up as alternative ways of writing the same info.
The first step absolutely must be to look at what other open-source products in our situation are doing, though. If we can take the entire system wholesale from phpBB, say, then there's no reason for us to reinvent the wheel.
Agh noooo.... don't base ideas of that crap coded system!
Last time I checked, phpBB was still using raw SQL statements; instead of properly escaping with a clean system like the one MediaWiki currently has, they used a method for getting request values which would typecast the value into the same type as the default value. To be quite honest, the fact that they don't even bother escaping, relying only on typecasting most of the input into numbers, is probably the reason why phpBB ends up with so many security issues.
~Daniel Friesen (Dantman, Nadir-Seen-Fire)
~Profile/Portfolio: http://nadir-seen-fire.com
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Aryeh Gregor wrote:
On Wed, Dec 24, 2008 at 3:17 AM, Nikola Smolenski smolensk@eunet.yu wrote:
No, that would be a very bad solution. It would require everyone who works on MediaWiki to learn that domain-specific language, which would result in fewer people being able to work on MediaWiki.
This is not inherently different from people having to learn what MediaWiki classes and so on do. If it's a well-designed language, it will be as easy for people to pick up as alternative ways of writing the same info.
The first step absolutely must be to look at what other open-source products in our situation are doing, though. If we can take the entire system wholesale from phpBB, say, then there's no reason for us to reinvent the wheel.
On Wed, Dec 24, 2008 at 2:19 PM, Daniel Friesen dan_the_man@telus.net wrote:
Agh noooo.... don't base ideas of that crap coded system!
Last time I checked, phpBB was still using raw SQL statements; instead of properly escaping with a clean system like the one MediaWiki currently has, they used a method for getting request values which would typecast the value into the same type as the default value. To be quite honest, the fact that they don't even bother escaping, relying only on typecasting most of the input into numbers, is probably the reason why phpBB ends up with so many security issues.
I just looked, and you're right, they seem to rely on things like this for multi-DB support:
/**
 * Oracle specific code to handle it's lack of sanity
 * @access private
 */
function _rewrite_where($where_clause)
{
    preg_match_all('/\s*(AND|OR)?\s*([\w_.]++)\s*(?:(=|<[=>]?|>=?)\s*((?>'(?>[^']++|'')*+'|[\d-.]+))|((NOT )?IN\s*((?>'(?>[^']++|'')*+',? ?|[\d-.]+,? ?)*+)))/', $where_clause, $result, PREG_SET_ORDER);
    $out = '';
    foreach ($result as $val)
Probably not a model we want to follow, although I don't think that using raw SQL is necessarily bad in principle (using string concatenation to include variables certainly is, though).
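The distinction between raw SQL and string concatenation can be shown in a few lines. A minimal sketch (Python's sqlite3 stands in for any driver with placeholder support; the table and input are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_title TEXT)")
conn.execute("INSERT INTO page VALUES ('Main_Page')")

title = "Main'; DROP TABLE page; --"  # hostile input

# BAD: string concatenation splices the input into the SQL itself:
#   "SELECT * FROM page WHERE page_title = '" + title + "'"

# Fine: raw SQL with a placeholder; the driver escapes the value
rows = conn.execute(
    "SELECT * FROM page WHERE page_title = ?", (title,)
).fetchall()
print(rows)  # the hostile string matched nothing, and nothing broke
```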
That's just disgusting....
On Dec 24, 2008 2:35 PM, "Aryeh Gregor" <Simetrical+wikilist@gmail.com> wrote:
On Wed, Dec 24, 2008 at 2:19 PM, Daniel Friesen dan_the_man@telus.net wrote:
> Agh noooo.... don't...
I just looked, and you're right, they seem to rely on things like this for multi-DB support:
/**
 * Oracle specific code to handle it's lack of sanity
 * @access private
 */
function _rewrite_where($where_clause)
{
    preg_match_all('/\s*(AND|OR)?\s*([\w_.]++)\s*(?:(=|<[=>]?|>=?)\s*((?>'(?>[^']++|'')*+'|[\d-.]+))|((NOT )?IN\s*((?>'(?>[^']++|'')*+',? ?|[\d-.]+,? ?)*+)))/', $where_clause, $result, PREG_SET_ORDER);
    $out = '';
    foreach ($result as $val)
Probably not a model we want to follow, although I don't think that using raw SQL is necessarily bad in principle (using string concatenation to include variables certainly is, though).
That's disgraceful. I might have to boycott phpBB just because of that...
Soxred93
On Dec 24, 2008, at 2:36 PM, Chad wrote:
That's just disgusting....
On Dec 24, 2008 2:35 PM, "Aryeh Gregor" <Simetrical+wikilist@gmail.com> wrote:
On Wed, Dec 24, 2008 at 2:19 PM, Daniel Friesen dan_the_man@telus.net wrote:
> Agh noooo.... don't...
I just looked, and you're right, they seem to rely on things like this for multi-DB support:
/**
 * Oracle specific code to handle it's lack of sanity
 * @access private
 */
function _rewrite_where($where_clause)
{
    preg_match_all('/\s*(AND|OR)?\s*([\w_.]++)\s*(?:(=|<[=>]?|>=?)\s*((?>'(?>[^']++|'')*+'|[\d-.]+))|((NOT )?IN\s*((?>'(?>[^']++|'')*+',? ?|[\d-.]+,? ?)*+)))/', $where_clause, $result, PREG_SET_ORDER);
    $out = '';
    foreach ($result as $val)
Probably not a model we want to follow, although I don't think that using raw SQL is necessarily bad in principle (using string concatenation to include variables certainly is, though).
2008/12/24 Soxred93 soxred93@gmail.com:
That's disgraceful. I might have to boycott phpBB just because of that...
A sufficiently dedicated programmer can write TRS-80 BASIC in any language. PHP just makes it easier.
- d.
On Wed, Dec 24, 2008 at 2:36 PM, Chad innocentkiller@gmail.com wrote:
That's just disgusting....
On Wed, Dec 24, 2008 at 2:41 PM, Soxred93 soxred93@gmail.com wrote:
That's disgraceful. I might have to boycott phpBB just because of that...
There's no need to flame other open-source projects here. We have code that's at least as awful if not more so, including *plenty* of places where we still call $dbr->query() directly (like everything based on QueryPage). As far as I know, our abstraction layer was devised by Domas early on in MediaWiki's existence. If he hadn't decided to go in that direction, we'd likely be using something similar.
Besides, ugly or not, their multi-DB support looks to be a lot better than ours. As far as I know, MediaWiki barely works at all on anything other than MySQL and pgsql, and doesn't work fully on the latter either (various maintenance scripts, extensions, etc.).
On Wednesday 24 December 2008 14:46:29 Aryeh Gregor wrote:
On Wed, Dec 24, 2008 at 3:17 AM, Nikola Smolenski smolensk@eunet.yu wrote:
No, that would be a very bad solution. It would require everyone who works on MediaWiki to learn that domain-specific language, which would result in fewer people being able to work on MediaWiki.
This is not inherently different from people having to learn what MediaWiki classes and so on do. If it's a well-designed language, it will be as easy for people to pick up as alternative ways of writing the same info.
It is, as there exist various database abstraction classes, somewhat similar to each other, that a lot of people have worked with. Add to that the drawback that your editor is not able to highlight the code, and there is no way to immediately see syntax errors and the like.
On Wed, Dec 24, 2008 at 3:25 PM, Nikola Smolenski smolensk@eunet.yu wrote:
It is, as there exist various database abstraction classes, somewhat similar to each other, that a lot of people have worked with.
If the domain-specific language is SQL-like, that benefit would exist for it too, in fact even more so.
Add to that the drawback that your editor is not able to highlight the code, and there is no way to immediately see syntax errors and the like.
Syntax errors in a domain-specific language would often be logic errors in something shoehorned on top of PHP arrays, which your editor won't be able to highlight anyway. Plus, if it's SQL-like (and why shouldn't it be?), your editor probably *will* be able to highlight the code.
Aryeh Gregor wrote:
On Wed, Dec 24, 2008 at 3:25 PM, Nikola Smolenski smolensk@eunet.yu wrote:
It is, as there exist various database abstraction classes, somewhat similar to each other, that a lot of people have worked with.
If the domain-specific language is SQL-like, that benefit would exist for it too, in fact even more so.
Add to that the drawback that your editor is not able to highlight the code, and there is no way to immediately see syntax errors and the like.
Syntax errors in a domain-specific language would often be logic errors in something shoehorned on top of PHP arrays, which your editor won't be able to highlight anyway. Plus, if it's SQL-like (and why shouldn't it be?), your editor probably *will* be able to highlight the code.
Then we're talking about option 2 instead of option 3. I also find option 2 better than 3, but from Jesús's message I understood it was #3 that was being considered.
Hello.
After a few days of pondering the issues, I would like to explain what I suggested in my previous message, in more detail and (hopefully) more clearly.
What I'm about to say is pretty abstract, so it's difficult to convey the right meaning. Please forgive me if I say something you already know, or just nonsense :-)
Jesús Quiroga escribió:
I believe a better solution is to design a domain-specific language, an idea not very different from your first one. This DSL would model the interaction between the application and the DB as it is now, and would be designed to evolve. That's it.
The problem I discuss is how to best access the data store from an application. I believe the right answer is different for each project, but it's not difficult to evaluate the alternatives, one by one, in a given context. I think it is worthwhile to do that in the context of MediaWiki.
I will refer to wiki modules and databases as if they were 'hosts' connected to a 'network', to highlight the role of languages in the operation of the system at runtime.
The first way to access the data store is the 'direct' one:
[polyglot wiki] <--- mysDataL ---> [mysql] [polyglot wiki] <--- posDataL ---> [postgresql] [polyglot wiki] <--- db2DataL ---> [db2]
Here, the polyglot wiki module talks to every database using the proper languages. 'mysDataL' means 'the data language understood by MySQL', 'posDataL' means 'the data language understood by PostgreSQL', etc.
The polyglot wiki promises to learn several languages and to speak them correctly forever, so, if a new database comes along or any of their data languages evolves, the polyglot wiki is forced to adapt at a potentially great cost. Besides, any change to the database schema can trigger lots of updates to the wiki code, and be very costly too.
The advantages of this way are well known: it is fast, there is no design work needed, and it is easy to understand. The drawbacks are apparently few, but devastating: verbose and complex code in multiple places in the wiki module, very costly to maintain, even more costly to evolve. All changes cost a lot in time and effort.
The second way to access the data store that is usually considered is the 'indirect' one:
[wiki] <--- wikiDataL ---> [polyglot translator]
[polyglot translator] <--- mysDataL ---> [mysql] [polyglot translator] <--- posDataL ---> [postgresql] [polyglot translator] <--- db2DataL ---> [db2]
Here, wikiDataL means 'some relational data definition and manipulation language suitable for use by the wiki'.
The polyglot translator promises to learn wikiDataL and the other dialects and to evolve with them, so it has all the problems the wiki had in the direct way, but now the cost is lower because a lot of complexity is 'hidden' inside the translator and can't reach the wiki. As a result, wiki code is not updated as much, and it's much cleaner and less verbose.
The advantages of this way are that wiki module code is simpler and the cost of evolution is reduced. The drawbacks are apparently many: it's slower, design work is needed, it's harder to understand, there's a new language (wikiDataL), and the translator can be very complex. However, the need to reduce the cost of change is usually so great that these inconveniences are minor in comparison.
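The indirect way can be made concrete with a toy sketch (Python for brevity; PolyglotTranslator and the dialect table are illustrative names only). The wiki issues one call, and the translator renders each vendor's row-limit syntax:

```python
# vendor-specific ways to limit a result set
LIMIT_SYNTAX = {
    "mysql": "LIMIT {n}",
    "db2":   "FETCH FIRST {n} ROWS ONLY",
}

class PolyglotTranslator:
    def __init__(self, dialect):
        self.dialect = dialect

    def select(self, table, fields, limit):
        # one wikiDataL-style call, rendered into the backend's dialect
        base = f"SELECT {', '.join(fields)} FROM {table} "
        return base + LIMIT_SYNTAX[self.dialect].format(n=limit)

print(PolyglotTranslator("mysql").select("page", ["page_title"], 5))
print(PolyglotTranslator("db2").select("page", ["page_title"], 5))
```

The wiki code is identical in both cases; only the translator knows the dialects.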
Now the interesting bit begins. A third possible way to access the data store, the 'interpreted' one:
[wiki] <--- wikiNeedL ---> [polyglot interpreter]
[polyglot interpreter] <--- mysDataL ---> [mysql] [polyglot interpreter] <--- posDataL ---> [postgresql] [polyglot interpreter] <--- db2DataL ---> [db2]
Here, wikiNeedL means 'some language adequate for the wiki to express its data access needs and nothing else'.
wikiNeedL is the domain-specific language I wrote about in my previous message.
The differences between wikiDataL and wikiNeedL are mainly these:
- wikiNeedL would contain just enough wiki concepts to express the wiki's needs, so it's effectively confined to that domain. wikiDataL belongs to the relational data model domain, which is quite different.
- in general, wikiNeedL would have different semantics from the dialects understood by the databases, so the translation step becomes more like interpretation rather than just syntactic transformation. wikiDataL usually has the same semantics as the dialects.
- wikiNeedL would contain just enough concepts to satisfy current needs, and would be open to extension. wikiDataL aims to be general-purpose and to fulfill current and future needs.
The main reason to consider the 'interpreted' way is, of course, that it helps reduce even more the cost to achieve change.
So that's what I was talking about. I will say more about the differences between the indirect and the interpreted ways in a future message.
Thanks for your attention.
On Sat, Dec 27, 2008 at 1:23 AM, Jesús Quiroga jquiroga@pobox.com wrote:
The second way to access the data store that is usually considered is the 'indirect' one:
[wiki] <--- wikiDataL ---> [polyglot translator]
[polyglot translator] <--- mysDataL ---> [mysql] [polyglot translator] <--- posDataL ---> [postgresql] [polyglot translator] <--- db2DataL ---> [db2]
Here, wikiDataL means 'some relational data definition and manipulation language suitable for use by the wiki'.
This is what we currently use, and I don't think we're going to seriously consider changing it without some very compelling arguments being presented. Incremental improvements to our current way of doing things (cutting back on raw queries, moving MySQL-specific stuff from Database to DatabaseMySql, defining more clearly what Database methods mean and avoiding undefined behavior) seem entirely sufficient to allow support for any number of additional database backends.
The differences between wikiDataL and wikiNeedL are mainly these:
- wikiNeedL would contain just enough wiki concepts to express the wiki's needs, so it's effectively confined to that domain. wikiDataL belongs to the relational data model domain, which is quite different.
- in general, wikiNeedL would have different semantics from the dialects understood by the databases, so the translation step becomes more like interpretation rather than just syntactic transformation. wikiDataL usually has the same semantics as the dialects.
- wikiNeedL would contain just enough concepts to satisfy current needs, and would be open to extension. wikiDataL aims to be general-purpose and to fulfill current and future needs.
In practice, wikiNeedL would be drastically more complicated, if I understand you correctly. Its basic semantic units would be things like articles, users, revisions, etc., instead of rows, columns, and tables. We *have* a wikiNeedL, in fact: it's called "calling the appropriate Article method" or whatever. Most code doesn't have to manually do queries. Further abstraction of the database queries would be possible, but I question its usefulness.
Aryeh Gregor escribió:
On Sat, Dec 27, 2008 at 1:23 AM, Jesús Quiroga jquiroga@pobox.com wrote:
The second way to access the data store that is usually considered is the 'indirect' one:
[wiki] <--- wikiDataL ---> [polyglot translator]
[polyglot translator] <--- mysDataL ---> [mysql] [polyglot translator] <--- posDataL ---> [postgresql] [polyglot translator] <--- db2DataL ---> [db2]
Here, wikiDataL means 'some relational data definition and manipulation language suitable for use by the wiki'.
This is what we currently use, and I don't think we're going to seriously consider changing it without some very compelling arguments being presented. Incremental improvements to our current way of doing things (cutting back on raw queries, moving MySQL-specific stuff from Database to DatabaseMySql, defining more clearly what Database methods mean and avoiding undefined behavior) seem entirely sufficient to allow support for any number of additional database backends.
I understand you mean the indirect way is the 'official' way now, and that is quite understandable. However, there are parts that 'lag behind' because they still use the obsolete direct way, and at the same time there are wikiNeedL phrases already in some other places, so I see that MediaWiki uses parts of all three. Usually, transitions from one way to another happen slowly, and feel like gentle evolutions, either towards improvement or towards degradation.
I agree, the indirect way can support many backends, no question. In any case, it will be a comparison between 'good' and 'better', or to be clearer, between 'cheap' and 'cheaper', or perhaps 'successful' and 'wildly successful'.
The differences between wikiDataL and wikiNeedL are mainly these:
- wikiNeedL would contain just enough wiki concepts to express the wiki's needs, so it's effectively confined to that domain. wikiDataL belongs to the relational data model domain, which is quite different.
- in general, wikiNeedL would have different semantics from the dialects understood by the databases, so the translation step becomes more like interpretation rather than just syntactic transformation. wikiDataL usually has the same semantics as the dialects.
- wikiNeedL would contain just enough concepts to satisfy current needs, and would be open to extension. wikiDataL aims to be general-purpose and to fulfill current and future needs.
In practice, wikiNeedL would be drastically more complicated, if I understand you correctly. Its basic semantic units would be things like articles, users, revisions, etc., instead of rows, columns, and tables. We *have* a wikiNeedL, in fact: it's called "calling the appropriate Article method" or whatever. Most code doesn't have to manually do queries. Further abstraction of the database queries would be possible, but I question its usefulness.
I agree completely, some wikiNeedL is already in there, and it's implemented using method calls and their arguments, which is just fine and it works. I'm confident you'll see it's not that complicated.
For example, I found these in Article.php, method updateRedirectOn():
$dbw->replace( 'redirect', array( 'rd_from' ), $set, __METHOD__ );
$dbw->delete( 'redirect', $where, __METHOD__);
These I consider wikiNeedL phrases, and they're (almost) perfect. Some helper agent is located and told about some need that ought to be fulfilled, and what is needed is expressed clearly using wiki concepts. No details about how to do it are included, except for that string 'rd_from' which seems to be a column name in some relational schema, and that would be forbidden in wikiNeedL proper.
Of course, simple expressions are easier to manage, even trivially so, but they are good examples of the ideas I described.
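To make the objection to 'rd_from' concrete, here is a minimal, purely hypothetical sketch of what a wiki-concept layer over that call could look like. Every name in it (RedirectNeed, 'source', 'target', interpret) is invented for this illustration and is not part of MediaWiki; only 'rd_from' and 'rd_title', taken from the actual redirect table, are real.

```php
<?php
// Hypothetical sketch only: a caller states a need in wiki concepts
// ('source', 'target'), and an interpreter maps it onto relational
// schema columns, so names like 'rd_from' never appear outside this
// layer. None of these class or method names exist in MediaWiki.
class RedirectNeed {
    // Mapping from wiki concepts to columns of the redirect table.
    private $map = array( 'source' => 'rd_from', 'target' => 'rd_title' );

    // Translate a wiki-concept request into the column => value row
    // that the existing Database wrapper methods would receive.
    public function interpret( array $need ) {
        $row = array();
        foreach ( $need as $concept => $value ) {
            $row[ $this->map[ $concept ] ] = $value;
        }
        return $row;
    }
}

$need = new RedirectNeed();
var_export( $need->interpret( array( 'source' => 10, 'target' => 'Foo' ) ) );
```

The point of the sketch is only that the schema mapping lives in exactly one place; whether that indirection is worth the cost is the question under debate.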
Another example, consider the query service (https://wiki.toolserver.org/view/Query_service), where someone with a need to access the data store manages to do so by delegation, expressing the need using wiki concepts. Of course, they are humans, but the model is the same.
The usefulness of the interpreted way can be compellingly argued, and I plan to do so in a separate message. Finally, to get the full benefits of wikiNeedL, if it is deemed to be a good idea, I believe the first step should be to begin to steer its evolution in a top-down fashion, to analyze what is there, to bring it forward, and to make it more 'official'.
Best regards.
On Sun, Dec 28, 2008 at 12:57 AM, Jesús Quiroga jquiroga@pobox.com wrote:
For example, I found these in Article.php, method updateRedirectOn():
$dbw->replace( 'redirect', array( 'rd_from' ), $set, __METHOD__ );
$dbw->delete( 'redirect', $where, __METHOD__);
These I consider wikiNeedL phrases, and they're (almost) perfect. Some helper agent is located and told about some need that ought to be fulfilled, and what is needed is expressed clearly using wiki concepts. No details about how to do it are included, except for that string 'rd_from' which seems to be a column name in some relational schema, and that would be forbidden in wikiNeedL proper.
You do realize that 'redirect' is the name of a table, and $where is an SQL WHERE clause (possibly prettified into an array of some sort, but only as syntactic sugar), and "replace" and "delete" are methods that map directly into the MySQL REPLACE and DELETE statements, and $dbw stands for "database (writable)", yes? Those are thin wrappers around database queries. Are you suggesting that your wikiNeedL could be achieved by suitable choice of table and column names?
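As a rough illustration of how thin those wrappers are, here is a self-contained sketch of how replace() and delete() calls turn into SQL text. This is not MediaWiki's actual Database class (the real one, in includes/db/Database.php, also handles quoting, connections, and per-backend dialects); the class and method names here are invented for the sketch.

```php
<?php
// Simplified stand-in for MediaWiki's Database wrapper, showing how
// replace() and delete() map almost one-to-one onto MySQL statements.
// This sketch only builds the statement text.
class SketchDatabase {
    // $uniqueKeys mirrors the wrapper's signature; MySQL REPLACE uses
    // the table's unique keys implicitly, so it is unused here.
    public function replaceSql( $table, array $uniqueKeys, array $row ) {
        $cols = implode( ',', array_keys( $row ) );
        $vals = implode( ',', array_map(
            function ( $v ) { return "'" . addslashes( $v ) . "'"; },
            array_values( $row )
        ) );
        // MySQL REPLACE deletes any row matching a unique key, then inserts.
        return "REPLACE INTO $table ($cols) VALUES ($vals)";
    }

    // $conds: column => value map, ANDed together into a WHERE clause,
    // much like the array form of $where in the quoted code.
    public function deleteSql( $table, array $conds ) {
        $where = implode( ' AND ', array_map(
            function ( $col, $v ) { return "$col='" . addslashes( $v ) . "'"; },
            array_keys( $conds ), array_values( $conds )
        ) );
        return "DELETE FROM $table WHERE $where";
    }
}

$dbw = new SketchDatabase();
echo $dbw->replaceSql( 'redirect',
    array( 'rd_from' ),
    array( 'rd_from' => '10', 'rd_namespace' => '0', 'rd_title' => 'Foo' )
), "\n";
echo $dbw->deleteSql( 'redirect', array( 'rd_from' => '10' ) ), "\n";
```

The table name, column names, and conditions all pass straight through, which is why calling these "wiki concepts" rather than relational ones is a stretch.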
The usefulness of the interpreted way can be compellingly argued, and I plan to do so in a separate message. Finally, to get the full benefits of wikiNeedL, if it is deemed to be a good idea, I believe the first step should be to begin to steer its evolution in a top-down fashion, to analyze what is there, to bring it forward, and to make it more 'official'.
I think you'll need to be considerably more specific and less abstract before anyone is even going to be *able* to listen to what you're saying.
$dbw->replace( 'redirect', array( 'rd_from' ), $set, __METHOD__ );
$dbw->delete( 'redirect', $where, __METHOD__);
These I consider wikiNeedL phrases, and they're (almost) perfect. Some helper agent is located and told about some need that ought to be fulfilled, and what is needed is expressed clearly using wiki concepts. No details about how to do it are included, except for that string 'rd_from' which seems to be a column name in some relational schema, and that would be forbidden in wikiNeedL proper.
I think you are way overthinking things in general, and failing to understand the existing code in the specific two examples above. Both contain a lot more "details about how to do it" than just the 'rd_from' column (see: 'redirect', $set, $where). It might be helpful if you took the relevant surrounding code (e.g. where $set is defined) and rewrote it in the newly proposed system so we can see exactly what you are talking about.
Jesús Quiroga wrote:
The polyglot wiki promises to learn several languages and to speak them correctly forever, so, if a new database comes along or any of their data languages evolves, the polyglot wiki is forced to adapt at a potentially great cost. Besides, any change to the database schema can trigger lots of updates to the wiki code, and be very costly too.
Would it be possible to put an end to this thread with a simple veto?
No, forget it, anyone caught committing such code will be shot on sight. The issue at hand is installation, the rest of the codebase works just fine with multiple DBMSs. Any remaining issues in the bulk codebase can be fixed using the established abstractions in the Database class.
I'd let it run, but I'm afraid this sort of talk might either scare off new developers (such as the OP), or fool them into wasting their time rewriting 100k lines of code in the mistaken belief that the proposal is somehow desirable.
MediaWiki's installer is dysfunctional, and various parts of the user interface are practically unusable except by seasoned hackers who eat ground-up silicon chips for breakfast. Extension message loading is slow enough to be prohibitive for installations without APC, and uses massive amounts of memory. Let's get our priorities straight and avoid fixing the things that aren't broken.
-- Tim Starling
On Tue, Dec 23, 2008 at 3:08 AM, Tim Starling tstarling@wikimedia.org wrote:
Leons Petrazickis wrote:
What do I need to do to get commit access? My public key is: http://lpetr.org/personal/leo-public.pub
Excellent. Specify your preferred username.
leonsp, please. :-)
-- Leons Petrazickis http://lpetr.org/blog/
wikitech-l@lists.wikimedia.org