I'm working on a WebDAV interface to MediaWiki, based on the WebDAV module I contributed to the Gallery project: http://www.mediawiki.org/wiki/WebDAV
The goals of this project are:

* To support connecting to MediaWiki with WebDAV clients like [http://www.webdav.org/cadaver/ cadaver] and [http://0pointer.de/lennart/projects/fusedav/ fusedav].
* To support integrating MediaWiki with editors that support WebDAV, like Emacs and Eclipse.
* To explore MediaWiki article histories with WebDAV clients that support DeltaV, the WebDAV versioning extension.
* To support connecting to MediaWiki with a Subversion client, like the command line client or the Eclipse Subclipse plugin.
Connecting with Subversion is in the scope of this project because Subversion supports a protocol which is very close to WebDAV and DeltaV: http://subversion.tigris.org/webdav-usage.html
By supporting Subversion clients, I can edit MediaWiki articles using the Emacs version control mode, and I can explore MediaWiki article histories using the Eclipse Subclipse plugin. If I maintain a software project's documentation in MediaWiki, I can use the Subversion [http://svnbook.red-bean.com/nightly/en/svn.advanced.externals.html externals] feature to check out MediaWiki articles along with the source code. These can then be distributed with the project, or converted to PDF or man pages using XSL as part of the build process.
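As a sketch of that workflow (the repository URL and directory name below are made up, not real locations), an externals definition might look like:

```shell
# Hypothetical example - the URL and directory name are illustrative only.
# Attach the wiki-backed docs to a working copy via svn:externals,
# using the "localdir URL" property format:
svn propset svn:externals "docs http://wiki.example.org/svn/trunk" .

# The next update fetches the external into ./docs alongside the source:
svn update
```

This is a config sketch, not a runnable recipe - it assumes the WebDAV interface is being served at an svn-compatible URL.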
So far I have implemented:

* Some WebDAV features: GET, PUT, PROPFIND and DELETE. I can edit articles with cadaver, fusedav, Emacs and Eclipse.
* Some DeltaV features: version-tree and baseline support. I can explore article histories and old revisions with cadaver.
* Subversion checkout: I can check out articles from the command line and explore article histories with Subclipse. Checkin will need support for the svndiff format, which happily is well documented in the Subversion source.
You can install it by executing in your MediaWiki root directory:
svn co http://svn.freegeek.org/svn/mediawiki-webdav/trunk .
However the code is still very "proof of concept" - I'm still figuring out how the code will finally be organized. Unless I can contribute this interface to the MediaWiki project, I guess it should be organized as a MediaWiki extension? However I'm still getting familiar with how MediaWiki delegates requests to extensions. Most WebDAV clients demand hierarchical URLs and don't support query strings, so I currently use two additional PHP landing pages in the MediaWiki root directory:

* webdav.php handles WebDAV requests for articles, like webdav.php/<MediaWiki_Article_Name>
* deltav.php is responsible for DeltaV functionality. Its layout is based on Subversion's, e.g. deltav.php/ver/<Revision_ID>, deltav.php/bc/<Revision_ID>, etc.
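To illustrate the layout (in Python rather than PHP, purely as a sketch - the handler names and dispatch rules here are my own shorthand, not the actual code), the path-based dispatch behind those landing pages amounts to:

```python
# Sketch only: the real landing pages are PHP; this just illustrates
# decoding a hierarchical PATH_INFO-style suffix instead of a query
# string. Handler names ("article", "collection-root") are hypothetical.

def route(path_info):
    """Split a PATH_INFO suffix like "/ver/12345" into (handler, argument)."""
    parts = [p for p in path_info.split("/") if p]
    if not parts:
        return ("collection-root", None)
    if parts[0] in ("ver", "bc"):            # DeltaV layout, as in Subversion
        return (parts[0], parts[1] if len(parts) > 1 else None)
    return ("article", "/".join(parts))      # plain WebDAV article request

print(route("/Main_Page"))   # ('article', 'Main_Page')
print(route("/ver/12345"))   # ('ver', '12345')
```

Real WebDAV dispatch would of course also branch on the request method (GET, PUT, PROPFIND, ...); the point is that everything hangs off hierarchical paths.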
If I continue using this layout, I will spend some time cleaning and reorganizing the code. But before I do, I'd love some feedback from MediaWiki developers: Is this a reasonable design? What are the alternatives to and the consequences of introducing these two new landing pages?
Eventually, I would like to move this project to the MediaWiki Subversion repository. Here is my SSH public key, signed with my GPG key: http://cgi.sfu.ca/~jdbates/tmp/freegeek/id_dsa.pub.gpg
My username is "jablko".
I look forward to your input on this project! Thanks, Jack
Hey Jack!
I very much like your ideas for adding a WebDAV interface to MediaWiki. In fact, this is something I suggested back in February...
http://bugzilla.wikimedia.org/show_bug.cgi?id=9133
You might want to post your notes there to keep everything in one place. :)
Let me know if I can help you 'kick the tires' a bit...
On Thu, 14 Jun 2007 17:57:03 -0700, Jack Bates wrote:
However the code is still very "proof of concept" - I'm still figuring out how the code will be finally organized. Unless I can contribute this interface to the MediaWiki project, I guess it should be organized as a MediaWiki extension? However I'm still getting familiar with how MediaWiki delegates requests to extensions. Most WebDAV clients demand hierarchical URLs and don't support query strings, so I currently use two additional PHP landing pages in the MediaWiki root directory:
- webdav.php handles WebDAV requests for articles like
webdav.php/<MediaWiki_Article_Name>
- deltav.php is responsible for DeltaV functionality. Its layout is
based on Subversion's, e.g. deltav.php/ver/<Revision_ID>, deltav.php/bc/<Revision_ID>, etc.
If I continue using this layout, I will spend some time cleaning and reorganizing the code. But before I do, I'd love some feedback from MediaWiki developers: Is this a reasonable design? What are the alternatives to and the consequences of introducing these two new landing pages?
Without a query string, the alternative would be to use special pages, which should be able to get the string; i.e. Special:WebDav/Article
I guess the consequence of using landing pages is that it's harder for people without special clients to find them, which seems fairly minor, since this extension probably wouldn't be useful to them anyway.
On 6/14/07, Jack Bates ms419@freezone.co.uk wrote:
However the code is still very "proof of concept" - I'm still figuring out how the code will be finally organized. Unless I can contribute this interface to the MediaWiki project, I guess it should be organized as a MediaWiki extension? However I'm still getting familiar with how MediaWiki delegates requests to extensions. Most WebDAV clients demand hierarchical URLs and don't support query strings, so I currently use two additional PHP landing pages in the MediaWiki root directory:
- webdav.php handles WebDAV requests for articles like
webdav.php/<MediaWiki_Article_Name>
- deltav.php is responsible for DeltaV functionality. Its layout is
based on Subversion's, e.g. deltav.php/ver/<Revision_ID>, deltav.php/bc/<Revision_ID>, etc.
If I continue using this layout, I will spend some time cleaning and reorganizing the code. But before I do, I'd love some feedback from MediaWiki developers: Is this a reasonable design? What are the alternatives to and the consequences of introducing these two new landing pages?
It sounds like a reasonable approach for the external appearance. Unfortunately, it will require mod_rewrite or equivalent, in that the default behavior of HTTP servers would require that query strings be used, but from what you say (I'm not familiar with WebDAV) that might be unavoidable. It's akin to the query.php and api.php that we have now. Alternatively, perhaps you could talk with Yurik about modifying api.php so you could have api.php/WebDAV/ as your root, which would probably make more sense.
Eventually, I would like to move this project to the MediaWiki Subversion repository. Here is my SSH public key, signed with my GPG key: http://cgi.sfu.ca/~jdbates/tmp/freegeek/id_dsa.pub.gpg
My username is "jablko".
That should be possible to arrange immediately. Just e-mail Brion.
Having a WebDAV interface sounds very, very cool, but under no circumstances should we introduce yet another direct database access layer.
At present, MediaWiki is a web-based, single-tier application -- the code that modifies the databases is intermixed with the UI code that renders web pages.
The API has to duplicate some of the db access logic, together with various security validations, to provide useful services. This obviously results in a major architectural NO-NO: any change to db access logic has to be done in two places (usually more). Hence the delay with implementing commit through API -- the internal data commit code simply cannot be duplicated, but instead must be cleanly separated into the commit logic and the Web UI logic, so that the API can call the same code as UI.
It seems that you have (partially?) accomplished that separation and I hope we can commit it to the trunk.
Hopefully at some point in the distant future we will have a "business logic tier" that will be the only code that accesses databases, and all UIs/WebDAV/etc. will interact only with it, not with the db. The current API implementation might be considered a starting point, but it is lacking in several important areas, like commit and caching. We should probably discuss it during the hacking days.
--Yurik
On 6/14/07, Simetrical Simetrical+wikilist@gmail.com wrote:
On 6/14/07, Jack Bates ms419@freezone.co.uk wrote:
However the code is still very "proof of concept" - I'm still figuring out how the code will be finally organized. Unless I can contribute this interface to the MediaWiki project, I guess it should be organized as a MediaWiki extension? However I'm still getting familiar with how MediaWiki delegates requests to extensions. Most WebDAV clients demand hierarchical URLs and don't support query strings, so I currently use two additional PHP landing pages in the MediaWiki root directory:
- webdav.php handles WebDAV requests for articles like
webdav.php/<MediaWiki_Article_Name>
- deltav.php is responsible for DeltaV functionality. Its layout is
based on Subversion's, e.g. deltav.php/ver/<Revision_ID>, deltav.php/bc/<Revision_ID>, etc.
If I continue using this layout, I will spend some time cleaning and reorganizing the code. But before I do, I'd love some feedback from MediaWiki developers: Is this a reasonable design? What are the alternatives to and the consequences of introducing these two new landing pages?
It sounds like a reasonable approach for the external appearance. Unfortunately, it will require mod_rewrite or equivalent, in that the default behavior of HTTP servers would require that query strings be used, but from what you say (I'm not familiar with WebDAV) that might be unavoidable. It's akin to the query.php and api.php that we have now. Alternatively, perhaps you could talk with Yurik about modifying api.php so you could have api.php/WebDAV/ as your root, which would probably make more sense.
Eventually, I would like to move this project to the MediaWiki Subversion repository. Here is my SSH public key, signed with my GPG key: http://cgi.sfu.ca/~jdbates/tmp/freegeek/id_dsa.pub.gpg
My username is "jablko".
That should be possible to arrange immediately. Just e-mail Brion.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Yuri Astrakhan wrote:
Having a WebDAV interface sounds very, very cool, but under no circumstances should we introduce yet another direct database access layer.
At present, MediaWiki is a web-based, single-tier application -- the code that modifies the databases is intermixed with the UI code that renders web pages.
The API has to duplicate some of the db access logic, together with various security validations, to provide useful services.
The machine-readable API should *never* be duplicating any database access logic. Any remaining cases where DB code is intermixed with UI need to be refactored, as have been many places in the code already.
As I recommended before, and as I continue to recommend, nothing should be going into the API without doing that refactoring. Any time you add new DB code into the API, it's a mistake.
-- brion vibber (brion @ wikimedia.org)
Brion, I agree that the API should not duplicate DB access, but unfortunately most of the core code was targeted towards a single page request. Only some special pages return data for multiple items, and from what I understood, they are not easy to refactor to just get the data for the API (I might be wrong). Hence most normal wiki operations seem to be a special subset of the theoretical internal API (e.g. just needing the content of a single page, whereas the API may provide the content of multiple pages) - which validates the separate biz-logic tier idea.
On 6/15/07, Brion Vibber brion@wikimedia.org wrote:
Yuri Astrakhan wrote:
Having a WebDAV interface sounds very, very cool, but under no circumstances should we introduce yet another direct database access layer.
At present, MediaWiki is a web-based, single-tier application -- the code that modifies the databases is intermixed with the UI code that renders web pages.
The API has to duplicate some of the db access logic, together with various security validations, to provide useful services.
The machine-readable API should *never* be duplicating any database access logic. Any remaining cases where DB code is intermixed with UI need to be refactored, as have been many places in the code already.
As I recommended before, and as I continue to recommend, nothing should be going into the API without doing that refactoring. Any time you add new DB code into the API, it's a mistake.
-- brion vibber (brion @ wikimedia.org)
Yuri Astrakhan wrote:
Brion, I agree that the API should not duplicate DB access, but unfortunately most of the core code was targeted towards a single page request. Only some special pages return data for multiple items, and from what I understood, they are not easy to refactor to just get the data for the API (I might be wrong). Hence most normal wiki operations seem to be a special subset of the theoretical internal API (e.g. just needing the content of a single page, whereas the API may provide the content of multiple pages) - which validates the separate biz-logic tier idea.
Seems to me it validates refactoring the db access tier.
-- brion vibber (brion @ wikimedia.org)
It depends on what you mean by DB access tier. I thought that usually meant database vendor independence, whereas a business logic tier would handle:
* authentication
* retrieval of "object oriented" data in raw wiki markup, both single page and in bulk, lists, system settings, etc.
* access rights validation
* committing changes back to the DB
* Markup => HTML rendering, which should be a separate library. This way both Web clients and the API may convert wiki markup into HTML, even as a non-db-committing service (converting input markup into HTML).
An API layer on top should provide access for external applications as well as cross-wiki transclusions and HTML rendering services.
The regular web interface should never directly access the DB, but only use the biz-logic tier. Alternative UI such as WAP would follow the same pattern.
--Yurik
On 6/15/07, Brion Vibber brion@wikimedia.org wrote:
Yuri Astrakhan wrote:
Brion, I agree that the API should not duplicate DB access, but unfortunately most of the core code was targeted towards a single page request. Only some special pages return data for multiple items, and from what I understood, they are not easy to refactor to just get the data for the API (I might be wrong). Hence most normal wiki operations seem to be a special subset of the theoretical internal API (e.g. just needing the content of a single page, whereas the API may provide the content of multiple pages) - which validates the separate biz-logic tier idea.
Seems to me it validates refactoring the db access tier.
-- brion vibber (brion @ wikimedia.org)
So what Brion's saying, I guess, is that it should look something like this?
1) Database abstraction layer. Contains low-level methods like "select" and "replace". Basically our current Database class and descendants.

2) Database access layer. No calls to the database abstraction layer should ever be made outside of here (maybe enforced eventually by inheritance and protected methods, or something, since PHP doesn't have friends). Classes in this layer might include Article, Title, User, and others closely tied to the database. Special pages and complicated API requests would involve new functions added to these, I guess, like fetchArticleIds(matching criteria) or whatever.

3) Display layer. Translates user requests (through API, WebDAV, human web interface, ...) into a series of calls to the database access layer. Would include classes like the current API, Linker and descendants, web request initialization stuff, and special pages (but much special page logic would be moved to things like Article). These would of course be grouped according to the various access methods, which would be independent of each other.

i) General library functions that don't involve the database. Includes the parser, utility functions for XML/strings/..., etc. Some might be called by any layer, conceivably (like string or IP address functions), others would probably only be called by the display layer (parser classes). Includes classes like Parser and friends, Xml, StringFunctions, IP, etc.
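A toy sketch of that layering (in Python for brevity; every class and method name here is hypothetical, not actual MediaWiki code) might be:

```python
# Toy sketch of the proposed layering. All names are hypothetical - these
# are not actual MediaWiki classes. The point is only that the display
# layer talks to the access layer, never to the abstraction layer.

class DatabaseAbstraction:
    """Layer 1: low-level primitives like select/replace."""
    def select(self, table, conds):
        # Stand-in for a real query; returns canned rows for the sketch.
        return [{"page_id": 1, "page_title": conds.get("title")}]

class ArticleStore:
    """Layer 2: the only code allowed to call the abstraction layer."""
    def __init__(self, db):
        self._db = db
    def fetch_article_ids(self, criteria):
        return [row["page_id"] for row in self._db.select("page", criteria)]

class ApiDisplay:
    """Layer 3: translates a user request into access-layer calls."""
    def __init__(self, store):
        self._store = store
    def handle(self, criteria):
        return {"pageids": self._store.fetch_article_ids(criteria)}

api = ApiDisplay(ArticleStore(DatabaseAbstraction()))
print(api.handle({"title": "Main_Page"}))   # {'pageids': [1]}
```

The API, WebDAV, and human web interfaces would each be a different layer-3 class sharing the same layer-2 objects.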
Is that sort of what you're thinking, Brion? It would require the database access layer to be *very* general - so general that it's not easy to see how it would be less complicated than just dropping it and writing the SQL queries manually as we do now, other than that it makes schema changes easier (which is nice, but if we wanted that, it would probably be no harder to rewrite queries referring to obsolete tables or columns at the level of Database::select and whatnot). I also can't see how extensions would fit into this, in that they may well have to run literal SQL queries that we didn't anticipate.
Yuri,
Brion, I agree that the API should not duplicate DB access, but unfortunately most of the core code was targeted towards a single page request.
It is not just the core code, it is the whole architecture. Our main business is presenting people with rendered wiki pages.
Only some special pages return data for multiple items, and from what I understood, they are not easy to refactor to just get the data for the API (I might be wrong).
Many special pages are a joke (due to the 1000 limit), and they continue to die.
Hence most normal wiki operations seem to be a special subset of the theoretical internal API (e.g. just needing the content of a single page, whereas the API may provide the content of multiple pages) - which validates the separate biz-logic tier idea.
We do more than just retrieve data - we end up crunching it, rerunning related queries (e.g. linkbatches), etc. The distance between UI and data access code is a religious debate, but I still feel that frontend developers should understand what is needed at the backend to fulfill the task. Probably that is not that common in the enterprise-y world, where abstractions are really common, but on the other hand, we're not running on an enterprise-y budget.

The API seems to be work aimed at solving many hypothetical problems, whereas MediaWiki has a very specific task to handle. Throwing away the code and putting anything generic in between would not help with efficiency. Many of the tasks we're used to doing efficiently would fail quite a bit if, say, ActiveRecord were used.

Generic code usually works only as long as nobody is actively using it (which applies to all API code: whenever someone starts using any function, we have to either disable or adapt bits...).

*shrug*, I feel that anyone saying "we should have a separate biz-logic level" doesn't really know the biz-logic of MediaWiki - but probably I'm wrong.
BR,
On 16/06/07, Domas Mituzas midom.lists@gmail.com wrote:
We do more than just retrieve data - we end up crunching it, rerunning related queries (e.g. linkbatches), etc. The distance between UI and data access code is a religious debate, but I still feel that frontend developers should understand what is needed at the backend to fulfill the task. Probably that is not that common in the enterprise-y world, where abstractions are really common, but on the other hand, we're not running on an enterprise-y budget.

The API seems to be work aimed at solving many hypothetical problems, whereas MediaWiki has a very specific task to handle. Throwing away the code and putting anything generic in between would not help with efficiency. Many of the tasks we're used to doing efficiently would fail quite a bit if, say, ActiveRecord were used.

Generic code usually works only as long as nobody is actively using it (which applies to all API code: whenever someone starts using any function, we have to either disable or adapt bits...).

*shrug*, I feel that anyone saying "we should have a separate biz-logic level" doesn't really know the biz-logic of MediaWiki - but probably I'm wrong.
I think this is an excellent point, and well summarised. Let's face it - we don't have a large, salaried, professional development team, nor a large, salaried systems administration team. We have to use every trick in the book (and quite a few that aren't in the book) to keep things running efficiently, and quite often, you'll notice this in the form of Domas making apparently subtle updates to the code, for example, which make a hell of a difference.
I don't think we can afford to sacrifice a large chunk of our existing methods by completely refactoring MediaWiki to the extent that all "business logic" and "UI code" are separate, but I *do* think there are distinct areas where this could probably be done without too much hassle; query pages (or their successor), login, etc.
I don't think it's appropriate to mindlessly refactor a lot of the existing stuff for page views, editing, histories, all the common operations the main application is used for..."for the sake of it", or because some "consultant" drew a bunch of pictures and convinced us all.
The right balance needs to be found.
Rob Church