In case anyone hasn't noticed, the number of MediaWiki extensions in existence has soared in the previous year. They are scattered all around the internet and it is a chore to make sure all of your extensions are up to date.
In an attempt to alleviate the confusion of managing extensions, I propose a more formal extension system.
Step 1: Overhaul how MediaWiki deals with extensions. Loading an extension via 'require_once' is silly and has all sorts of limitations (for example, if your extension file which modifies $wgExtensionFunctions is loaded from within a function, $wgExtensionFunctions won't actually get modified unless it is brought into scope of the calling function). In addition, there is no easy way to tell if an extension is a special page extension, parser hook extension, combination, etc. In my proposed system, MediaWiki extensions would all be derived from a base 'Extension' class. There would be interfaces that would allow extensions to become a SpecialPage extension, parser extension, hook extension, etc. Furthermore, if extensions were packaged as a class, we could give the base extension class useful variables, such as "sourceURL" which would allow developers to provide a URL to the most up-to-date version of an extension. Of course, the ultimate benefit to turning extensions into classes is that it would make developing extensions easier since OOP gives you a building block for your work, not a clean slate.
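Roughly, I am picturing something like this (every name below is hypothetical, just to illustrate the shape):

<?php
// Hypothetical sketch -- nothing below exists in the current tree.
abstract class Extension {
    public $name;
    public $version;
    public $sourceURL;   // URL of the most up-to-date version

    // called once when the extension is enabled
    abstract public function initialize();
}

// Capability interfaces a manager could test for with instanceof.
interface ParserTagProvider {
    public function getParserTags();               // e.g. array( 'graph' )
    public function renderTag( $tag, $content );
}

interface SpecialPageProvider {
    public function getSpecialPages();             // array of SpecialPage objects
}

class GraphExtension extends Extension implements ParserTagProvider {
    public $sourceURL = 'http://example.com/GraphExtension.latest';

    public function initialize() { /* register messages, hooks, ... */ }
    public function getParserTags() { return array( 'graph' ); }
    public function renderTag( $tag, $content ) {
        return htmlspecialchars( $content );
    }
}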
Step 2: Write a manager for MediaWiki that allows you to load and upgrade extensions remotely. Want to upgrade an extension? Just go to a special page, hit the button to refresh the list for updates, and click the checkbox next to the extension you want to update.
Critics out there will retort that this will slow things down. Yes, it won't be as fast as explicitly typing require_once in LocalSettings.php. However, the system could also be designed with speed in mind. For example, it would be possible to serialize all the loaded extension objects into a file (or shared memory) which is loaded for every page request. I take this approach with my new Farmer extension ( http://www.mediawiki.org/wiki/User:IndyGreg/Farmer), which allows you to specify which extensions are loaded via a web interface. The performance hit is negligible.
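As a sketch of the caching idea (the cache location and the buildExtensionRegistry() helper are made up for illustration):

<?php
// Load the extension registry from a serialized cache if present;
// otherwise rebuild it (hypothetical helper) and write the cache.
$cacheFile = "$IP/cache/extension-registry.ser";

if ( file_exists( $cacheFile ) ) {
    $registry = unserialize( file_get_contents( $cacheFile ) );
} else {
    $registry = buildExtensionRegistry();   // hypothetical: scans the extensions directory
    file_put_contents( $cacheFile, serialize( $registry ) );
}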
Thoughts?
Greg
A lot of this stuff sounds cool, so don't take it personally if I'm harsh; I'm presenting logical concerns as I see them.
Gregory Szorc wrote:
In case anyone hasn't noticed, the number of MediaWiki extensions in existence has soared in the previous year. They are scattered all around the internet and it is a chore to make sure all of your extensions are up to date.
Well, perhaps one should be a bit more discerning about which MediaWiki extensions to pick. If they're pre-beta and don't have an update announce-list, is this something you really want to be running?
Step 1: Overhaul how MediaWiki deals with extensions. Loading an extension via 'require_once' is silly and has all sorts of limitations (for example, if your extension file which modifies $wgExtensionFunctions is loaded from within a function, $wgExtensionFunctions won't actually get modified unless it is brought into scope of the calling function).
That shouldn't be a problem: the includes should be done unconditionally and the registered extension functions should serve only to initialize the extension.
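In other words, the usual pattern already avoids the scoping issue (sketch; the extension name is illustrative):

// In LocalSettings.php, unconditionally and at file scope:
require_once( "$IP/extensions/MyExtension/MyExtension.php" );

// In MyExtension.php, registration also happens at file scope, so
// $wgExtensionFunctions really is the global; the actual work is deferred:
$wgExtensionFunctions[] = 'wfSetupMyExtension';

function wfSetupMyExtension() {
    // initialise the extension here, after Setup.php has run
}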
In addition, there is no easy way to tell if an extension is a special page extension, parser hook extension, combination, etc.
I agree, that is troublesome, and is addressed below.
In my proposed system, MediaWiki extensions would all be derived from a base 'Extension' class. There would be interfaces that would allow extensions to become a SpecialPage extension, parser extension, hook extension, etc. Furthermore, if extensions were packaged as a class, we could give the base extension class useful variables, such as "sourceURL" which would allow developers to provide a URL to the most up-to-date version of an extension. Of course, the ultimate benefit to turning extensions into classes is that it would make developing extensions easier since OOP gives you a building block for your work, not a clean slate.
I'm not sure the Extension supertype would make much sense, as extensions in different areas of MediaWiki have radically different needs/APIs. As for SpecialPage, the class you describe already exists. sourceURL is already implemented with $wgExtensionCredits['extension-name'][] = array( 'url' => 'URL' ); (this needs to be better documented).
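For the record, a fuller credits entry looks something like this (the exact set of recognised keys varies a bit between versions, and 'MyExtension' is just a placeholder):

$wgExtensionCredits['parserhook'][] = array(
    'name'        => 'MyExtension',
    'author'      => 'Your Name',
    'url'         => 'http://www.mediawiki.org/wiki/Extension:MyExtension',
    'description' => 'One-line description shown on Special:Version',
);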
Overall, read Tim Starling's recent proposal ( http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/250... ) to overhaul large sections of code into modules in order to improve performance. Also, read about one developer's sentiments on breaking backwards compatibility with extensions ( http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/250... ). Work will move forward, albeit slowly. Tread carefully.
Step 2: Write a manager for MediaWiki that allows you to load and upgrade extensions remotely. Want to upgrade an extension? Just go to a special page, hit the button to refresh the list for updates, and click the checkbox next to the extension you want to update.
This, of course, would be an extension itself.
I would argue that such a feature, though neat, would never see the light of day in svn.wikimedia.org, because there are far too many implications:
* Extension trust - what this essentially means is that you give a group (often one person) the right to arbitrarily write code on your server. Sure, we could implement code diffs, but not everyone has the expertise to audit everything that goes on their server. Existing systems like the PEAR installer go through a review process, and appeal mainly to people savvy enough to get PEAR installed in the first place.
* PHP code is writeable - It is often a good idea not to give your PHP scripts write access to the web directory. This prevents someone from exploiting an installed script and then writing something else into the directory. Granted, this only matters for those who keep an eye on security; the rest will just 777 their web directories...
* Running off the web is awkward - There's a reason why MediaWiki's upgrade.php is meant for command line: upgrading can take a long time, and it's not good for web scripts to just hang like that when you need to upgrade.
* You can already do it via Subversion - `svn up` anyone? (Of course, you need SSH access, but it's mind-numbingly easy. I use this method to keep a tip-top development copy of MediaWiki and its extensions on my computer as well as a production installation on a public website (Dreamhost for the win!)).
Critics out there will retort that this will slow things down. Yes, it won't be as fast as explicitly typing require_once in LocalSettings.php. However, the system could also be designed with speed in mind. For example, it would be possible to serialize all the loaded extension objects into a file (or shared memory) which is loaded for every page request. I take this approach with my new Farmer extension ( http://www.mediawiki.org/wiki/User:IndyGreg/Farmer), which allows you to specify which extensions are loaded via a web interface. The performance hit is negligible.
Tim Starling's proposal! One caveat: serializing the extension objects into a file doesn't remove the need to require them. That's irrelevant, however, because if includes are the issue, you really ought to install a compiler cache.
Thoughts?
You asked, I gave 'em. Feel free to retort!
Gregory Szorc wrote:
In case anyone hasn't noticed, the number of MediaWiki extensions in existence has soared in the previous year. They are scattered all around the internet and it is a chore to make sure all of your extensions are up to date.
In an attempt to alleviate the confusion of managing extensions, I propose a more formal extension system.
Step 1: Overhaul how MediaWiki deals with extensions. Loading an extension via 'require_once' is silly and has all sorts of limitations (for example, if your extension file which modifies $wgExtensionFunctions is loaded from within a function, $wgExtensionFunctions won't actually get modified unless it is brought into scope of the calling function). In addition, there is no easy way to tell if an extension is a special page extension, parser hook extension, combination, etc. In my proposed system, MediaWiki extensions would all be derived from a base 'Extension' class. There would be interfaces that would allow extensions to become a SpecialPage extension, parser extension, hook extension, etc. Furthermore, if extensions were packaged as a class, we could give the base extension class useful variables, such as "sourceURL" which would allow developers to provide a URL to the most up-to-date version of an extension. Of course, the ultimate benefit to turning extensions into classes is that it would make developing extensions easier since OOP gives you a building block for your work, not a clean slate.
Step 2: Write a manager for MediaWiki that allows you to load and upgrade extensions remotely. Want to upgrade an extension? Just go to a special page, hit the button to refresh the list for updates, and click the checkbox next to the extension you want to update.
Critics out there will retort that this will slow things down. Yes, it won't be as fast as explicitly typing require_once in LocalSettings.php. However, the system could also be designed with speed in mind. For example, it would be possible to serialize all the loaded extension objects into a file (or shared memory) which is loaded for every page request. I take this approach with my new Farmer extension ( http://www.mediawiki.org/wiki/User:IndyGreg/Farmer), which allows you to specify which extensions are loaded via a web interface. The performance hit is negligible.
Thoughts?
Please read my own proposal for reworking the extension interface at:
http://mail.wikipedia.org/pipermail/wikitech-l/2006-July/037035.html
Posted to this list 10 days ago.
500us per extension at startup time is too slow, I'm aiming to make extensions much faster than they are at the moment, not slightly slower. Under your proposal, serializing extension classes doesn't help much, since the code would still have to be loaded before unserialization could take place. You describe no system for registering capabilities, besides a vague suggestion that there would be an "interface" for various things. When a request comes in for Special:Whatchmacallit, or the parser encounters a {{#foo:xyz}} tag, how does it determine what function it needs to call? If everything about the extension is embodied in functional interfaces of the extension object, then hundreds of lines of code will need to be executed on startup to provide for registration. This is unacceptable. My suggestion is to use a module specification file to provide:
a) a map between capabilities and callbacks
b) a map between classes and files, for autoloading
The classes containing the hook functions could be derived from some common Extension parent class, assuming some use could be found for such a paradigm. Since the essential thing which turns a class into an extension would be the specification file, there might not be much need for functionality in a common base class.
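To make the idea concrete, a specification file might look something like this (purely illustrative; nothing about the format is decided):

<?php
// Returned by e.g. extensions/Whatchmacallit/module.spec.php
return array(
    // capability => callback, so the core only loads code when it's needed
    'capabilities' => array(
        'specialpage:Whatchmacallit' => array( 'SpecialWhatchmacallit', 'execute' ),
        'parserfunction:foo'         => array( 'FooParserFunction', 'render' ),
    ),
    // class => file, for autoloading
    'autoload' => array(
        'SpecialWhatchmacallit' => 'SpecialWhatchmacallit.php',
        'FooParserFunction'     => 'FooParserFunction.php',
    ),
);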
-- Tim Starling
Please read my own proposal for reworking the extension interface at:
http://mail.wikipedia.org/pipermail/wikitech-l/2006-July/037035.html
Posted to this list 10 days ago.
Forgive my ignorance. I read the first few paragraphs of the post when it was originally sent and ignored the rest, just thinking it was another Wikipedia-only message. Now, having read it...
I do like your proposal for static objects being initialized as-needed. There is great power in the just-in-time object::getInstance() method. However, one of my criticisms of MediaWiki's architecture has always been the over-dependence on global objects, which are in some ways like static classes using the Singleton pattern (see http://blog.case.edu/gps10/2006/07/22/why_global_variables_in_php_is_bad_pro... for why I don't like global objects). I would much rather see the Wiki class contain these "global objects" as static variables which can be accessed via a just-in-time getObject() static call to the Wiki class. This sounds like the same approach as the proposed wfGetFoo() methods (it basically is), but polluting the global symbol table with objects and functions not belonging to classes is unnecessary when these could all belong to a master Wiki class. If you don't buy the "don't do it because you'd be polluting the symbol table" argument, do it for the sake of keeping everything organized into classes. Do wfGetFoo() functions really belong in the global namespace, or do they belong to a class representing a wiki? Hell, if you get rid of all the global functions and attach them to existing classes, that is one less file to include! </rant on global objects>
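Concretely, I am thinking of something like this (method names are only illustrative):

<?php
class Wiki {
    private static $objects   = array();
    private static $factories = array();   // name => callback that builds the object

    public static function registerObject( $name, $factory ) {
        self::$factories[$name] = $factory;
    }

    public static function getObject( $name ) {
        if ( !isset( self::$objects[$name] ) ) {
            // built just-in-time, on first use
            self::$objects[$name] = call_user_func( self::$factories[$name] );
        }
        return self::$objects[$name];
    }
}

// e.g. Wiki::registerObject( 'out', 'wfNewOutputPage' );   // hypothetical factory
//      $out = Wiki::getObject( 'out' );                    // instead of global $wgOut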
If you are talking about 500us, why are there still require and require_once calls in the trunk? These both require system calls (require_once actually requires an additional one and hence is slower). I know work has been done developing the __autoload function (if you ever commit to 5.1, spl_autoload_register() is preferred), but at the level of commitment you give to performance, every require_once has to seem like a monkey on your back.
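For reference, the sort of class-map autoloader I have in mind (the function and the $wgAutoloadClasses map are illustrative names, not existing code):

<?php
// $wgAutoloadClasses: class name => file path, populated by core and extensions.
function wfAutoload( $className ) {
    global $wgAutoloadClasses;
    if ( isset( $wgAutoloadClasses[$className] ) ) {
        require( $wgAutoloadClasses[$className] );
    }
}

if ( function_exists( 'spl_autoload_register' ) ) {
    spl_autoload_register( 'wfAutoload' );     // PHP 5.1+
} else {
    function __autoload( $className ) {
        wfAutoload( $className );
    }
}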
Also, how do you accurately profile MediaWiki? I've used xdebug and Kcachegrind to profile scripts before, but it always bothers me because I cannot use xdebug alongside APC or eaccelerator to get results reflective of my production deployment. I know APC and eaccelerator completely change the chokepoints, but it is impossible for me to see what the new chokepoints are! Can you feed MediaWiki's internal profiling output into Kcachegrind?
Now, getting back to the topic of extensions. For a base extension class, I was thinking of an abstract class that has numerous methods: providesSpecialPage(), providesHook(), providesWhichHooks(), providesParserTags(), etc. Let's say we establish a defined extensions root directory. When MediaWiki loads, it periodically checks this directory for all files representing extensions and loads them (perhaps this is triggered via cron, a special page, filemtime(), etc.). When the extensions are loaded from the directory, a map is established that records the abilities of each. This map is serialized for quick retrieval. Whenever MediaWiki loads, it just goes to the map and loads extensions just-in-time. This would require an extension manager class that would initialize extensions as called for by the map. For example, when the parser sees a tag it doesn't recognize, it would call ExtensionManager::getExtensionForParserTag($foo)->parse($content); or, when a special page is called, ExtensionManager::executeSpecialPage($foo); for hooks, the same deal.
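A very rough sketch of that manager (every name and the map layout are made up):

<?php
class ExtensionManager {
    private static $map = null;   // capability map, cached between requests

    private static function getMap() {
        if ( self::$map === null ) {
            self::$map = unserialize( file_get_contents( "$IP/cache/extension-map.ser" ) );
        }
        return self::$map;
    }

    public static function getExtensionForParserTag( $tag ) {
        $map = self::getMap();
        $class = $map['parserTags'][$tag];   // e.g. 'GraphExtension'
        return new $class();
    }

    public static function executeSpecialPage( $name ) {
        $map = self::getMap();
        $class = $map['specialPages'][$name];
        $page = new $class();
        $page->execute();
    }
}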
I like the idea for a map between capabilities and callbacks, etc, but I don't like the idea of a module specification file. Why should you need to provide a specification file when the same information can be obtained from methods inherited from a base extension class? As long as you cache the output of these methods, there is zero performance overhead and extensions have the added bonus of being much more structured. Yes, it would break existing functionality. But if you are already talking about making just-in-time calls to instantiate global objects like $wgTitle, $wgOut, etc, then many existing extensions will be broken anyway. Sometimes you just have to make sacrifices for the sake of progress.
Just my $0.02
Greg
Gregory Szorc wrote:
Please read my own proposal for reworking the extension interface at:
http://mail.wikipedia.org/pipermail/wikitech-l/2006-July/037035.html
Posted to this list 10 days ago.
Forgive my ignorance. I read the first few paragraphs of the post when it was originally sent and ignored the rest, just thinking it was another Wikipedia-only message. Now, having read it...
I do like your proposal for static objects being initialized as-needed. There is great power in the just-in-time object::getInstance() method. However, one of my criticisms of MediaWiki's architecture has always been the over-dependence on global objects, which are in some ways like static classes using the Singleton pattern (see http://blog.case.edu/gps10/2006/07/22/why_global_variables_in_php_is_bad_pro... for why I don't like global objects). I would much rather see the Wiki class contain these "global objects" as static variables which can be accessed via a just-in-time getObject() static call to the Wiki class. This sounds like the same approach as the proposed wfGetFoo() methods (it basically is), but polluting the global symbol table with objects and functions not belonging to classes is unnecessary when these could all belong to a master Wiki class. If you don't buy the "don't do it because you'd be polluting the symbol table" argument, do it for the sake of keeping everything organized into classes. Do wfGetFoo() functions really belong in the global namespace, or do they belong to a class representing a wiki? Hell, if you get rid of all the global functions and attach them to existing classes, that is one less file to include! </rant on global objects>
I'll answer this at the end.
If you are talking about 500us, why are there still require and require_once calls in the trunk? These both require system calls (require_once actually requires an additional one and hence is slower). I know work has been done developing the __autoload function (if you ever commit to 5.1, spl_autoload_register() is preferred), but at the level of commitment you give to performance, every require_once has to seem like a monkey on your back.
I got rid of about half of the require_once calls from Setup.php in localisation-work. The remaining ones are mostly for global functions.
Also, how do you accurately profile MediaWiki? I've used xdebug and Kcachegrind to profile scripts before, but it always bothers me because I cannot use xdebug alongside APC or eaccelerator to get results reflective of my production deployment. I know APC and eaccelerator completely change the chokepoints, but it is impossible for me to see what the new chokepoints are!
http://noc.wikimedia.org/cgi-bin/report.py
Data is generated with ProfilerSimpleUDP. Averaging over a million requests gives you excellent accuracy, thanks to the central limit theorem. However, the data may be subject to slight systematic inaccuracies due to the profiling overhead.
Can you feed MediaWiki's internal profiling output into Kcachegrind?
No.
Now, getting back to the topic of extensions. For a base extension class, I was thinking of an abstract class that has numerous methods: providesSpecialPage(), providesHook(), providesWhichHooks(), providesParserTags(), etc. Let's say we establish a defined extensions root directory. When MediaWiki loads, it periodically checks this directory for all files representing extensions and loads them (perhaps this is triggered via cron, a special page, filemtime(), etc.). When the extensions are loaded from the directory, a map is established that records the abilities of each. This map is serialized for quick retrieval. Whenever MediaWiki loads, it just goes to the map and loads extensions just-in-time. This would require an extension manager class that would initialize extensions as called for by the map. For example, when the parser sees a tag it doesn't recognize, it would call ExtensionManager::getExtensionForParserTag($foo)->parse($content); or, when a special page is called, ExtensionManager::executeSpecialPage($foo); for hooks, the same deal.
I like the idea for a map between capabilities and callbacks, etc, but I don't like the idea of a module specification file. Why should you need to provide a specification file when the same information can be obtained from methods inherited from a base extension class? As long as you cache the output of these methods, there is zero performance overhead and extensions have the added bonus of being much more structured. Yes, it would break existing functionality. But if you are already talking about making just-in-time calls to instantiate global objects like $wgTitle, $wgOut, etc, then many existing extensions will be broken anyway. Sometimes you just have to make sacrifices for the sake of progress.
You obviously also missed my post on stub globals. After I made the first post, I discovered a method for painless migration to deferred object initialisation, and I made that the topic of a second post. I share your concerns about the flexibility of global variables; in fact, my discussion of the issue in phase3/docs/globals.txt mirrors your blog post very closely. But there doesn't seem to be any pressing need to sacrifice backwards compatibility while we pursue this goal. And a singleton object shares all the flexibility problems of global variables, so that's not a solution.
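For anyone who missed that post, the rough idea is something like this (a sketch, not the actual patch):

<?php
// A stub that stands in for a global object and builds the real one on first use.
class StubObject {
    private $globalName;
    private $class;

    function __construct( $globalName, $class ) {
        $this->globalName = $globalName;
        $this->class      = $class;
    }

    function __call( $name, $args ) {
        // Replace the stub with the real object, then forward the call.
        $GLOBALS[$this->globalName] = new $this->class();
        return call_user_func_array(
            array( $GLOBALS[$this->globalName], $name ), $args );
    }
}

// $wgOut = new StubObject( 'wgOut', 'OutputPage' );
// Code that does $wgOut->addHTML( ... ) keeps working, but OutputPage is only
// constructed if something actually touches it.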
You make a good point about the fact that capabilities can be provided by a member function and cached. There's still no need to sacrifice backwards compatibility though, that I can see. We can keep both the ability of extensions to operate across multiple MediaWiki versions, and the ability for most old extensions to continue to work properly in new versions of MediaWiki, if we design the interface carefully enough. If backwards compatibility for old extensions proves to be too much of a performance burden, then we can drop it after a couple of releases. What I want to avoid is the requirement that extensions be simultaneously updated along with the core. That's a hassle for both site administrators and extension developers. Especially since many extensions are unreleased, their versions unnumbered.
-- Tim Starling
...
Especially since many extensions are unreleased, their versions unnumbered.
Ugh. Perhaps we can strongly encourage extensions to return a standard (x.y.z) version string via a version() function? Or perhaps mandate use of the version element inside $wgExtensionCredits?
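For example (illustrative; assuming the 'version' element is recognised and shown on Special:Version):

$wgExtensionCredits['other'][] = array(
    'name'    => 'MyExtension',   // placeholder name
    'version' => '1.2.3',
    'url'     => 'http://example.com/MyExtension',
);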
-- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200607240757 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Hi!
Also, how do you accurately profile MediaWiki? I've used xdebug and Kcachegrind to profile scripts before, but it always bothers me because I cannot use xdebug alongside APC or eaccelerator to get results reflective of my production deployment.
Strange, I use xdebug alongside APC. And we use that when we do xdebug profiling runs on the cluster.
I know APC and eaccelerator completely change the chokepoints, but it is impossible for me to see what the new chokepoints are! Can you feed MediaWiki's internal profiling output into Kcachegrind?
I'm not sure there'd be much point in that. Of course, one could write a compatible trace log, but the number of events is carefully chosen to give a general overview rather than the kind of detail you'd get from kcachegrind+xdebug.
Cheers, Domas
-- http://dammit.lt/ or [[user:midom]]
On 7/24/06, Domas Mituzas midom.lists@gmail.com wrote:
Hi!
Also, how do you accurately profile MediaWiki? I've used xdebug and Kcachegrind to profile scripts before, but it always bothers me because I cannot use xdebug alongside APC or eaccelerator to get results reflective of my production deployment.
Strange, I use xdebug alongside APC. And we use that when we do xdebug profiling runs on the cluster.
So what is this warning, straight from xdebug's site about:
Xdebug does not work together with the Zend Optimizer or any other Zend extension (DBG, APC, APD etc). This is due to compatibility problems with those modules. We will be working on figuring out what the problems are, and of course try to fix those.
That was scary enough for me to avoid the combination.
Greg
Hi!
So what is this warning, straight from xdebug's site about:
Warnings? Who reads warnings?
Xdebug does not work together with the Zend Optimizer or any other Zend extension (DBG, APC, APD etc). This is due to compatibility problems with those modules. We will be working on figuring out what the problems are, and of course try to fix those.
Actually, I loaded it as a regular extension, not a Zend extension. It complained about that, but profiling worked, and that was what I needed. Of course, karma matters too.
Moin,
On Monday 24 July 2006 02:58, Gregory Szorc wrote:
In case anyone hasn't noticed, the number of MediaWiki extensions in existence has soared in the previous year. They are scattered all around the internet and it is a chore to make sure all of your extensions are up to date.
Thanx for the list of cool ideas. I can't comment too much on them, especially the "turn-it-into-a-class" parts.
As for the distribution, upload, update etc: I do think what you need is a CPAN for mediawiki extensions. See http://search.cpan.org as example.
(Yes, my extensions are released there for a lack of a better place to put them in).
CPAN makes it easy to index, package, upload, search, and download (Perl) packages. And if the packages are done right (which CPAN testers ensure), you can even automatically unpack, build, and test them (even if the package doesn't actually contain a Perl package :-D).
In fact, I think that instead of reinventing CPAN, it could just be (ab?)used. Well, actually, I already used it :)
See http://search.cpan.org/~tels/mediawiki-graph/ :)
Updating all installed extensions could then be done by a very lightweight PHP script which simply fetches the latest version from CPAN, tests it, and installs it.
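Something like the following could be the core of it (the URL is a placeholder, and a real script would resolve the package via CPAN, verify checksums, and run the tests before installing):

<?php
// Hypothetical updater: fetch a packaged extension and unpack it.
$package = 'mediawiki-graph';
$url     = "http://example.org/$package-latest.tar.gz";   // placeholder
$tarball = "/tmp/$package.tar.gz";

// Requires allow_url_fopen; a real script might use curl instead.
file_put_contents( $tarball, file_get_contents( $url ) );

// Unpack into the extensions directory (assumes shell access to tar).
$extensionsDir = dirname( __FILE__ ) . '/extensions';
shell_exec( 'tar -xzf ' . escapeshellarg( $tarball ) .
    ' -C ' . escapeshellarg( $extensionsDir ) );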
Of course, the maintainers of CPAN would probably want to hear of this before 100+ non-perl packages hit their server :)
Best wishes,
Tels
-- Signed on Mon Jul 24 18:08:57 2006 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.
"I intend to live forever, or die trying." -- Groucho Marx