I've been doing a lot of thinking lately about globals and their place in MediaWiki in the long term. I rewrote globals.txt to reflect the fact that PHP does not love globals; in fact, the need for a declaration to bring globals into the local scope puts it among the more global-hostile languages.
In many cases, use of globals obscures data flow and makes classes less flexible, inhibiting reuse. This is patently true in the case of $wgTitle and $wgArticle, the existence of which encourages lazy programmers to write code which fails in the common case where more than one of these objects exists. At present, these two objects are almost exclusively used in the output phase, so it would make sense to make them members of OutputPage or Skin instead of globals.
The most extreme anti-global architecture would be one involving application objects:
$mw = new MediaWiki;
$mw->executeWebRequest();
The application object could theoretically be passed to most class constructors, providing a form of global context. That, however, would make writing new classes a bit tedious. In my experience, it turns out to be easier to make the application object a global, and pull it in wherever it is needed. This would have advantages when MediaWiki needs to be embedded as a library, since it keeps the global scope cleaner, but it's not really more flexible than what we're doing now.
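To make the trade-off concrete, here is a minimal sketch of the two variants: passing the application object to a constructor, and pulling it in as a global. All class and variable names (ExampleApp, ExamplePageView, ExampleGlobalPageView, $exampleApp) are hypothetical and invented for illustration; none of them exist in MediaWiki.

```php
<?php
// Hypothetical application object; stands in for the "$mw" idea above.
class ExampleApp {
    public $siteName;

    public function __construct( $siteName ) {
        $this->siteName = $siteName;
    }
}

// Variant 1: the application object is passed to the constructor.
// Two views can belong to two different applications at once.
class ExamplePageView {
    private $app;

    public function __construct( ExampleApp $app ) {
        $this->app = $app;
    }

    public function title() {
        return 'Main Page - ' . $this->app->siteName;
    }
}

// Variant 2: the application object is a global, pulled in where needed.
// Less to type in every constructor, but only one application at a time.
class ExampleGlobalPageView {
    public function title() {
        global $exampleApp;
        return 'Main Page - ' . $exampleApp->siteName;
    }
}
```

The first variant is the tedious one: every new class has to accept and store the context. The second keeps the global scope down to a single name, which is roughly the embedding advantage mentioned above.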
After some thinking, I was forced to admit that there are some cases where globals make sense, from a data flow perspective. The clearest example is caching. A cache should have the widest possible scope. If you have two application objects, you would want them to share the same caches if possible. Indeed, it's better if different threads, processes and even servers can share their caches.
There are, however, disadvantages to using global variables for this or any other similar purpose. The problem is that the use of global variables inhibits lazy initialisation. The familiar solution is to use an accessor function, and indeed this approach has already been implemented in several places in MediaWiki. I would like to make such accessor functions more pervasive.
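As a rough illustration of the accessor approach, the sketch below creates a cache object only when it is first requested. ExampleCache and exampleGetCache() are hypothetical names made up for this example, not existing MediaWiki code; the construction counter is there only to make the laziness visible.

```php
<?php
// Hypothetical cache class, for illustration only.
class ExampleCache {
    // Counts constructions so the lazy behaviour can be observed.
    public static $constructed = 0;
    private $items = array();

    public function __construct() {
        self::$constructed++;
    }

    public function set( $key, $value ) {
        $this->items[$key] = $value;
    }

    public function get( $key ) {
        return isset( $this->items[$key] ) ? $this->items[$key] : null;
    }
}

// Hypothetical accessor function. The object is built on first call
// and the same object is returned on every later call.
function exampleGetCache() {
    static $cache = null;
    if ( $cache === null ) {
        $cache = new ExampleCache;
    }
    return $cache;
}
```

A request that never calls exampleGetCache() never pays for constructing the cache, which is not possible if the object has to be assigned to a global variable during setup.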
There is also the problem that the global namespace is somewhat crowded. Using a global function for an accessor just moves this problem to somewhere else. The alternative is to use a static class member as an accessor. This concept is well known, and where the static object is the only one ever needed, the object is called a singleton. The PHP 5 manual recommends calling the accessor function singleton(), and I'll go along with that despite personally preferring getInstance().
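A minimal sketch of the static accessor, assuming a made-up class called ExampleLinkCache (this is not the real LinkCache, and its methods are invented for the example):

```php
<?php
// Hypothetical class, for illustration only.
class ExampleLinkCache {
    private static $instance = null;
    private $links = array();

    // Private constructor: the only way in is through singleton().
    private function __construct() {
    }

    // Static accessor. No global variable and no global function is needed,
    // so the only name added to the global namespace is the class itself.
    public static function singleton() {
        if ( self::$instance === null ) {
            self::$instance = new ExampleLinkCache;
        }
        return self::$instance;
    }

    public function addLink( $title, $id ) {
        $this->links[$title] = $id;
    }

    // Returns the stored ID, or 0 if the title is unknown.
    public function getId( $title ) {
        return isset( $this->links[$title] ) ? $this->links[$title] : 0;
    }
}
```

Callers would write ExampleLinkCache::singleton()->getId( 'Main Page' ) wherever they would previously have declared and used a global.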
The disadvantage to the singleton pattern is that it requires the class name to be hard-coded throughout the code base, removing some flexibility. We could get around that by having base classes construct derived classes, if you don't mind the dependency implications.
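One possible shape for that workaround is sketched below: callers only ever name the base class, and the base class decides which class to instantiate. The names ExampleParser, ExampleFancyParser and $implClass are hypothetical, and selecting the class through a static property is just one assumption about how the choice could be made.

```php
<?php
// Hypothetical base class, for illustration only.
class ExampleParser {
    // Name of the class that singleton() will construct.
    // This is where the base class comes to depend on its derived classes.
    public static $implClass = 'ExampleParser';
    private static $instance = null;

    public static function singleton() {
        if ( self::$instance === null ) {
            $class = self::$implClass;
            self::$instance = new $class;
        }
        return self::$instance;
    }

    public function describe() {
        return 'base';
    }
}

// Hypothetical derived class that replaces the default behaviour.
class ExampleFancyParser extends ExampleParser {
    public function describe() {
        return 'fancy';
    }
}
```

Setting ExampleParser::$implClass = 'ExampleFancyParser' before the first call to ExampleParser::singleton() swaps the implementation without touching any calling code. The cost is the dependency already mentioned: the base class, or whatever configures it, has to know about the derived class.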
I'm currently working on converting $wgLinkCache to a singleton pattern, and I also have a few other objects in my sights. But I still don't know exactly how far we want to go with this. What do we want our long-term architecture to be?
What should we do with the User class? $wgUser is used very heavily. If not global, the scope of the object would have to be very wide. There are a few applications for multiple user objects, but they don't really interfere with the use of $wgUser elsewhere.
Another tricky case is configuration. There are about 300 configuration-related globals; it might be nice to encapsulate them, purely from a namespace perspective. We already have a SiteConfiguration object, and on Wikimedia sites, this object has a configuration array which is extracted into the global namespace. Should we just use it directly instead? The conversion cost would obviously be high.
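The difference between the two styles can be sketched as follows. ExampleConfig and its methods are hypothetical and are not meant to describe the SiteConfiguration interface; the setting names egSitename and egUseCache are likewise invented.

```php
<?php
// Hypothetical configuration holder, for illustration only.
class ExampleConfig {
    private $settings;

    public function __construct( array $settings ) {
        $this->settings = $settings;
    }

    // Direct use: read a setting from the object, no globals involved.
    public function get( $name, $default = null ) {
        return array_key_exists( $name, $this->settings )
            ? $this->settings[$name]
            : $default;
    }

    // Extraction: copy every setting into the global namespace
    // as a separate variable, one global per setting.
    public function extractToGlobals() {
        foreach ( $this->settings as $name => $value ) {
            $GLOBALS[$name] = $value;
        }
    }
}
```

With extraction, code keeps reading plain globals and nothing needs converting. With direct use, every read becomes something like $conf->get( 'egSitename' ), which is where the conversion cost comes from.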
There might also be some need for encapsulating configuration from a data flow perspective. setupGlobals() in dumpHTML.inc could perhaps be made a bit more elegant.
Should objects such as $wgUser and $wgConf be members of an application object? Should the application object be global? Some other heavily-used globals are $wgLang, $wgContLang, $wgOut and $wgParser. What should we do with them?
We need to be guided by our applications, and choose the simplest architecture which supports all of them. Are we interested in:
* Embedding? Need to avoid namespace pollution.
* Per-wiki daemons to do background tasks? Need a means for periodically refreshing configuration and caches.
* A daemon that responds to requests for multiple wikis? Needs multiple language objects, and a caching system which discriminates between different wikis.
I'm interested in daemon (or servlet) applications because of the efficiency implications.
-- Tim Starling