PHP design basics

List overview All Threads
Download

newer

older

Re: [Wikitech-l] [MediaWiki-CVS]...

Re: [Wikitech-l] [mediawiki] /...

Tim Starling

22 May 2008 22 May '08

3:05 a.m.

A few PHP design basics for people who have forgotten them.

== Lazy load everything ==

The hottest hotspot is always initialisation. Initialisation runs on every single request. Every extension adds its own initialisation overhead, and there may be many extensions. Therefore, it is critical that initialisation is heavily optimised.

So if you're writing an extension, don't do anything at initialisation time (either file scope or $wgExtensionFunctions) except setting globals to literals.

Don't initialise data that you think you might need. Wait until you're asked for it and load it then. Cache it if necessary. Think in terms of a pull model, not a push model.

Loading large amounts of code is slow and memory hungry, especially on installations without an opcode cache. So lazy-load your code, by putting everything into autoloaded classes. Use static member functions for hooks.

== Global variables are evil ==

In MediaWiki, global variables should be used for configuration, and nothing else. There are some legacy object globals defined in old code, and you might have to use them from time to time, but don't go adding more of them. And don't reference any global variable unless you have to.

If you do need to use a legacy global variable such as $wgUser, just put the global statement where it's needed. Don't pull the object out of the global namespace and pass it from function to function unless there's a concrete need for that versatility.

Passing global objects from function to function is not a reasonable substitute for direct reference to a global variable. When the revolution comes, you will know it.

== Objects are hashtables ==

Member variables can be added and removed dynamically. It is your fundamental right as a PHP programmer to do this, nobody will ever take it away from you.

You can test for existence with isset($this->var), delete a variable with unset($this->var), reference variables with dynamic names using $this->$name, and even convert to and from the array type, (array)$obj.

It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.

But the most important part of this paradigm is that extensions can create custom member variables in core objects.

Say if you're writing a hook and you need to cache some data. You could put it in a global variable or static member variable, but the trouble is, unless you're very careful, you lose versatility in the hook caller.

Most hooks have an object as their first parameter. This object is a great place to put your extension data relating to that object. When the caller is done with the object, it will go out of scope and your data will be deleted. And your code will implicitly be versatile enough to support any number of objects present in the system at any given time.

Prefix your custom member variables with the name of your extension, to avoid conflicts.

Show replies by date

DanTMan

22 May 22 May

6:18 a.m.

Tim Starling wrote:

...

A few PHP design basics for people who have forgotten them.

== Lazy load everything ==

The hottest hotspot is always initialisation. Initialisation runs on every single request. Every extension adds its own initialisation overhead, and there may be many extensions. Therefore, it is critical that initialisation is heavily optimised.

So if you're writing an extension, don't do anything at initialisation time (either file scope or $wgExtensionFunctions) except setting globals to literals.

Don't initialise data that you think you might need. Wait until you're asked for it and load it then. Cache it if necessary. Think in terms of a pull model, not a push model.

Loading large amounts of code is slow and memory hungry, especially on installations without an opcode cache. So lazy-load your code, by putting everything into autoloaded classes. Use static member functions for hooks.

^_^ I've been creating classes with the name of my extension for awhile and putting all the hooks and extension functions into them and autoloading the class for awhile now. Heh, I just did it because it looked cleaner than using ugly efExtName functions, didn't know it was a design basic.

...

== Global variables are evil ==

In MediaWiki, global variables should be used for configuration, and nothing else. There are some legacy object globals defined in old code, and you might have to use them from time to time, but don't go adding more of them. And don't reference any global variable unless you have to.

;) If I had it my way, globals wouldn't even be used for extension configuration. Even got the beautiful side effect you can load the extension inside of a function.

...

If you do need to use a legacy global variable such as $wgUser, just put the global statement where it's needed. Don't pull the object out of the global namespace and pass it from function to function unless there's a concrete need for that versatility.

Passing global objects from function to function is not a reasonable substitute for direct reference to a global variable. When the revolution comes, you will know it.

== Objects are hashtables ==

Member variables can be added and removed dynamically. It is your fundamental right as a PHP programmer to do this, nobody will ever take it away from you.

You can test for existence with isset($this->var), delete a variable with unset($this->var), reference variables with dynamic names using $this->$name, and even convert to and from the array type, (array)$obj.

It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.

But the most important part of this paradigm is that extensions can create custom member variables in core objects.

Say if you're writing a hook and you need to cache some data. You could put it in a global variable or static member variable, but the trouble is, unless you're very careful, you lose versatility in the hook caller.

Most hooks have an object as their first parameter. This object is a great place to put your extension data relating to that object. When the caller is done with the object, it will go out of scope and your data will be deleted. And your code will implicitly be versatile enough to support any number of objects present in the system at any given time.

Prefix your custom member variables with the name of your extension, to avoid conflicts.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

MinuteElectron

7:38 a.m.

DanTMan wrote:

...

Tim Starling wrote:

...
Loading large amounts of code is slow and memory hungry, especially on installations without an opcode cache. So lazy-load your code, by putting everything into autoloaded classes. Use static member functions for hooks.

^_^ I've been creating classes with the name of my extension for awhile and putting all the hooks and extension functions into them and autoloading the class for awhile now. Heh, I just did it because it looked cleaner than using ugly efExtName functions, didn't know it was a design basic.

Of course, this loads the entire class -- even anything you might not need at that time; any extension functions you want in $wgExtensionFunctions could just be put in the global scope anyway since they are always going to be loaded regardless of what code path is chosen.

MinuteElectron.

DanTMan

7:53 a.m.

Well part of it is offloading the time. Yes it will be loaded at some point, but it's always nice if that point is after MW has loaded a fair bit of itself.

Whatever the case... It's kinda a compromise for me... Really, I would like to put everything in the class and screw the classic loading files. And I mean config with that... WikiVid::$cAutoload = false; Does send a fair bit more of a signal of "This extension owns this configuration option" than $egWikiVidAutoload does.

And alluding to one of my other messages: $wgLoadExtensions['WikiVid'] = array( 'Autoload' => false );

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

MinuteElectron wrote:

...

DanTMan wrote:

...
Tim Starling wrote:

...
Loading large amounts of code is slow and memory hungry, especially on installations without an opcode cache. So lazy-load your code, by putting everything into autoloaded classes. Use static member functions for hooks.

^_^ I've been creating classes with the name of my extension for awhile and putting all the hooks and extension functions into them and autoloading the class for awhile now. Heh, I just did it because it looked cleaner than using ugly efExtName functions, didn't know it was a design basic.

Of course, this loads the entire class -- even anything you might not need at that time; any extension functions you want in $wgExtensionFunctions could just be put in the global scope anyway since they are always going to be loaded regardless of what code path is chosen.

MinuteElectron.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Jim Hu

8:12 a.m.

On May 22, 2008, at 9:38 AM, MinuteElectron wrote:

...

DanTMan wrote:

...
Tim Starling wrote:

...
Loading large amounts of code is slow and memory hungry, especially on installations without an opcode cache. So lazy-load your code, by putting everything into autoloaded classes. Use static member functions for hooks.

^_^ I've been creating classes with the name of my extension for awhile and putting all the hooks and extension functions into them and autoloading the class for awhile now. Heh, I just did it because it looked cleaner than using ugly efExtName functions, didn't know it was a design basic.

Of course, this loads the entire class -- even anything you might not need at that time; any extension functions you want in $wgExtensionFunctions could just be put in the global scope anyway since they are always going to be loaded regardless of what code path is chosen.

Being a self-taught amateur, I need an explanation of how to do this...is this what you mean?

Bad: one file: myExtension.php: $wgHooks[the hook][] = efMyExtension; function efMyExtension(){ lots of code ... }

Good: two or more files: 1) myExtension/includes/myExtension.php $wgHooks[the hook][] = efMyExtension; $wgAutoloadClasses['myExtensionClass'] = dirname(__FILE__).myExtensionClass.php;

function efMyExtension(params){ $foo = new myExtensionClass; $foo->execute(params) }

2) myExtension/class.myExtensionClass.php class myExtensionClass{ function __construct(){ } function execute(param){ lots of code } }

Jim

...

MinuteElectron.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054

DanTMan

9:25 a.m.

extensions/ExtensionName/ExtensionName.php:

$wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic'; $wgExtensionFunctions[] = array( 'ExtensionName', 'ExtensionFunction' ); $wgAutoloadClasses['ExtensionName'] = dirname(__FILE__).'/ExtensionName.class.php'; $wgAutoloadClasses['ExtensionName_Functions'] = dirname(__FILE__).'/ExtensionName.functions.php';

extensions/ExtensionName/ExtensionName.class.php:

class ExtensionName { static public function LoadParserFunctions() { global $wgParser; $wgParser->setFunctionHook( 'foo', array( 'ExtensionName_Functions', 'foo' ) ); return true; } static public function SetupLanguageMagic( &$magicWords, $langCode ) { /* ... Initilization for your ParserFunction magic words ... */ return true; } }

extension/ExtensionName/ExtensionName.functions.php:

class ExtensionName_Functions { static public function foo( &$parser, $a = '', $b = '', $c = '' ) { return "$a foo $b foo $c"; } }

Though my thoughts may diverge from how other people do it. I particularly like in this setup how only the things like initialization of parser functions, and their magic words go in the base class. And that's autoloaded after MediaWiki is well and ready to load it. And the actual heavy parserfunctions that make up the extension is off in a completely separate file which never ends up loaded unless we're doing parsing and someone called that parser function. This is basically how I set up most of my extensions, namely WikiCode. However, in that case with how many varying functions I have, I went the road of splitting them up into groups, and doing the autoloading inside of the ExtensionFunction. As well as registration of the parserfunctions.

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

Jim Hu wrote:

...

On May 22, 2008, at 9:38 AM, MinuteElectron wrote:

...
DanTMan wrote:

...
Tim Starling wrote:

...
Loading large amounts of code is slow and memory hungry, especially on installations without an opcode cache. So lazy-load your code, by putting everything into autoloaded classes. Use static member functions for hooks.

^_^ I've been creating classes with the name of my extension for awhile and putting all the hooks and extension functions into them and autoloading the class for awhile now. Heh, I just did it because it looked cleaner than using ugly efExtName functions, didn't know it was a design basic.

Of course, this loads the entire class -- even anything you might not need at that time; any extension functions you want in $wgExtensionFunctions could just be put in the global scope anyway since they are always going to be loaded regardless of what code path is chosen.

Being a self-taught amateur, I need an explanation of how to do this...is this what you mean?

Bad: one file: myExtension.php: $wgHooks[the hook][] = efMyExtension; function efMyExtension(){ lots of code ... }

Good: two or more files:

myExtension/includes/myExtension.php

$wgHooks[the hook][] = efMyExtension; $wgAutoloadClasses['myExtensionClass'] = dirname(__FILE__).myExtensionClass.php;

function efMyExtension(params){ $foo = new myExtensionClass; $foo->execute(params) }

myExtension/class.myExtensionClass.php

class myExtensionClass{ function __construct(){ } function execute(param){ lots of code } }

Jim

...
MinuteElectron.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Brion Vibber

10 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

DanTMan wrote:

...

extensions/ExtensionName/ExtensionName.php:

$wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic'; $wgExtensionFunctions[] = array( 'ExtensionName', 'ExtensionFunction' );

Oooh, avoid that last -- that'll load your class on every request, whether it's needed or not.

$wgExtensionFunctions is for initialization functions which will need to be run during Setup.php, after most of the basic infrastructure is up.

These days there's actually usually little if any need for such functions, since we've got the various hook arrays of all sorts which are designed for extensibility and lazy-loading; you can set your various information directly in the arrays in your config/loader code.

- -- brion vibber (brion @ wikimedia.org)

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkg1ppMACgkQwRnhpk1wk45KkQCg2iSiouyAnhT7lYpdAmjkViVD WywAoNZ77bk3rTPe+0KKzou3yYn9QpzD =y7Nd -----END PGP SIGNATURE-----

Chad

10:43 a.m.

On Thu, May 22, 2008 at 1:00 PM, Brion Vibber brion@wikimedia.org wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

DanTMan wrote:

...
extensions/ExtensionName/ExtensionName.php:

$wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic'; $wgExtensionFunctions[] = array( 'ExtensionName', 'ExtensionFunction' );

Oooh, avoid that last -- that'll load your class on every request, whether it's needed or not.

$wgExtensionFunctions is for initialization functions which will need to be run during Setup.php, after most of the basic infrastructure is up.

These days there's actually usually little if any need for such functions, since we've got the various hook arrays of all sorts which are designed for extensibility and lazy-loading; you can set your various information directly in the arrays in your config/loader code.

-- brion vibber (brion @ wikimedia.org)

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg1ppMACgkQwRnhpk1wk45KkQCg2iSiouyAnhT7lYpdAmjkViVD WywAoNZ77bk3rTPe+0KKzou3yYn9QpzD =y7Nd -----END PGP SIGNATURE-----

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

All of this is great advice (especially for someone new to the code, like myself), and would be of great use to be on Mediawiki.org as well. On that note, perhaps an update of the example extension in SVN might be in order, with some better documentation, so things like this are clear to everyone.

-Chad

DanTMan

23 May 23 May

7:59 a.m.

^_^ Can we get one for ParserFunctions and ParserTags then? If LanguageGetMagic loads on every load, then there might be reason to do the same.

We still have to load all the parserfunctions and tags ourself. And even worse we have to do it by calling on the parser and unstubbing it during load instead of letting it wait till it's actually unstubbed.

Actually... What I'm thinking is something like LanguageGetMagic. A hook, something like ParserGetHooks or whatever which passes the parser object into the function, and is used to add in all the hooks and tags to it. That also has the bonus side effect that if both parsers are being used instead of only one (ie: For things like ParserDiffTest) the parserfunctions are not excluded. Same goes for if for some odd reason an extension needs to initialize it's own Parser (Though, in that case there should probably be an option to disable loading of the pfuncs and tags).

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

Brion Vibber wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

DanTMan wrote:

...
extensions/ExtensionName/ExtensionName.php:

$wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic'; $wgExtensionFunctions[] = array( 'ExtensionName', 'ExtensionFunction' );

Oooh, avoid that last -- that'll load your class on every request, whether it's needed or not.

$wgExtensionFunctions is for initialization functions which will need to be run during Setup.php, after most of the basic infrastructure is up.

These days there's actually usually little if any need for such functions, since we've got the various hook arrays of all sorts which are designed for extensibility and lazy-loading; you can set your various information directly in the arrays in your config/loader code.

-- brion vibber (brion @ wikimedia.org)

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg1ppMACgkQwRnhpk1wk45KkQCg2iSiouyAnhT7lYpdAmjkViVD WywAoNZ77bk3rTPe+0KKzou3yYn9QpzD =y7Nd -----END PGP SIGNATURE-----

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Brion Vibber

11:41 a.m.

DanTMan wrote:

...

^_^ Can we get one for ParserFunctions and ParserTags then? If LanguageGetMagic loads on every load, then there might be reason to do the same.

We still have to load all the parserfunctions and tags ourself. And even worse we have to do it by calling on the parser and unstubbing it during load instead of letting it wait till it's actually unstubbed.

Actually... What I'm thinking is something like LanguageGetMagic. A hook, something like ParserGetHooks or whatever which passes the parser object into the function, and is used to add in all the hooks and tags to it.

It's called 'ParserFirstCallInit', and seems to be widely used by existing parser hook extensions.

-- brion vibber (brion @ wikimedia.org)

DanTMan

12:59 p.m.

O_o It doesn't appear to be used by ParserFunctions, the most widely used one. Or at least I never saw it used till now. Well then, guess it's time to tweak code... Well, perhaps after I do some contract work. :/ And restart this damn machine.

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

Brion Vibber wrote:

...

DanTMan wrote:

...
^_^ Can we get one for ParserFunctions and ParserTags then? If LanguageGetMagic loads on every load, then there might be reason to do the same.

We still have to load all the parserfunctions and tags ourself. And even worse we have to do it by calling on the parser and unstubbing it during load instead of letting it wait till it's actually unstubbed.

Actually... What I'm thinking is something like LanguageGetMagic. A hook, something like ParserGetHooks or whatever which passes the parser object into the function, and is used to add in all the hooks and tags to it.

It's called 'ParserFirstCallInit', and seems to be widely used by existing parser hook extensions.

-- brion vibber (brion @ wikimedia.org)

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Brion Vibber

2:59 p.m.

DanTMan wrote:

...

O_o It doesn't appear to be used by ParserFunctions, the most widely used one. Or at least I never saw it used till now.

Well, let's look... :D

if ( defined( 'MW_SUPPORTS_PARSERFIRSTCALLINIT' ) ) { $wgHooks['ParserFirstCallInit'][] = array( &$wgExtParserFunctions, 'registerParser' );

That'd done conditionally in a setup function called from $wgExtensionFunctions, in order to provide backwards compatibility with older MediaWiki host versions -- if the ParserFirstCallInit hook isn't available, only then does it go after the parser immediately.

-- brion

Quasar Jarosz

22 May 22 May

6:59 a.m.

...

== Objects are hashtables ==

Member variables can be added and removed dynamically. It is your fundamental right as a PHP programmer to do this, nobody will ever take it away from you.

Do you know of anywhere i can get some more information on this behavior? I believe i have read every page of the PHP manual and I had no idea this behavior was possible!

Quasar

Tim Starling

23 May 23 May

7:36 a.m.

Quasar Jarosz wrote:

...

...
== Objects are hashtables ==

Member variables can be added and removed dynamically. It is your fundamental right as a PHP programmer to do this, nobody will ever take it away from you.

Do you know of anywhere i can get some more information on this behavior? I believe i have read every page of the PHP manual and I had no idea this behavior was possible!

It doesn't appear to say it, no. But it's certainly not the only undocumented feature in PHP. For example, the essential syntax:

$class = 'ClassName'; $obj = new $class;

...is not in the manual either, it's only in a user comment. Luckily the PHP developers are sufficiently concerned about backwards compatibility to avoid breaking things like this.

I know what I know about PHP thanks to a few years of experience. If you want to learn more, you could try:

* Reading other people's PHP code, there's plenty of open source projects around * Reading mailing lists and blogs * Reading the PHP source code

The popular techniques are least likely to be broken by future PHP releases, so it helps to read lots of code and know what is popular.

-- Tim Starling

Edward Z. Yang

22 May 22 May

12:31 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Tim Starling wrote:

...

It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.

I find this to be a very interesting viewpoint. How much time/space, exactly, is saved by moving variable declarations from the object declaration to, say, the constructor? I've always felt that the self-documentation ability derived from having explicit member variables is more important.

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFINcn/qTO+fYacSNoRAni3AJ9D9FrdvfkwuPuTQB3+J8jMp+SkyACdHNLY AQsHIEzlYn08EuTN7aSxlfQ= =Oiok -----END PGP SIGNATURE-----

Tim Starling

4:45 p.m.

Edward Z. Yang wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Tim Starling wrote:

...
It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.

I find this to be a very interesting viewpoint. How much time/space, exactly, is saved by moving variable declarations from the object declaration to, say, the constructor? I've always felt that the self-documentation ability derived from having explicit member variables is more important.

Nothing is saved by moving them to the constructor, in fact it'll probably be slower. The idea is to defer variable initialisation until the variable is required. The saving, assuming the variable is never required, is on the order of a microsecond plus 84 bytes. So unless there's a lot of objects or it's in a tight loop, we're in the realm of micro-optimisation, and other considerations, such as personal style, are probably going to take precedence.

I wouldn't recommend leaving the var out when there is a need to initialise it to something simple, and you would need to add an isset():

var $x = 1; ... $this->doStuff( $x );

versus

if ( !isset( $this->x ) ) { $this->x = 1; } $this->doStuff( $x );

The second one is slower.

But please don't forget my main point: an object is a hashtable, you can add and remove variables. Don't be such a stickler for self-documentation and the way things are "meant" to be done that you tie yourself in knots trying to avoid dynamic variable creation. You can always document with a comment instead.

-- Tim Starling

Edward Z. Yang

8:12 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Tim Starling wrote:

...

I wouldn't recommend leaving the var out when there is a need to initialise it to something simple, and you would need to add an isset(): [snip] The second one is slower.

Right. A little murkier would be something like

var $foo;

Which declares the variable and is a convenient point for documentation. I've always thought that adding this little marker allowed the Zend Engine to optimize access to the variable, much like a C struct. If I'm reading you correctly, apparently not! (It's really hard to test for micro-optimizations like these, and I can't read C well enough to make determinations on my own).

And, of course, conditional includes, although annoying to bytecode caches, let you bypass the overhead of loading all the opcodes altogether. There's a balance to be had there, and I'm curious to know what works and what doesn't.

...

But please don't forget my main point: an object is a hashtable, you can add and remove variables. Don't be such a stickler for self-documentation and the way things are "meant" to be done that you tie yourself in knots trying to avoid dynamic variable creation. You can always document with a comment instead.

There's an interesting question: why objects over arrays? (There's a simple answer to that, and it is that most of the time, you're dealing with an object, not an array, and shouldn't have to allocate another hash table for some custom properties.)

Also, you might be interested in the "Class Properties in Interfaces?" discussion that was recently floating around on php.internals, that proposed that unsetting a variable from an interface should cause a fatal error. :-P (You're probably not; it's a really long discussion).

...PGP SIGNATURE...

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFINjY6qTO+fYacSNoRAntiAJ9NczepsswzI8h3Tg0u5WfzmXaq6QCeO2Ww +peoZNen2GlPtwj0hCx+tFI= =nVh4 -----END PGP SIGNATURE-----

DanTMan

23 May 23 May

8:03 a.m.

Edward Z. Yang wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Tim Starling wrote:

...
I wouldn't recommend leaving the var out when there is a need to initialise it to something simple, and you would need to add an isset(): [snip] The second one is slower.

Right. A little murkier would be something like

var $foo;

Which declares the variable and is a convenient point for documentation. I've always thought that adding this little marker allowed the Zend Engine to optimize access to the variable, much like a C struct. If I'm reading you correctly, apparently not! (It's really hard to test for micro-optimizations like these, and I can't read C well enough to make determinations on my own).

And, of course, conditional includes, although annoying to bytecode caches, let you bypass the overhead of loading all the opcodes altogether. There's a balance to be had there, and I'm curious to know what works and what doesn't.

...
But please don't forget my main point: an object is a hashtable, you can add and remove variables. Don't be such a stickler for self-documentation and the way things are "meant" to be done that you tie yourself in knots trying to avoid dynamic variable creation. You can always document with a comment instead.

There's an interesting question: why objects over arrays? (There's a simple answer to that, and it is that most of the time, you're dealing with an object, not an array, and shouldn't have to allocate another hash table for some custom properties.)

^_^ For me... I just like accessing data with a $foo->bar rather than $foo['bar'], former one looks a bit nicer to me.

...

Also, you might be interested in the "Class Properties in Interfaces?" discussion that was recently floating around on php.internals, that proposed that unsetting a variable from an interface should cause a fatal error. :-P (You're probably not; it's a really long discussion). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFINjY6qTO+fYacSNoRAntiAJ9NczepsswzI8h3Tg0u5WfzmXaq6QCeO2Ww +peoZNen2GlPtwj0hCx+tFI= =nVh4 -----END PGP SIGNATURE-----

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

Nick Jenkins

22 May 22 May

6:59 p.m.

...

...
It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.

I find this to be a very interesting viewpoint. How much time/space, exactly, is saved by moving variable declarations from the object declaration to, say, the constructor? I've always felt that the self-documentation ability derived from having explicit member variables is more important.

Yeah, I feel the same. Let me outline why. With any code, there are at least 3 dimensions of quality:

1) Performance. How quickly it does its job, and with what resources. Can be improved by having short code paths, minimizing memory use, minimizing compiled code size (so that more of the program fits into L1 or L2 cache), reducing disk access, using low-level languages with less runtime overhead (hand-optimized assembler in the extreme), various caches (opcode caches, object caches, HTTP caches, etc), and doing things only when you need them (lazy loading, just-in-time systems, etc), better algorithms, and so forth. Or performance can be improved by increasing the resources allocated (e.g. more database servers, more apache servers, more squids, more RAM, faster disks, faster CPU, faster network, etc). I.e. do less, and/or do it with more grunt.

2) Maintenance. How easily and quickly other programmers can fix or add functionality to your code when you are away: - documentation. What is the overall purpose of the code, what problem is it trying to solve, how are you trying to solve it, what are main functions, what are their parameters. - making things obvious. E.g. understandable and short variable names, function names, and class names. Any bits that do something tricky or critical should be documented or explained. - making things short, and simple. Simpler and shorter things are easier to hold in your head and understand. - using a programming language and a style that is familiar to many people.

3) Functionality. How much the code does, how useful what it does is, and how closely its actual behaviour matches the expected behaviour, and how flexible and general the code is.

There way well be more dimensions and other aspects I haven't covered above, but it'll probably suffice.

I'd argue that most everything that committers are trying to do in MediaWiki is aimed at giving an improvement in one or more of the above dimensions. E.g. fix a bug = improved functionality. Add a feature = improved functionality. Add some documentation = improved maintenance. Standardize an awkward non-standard file to use the same approach as the rest of the code base, which makes it shorter and simpler = improved maintenance. And so forth, with combinations of the above possible.

I'd also argue that anything that is an overall regression in the above dimensions should probably be reverted. E.g. introduce a bug and make performance worse but add one line of documentation = revert.

Now, some of the cases outlined are a clear overall win (that is, they entail a significant improvement in one dimension with no regressions in another, or a very minor regression in another).

E.g. lazy loading probably makes the code a bit longer, and a tiny bit less clear, but improves performance a lot.

However, not declaring class variables seems to me to be a significant overall loss. I for one have looked at MediaWiki code trying to work out where some variable in a huge class came from. It wasn't declared. It wasn't inherited from the parent class (which was also very long). It wasn't inherited from the parent's parent class. Nope. It wasn't documented anywhere. It was just used a few times, without explanation, and without declaration. And to understand what it did you had to read the function that initialized it. That function was also not documented. And that function called another function which you had to understand to understand that the first function did. That function was also not documented. Then that function called a third function, which you have to understand to understand what the second function did to understand what the first function did to understand what the purpose of the class variable was. The whole process wasted about 20 minutes, and by the end of it, I was, to say the least, not very impressed. For a minimal gain in performance by not declaring a variable (and for zero gain in performance by not having any documentation), the maintainability of that code was severely reduced.

So personally, I'm very much in favour of declaring variables (for the simple reason that the performance increase would need to be f*ing huge to counterbalance the enormous reduction in maintainability). But if people _really_ don't want to do this for performance reasons, then fair enough, but at the very least can they please consider documenting those variables, with their scope, name, type, and purpose. E.g. ------------------------------------- // This class does batch processing for [insert some reason here] class whatever { // Local variables, not declared for performance reasons: // private $count int How many pages we have looked at thus far in the batch processing. // private $title Title The current page's title that we are currently working on for the batch processing. // ... etc -------------------------------------

-- All the best, Nick.

Tim Starling

7:50 p.m.

Nick Jenkins wrote:

...

However, not declaring class variables seems to me to be a significant overall loss. I for one have looked at MediaWiki code trying to work out where some variable in a huge class came from. It wasn't declared. It wasn't inherited from the parent class (which was also very long). It wasn't inherited from the parent's parent class. Nope. It wasn't documented anywhere. It was just used a few times, without explanation, and without declaration. And to understand what it did you had to read the function that initialized it. That function was also not documented. And that function called another function which you had to understand to understand that the first function did. That function was also not documented. Then that function called a third function, which you have to understand to understand what the second function did to understand what the first function did to understand what the purpose of the class variable was. The whole process wasted about 20 minutes, and by the end of it, I was, to say the least, not very impressed. For a minimal gain in performance by not declaring a variable (and for zero gain in performance by not having any documentation), the maintainability of that code was severely reduced.

If you've got no clue whatsoever what is going on in a section of code, an uncommented var declaration is hardly going to do you any good. It seems to me your problem is with documentation, not var statements.

-- Tim Starling

Bryan Tong Minh

23 May 23 May

12:38 p.m.

On Thu, May 22, 2008 at 12:05 PM, Tim Starling tstarling@wikimedia.org wrote:

...

== Global variables are evil ==

[...]

...

If you do need to use a legacy global variable such as $wgUser, just put the global statement where it's needed. Don't pull the object out of the global namespace and pass it from function to function unless there's a concrete need for that versatility.

Time to deprecate all of them in favor of something like MediaWiki::singleton()->getUser(); ? It's probably slower but less evil.

Bryan

DanTMan

1:14 p.m.

Bah, that looks even uglier. If you're going to make a singleton, make it something sane and reasonable.

User::getCurrent();

It's would be part of the class actually being mentioned, and it specifies what you are trying to get, the user of the current transaction. Same would go for other things like Title::getCurrent();

However, unfortunately part of things done at times are testing $wgTitle if it's null (has been set yet), and sometimes there are reasons to override it (Special use, primarily things outside of the scope of a article view). In those cases we should also probably setup things like TITLE_ISINSTANTIATED and also allow a Title::setCurrent();

Unfortunately, the parser isn't that easy. You'd think Parser::singleton() would be ok, but it's not. We'd need to be passing around a $wgParserconf and use the 'class' to determine what to get, but even that wouldn't work right.

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

Bryan Tong Minh wrote:

...

On Thu, May 22, 2008 at 12:05 PM, Tim Starling tstarling@wikimedia.org wrote:

...
== Global variables are evil ==

[...]

...
If you do need to use a legacy global variable such as $wgUser, just put the global statement where it's needed. Don't pull the object out of the global namespace and pass it from function to function unless there's a concrete need for that versatility.

Time to deprecate all of them in favor of something like MediaWiki::singleton()->getUser(); ? It's probably slower but less evil.

Bryan

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Bryan Tong Minh

1:38 p.m.

On Fri, May 23, 2008 at 10:14 PM, DanTMan dan_the_man@telus.net wrote:

...

Bah, that looks even uglier. If you're going to make a singleton, make it something sane and reasonable.

User::getCurrent();

It's would be part of the class actually being mentioned, and it specifies what you are trying to get, the user of the current transaction. Same would go for other things like Title::getCurrent();

It does not necessarily look nice, but keeping the current stuff in one class sounds like a better idea to me.

...

However, unfortunately part of things done at times are testing $wgTitle if it's null (has been set yet), and sometimes there are reasons to override it (Special use, primarily things outside of the scope of a article view). In those cases we should also probably setup things like TITLE_ISINSTANTIATED and also allow a Title::setCurrent();

Directly scraping $wgTitle is of course not possible. MediaWiki::singleton()->getTitle(); would for now just be return $wgTitle; but in 2020 when no globals are used anymore, the globals can be killed.

Bryan

Roan Kattouw

1:55 p.m.

Bryan Tong Minh schreef:

...

Directly scraping $wgTitle is of course not possible. MediaWiki::singleton()->getTitle(); would for now just be return $wgTitle; but in 2020 when no globals are used anymore, the globals can be killed.

Like you said, 2020. Maybe it's better to just wait until a group of people decides to rewrite the codebase (phase4), otherwise we'll just make stuff uglier.

Roan Kattouw (Catrope)

DanTMan

2:36 p.m.

^_^ That far ahead... By that point in time someone inspired by the user end of MediaWiki may have rewritten an entire Wiki engine in another engine which can actually be compiled into machine code and doesn't rely on a slow system like PHP's bytecode. And not only that uses a parser designed from the use point that people are currently hacking the MediaWiki parser mean as a templating engine, to do. ^_^ if things end up really nicely, you could even login to your wiki engine with the cli and tweak things from the inside, heh... At the very least, by then we'll have PHP6 and all the wonderful new OOP stuff. Including __callStatic, static::, and namespaces. With all that stuff it'll be easier to exclude ugly functional globals. Heh... we could even be using things like Xml::span which __callStatic wraps to Xml::element( 'span', ... ); Extensions could be contained within namespaces, and we could be using static:: inside of classes instead of self:: to better support any subclassing by extensions. At the very least, we won't need to have completely separated Parsers and can at least use static:: where needed and such to subclass the parser to make new ones instead of duplicating it. ^_^ It'll probably be designed in a way better supportive of WYSIWYG by then.

~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com)

Roan Kattouw wrote:

...

Bryan Tong Minh schreef:

...
Directly scraping $wgTitle is of course not possible. MediaWiki::singleton()->getTitle(); would for now just be return $wgTitle; but in 2020 when no globals are used anymore, the globals can be killed.

Like you said, 2020. Maybe it's better to just wait until a group of people decides to rewrite the codebase (phase4), otherwise we'll just make stuff uglier.

Roan Kattouw (Catrope)

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Simetrical

3:12 p.m.

On Fri, May 23, 2008 at 5:36 PM, DanTMan dan_the_man@telus.net wrote:

...

Heh... we could even be using things like Xml::span which __callStatic wraps to Xml::element( 'span', ... );

Not if that's like the current __get and __set, which I assume it will be. Those are broken beyond belief: if one __get is nested within another, to any depth, it *silently* returns null. Purportedly to avoid infinite loops, but it's a spectacularly stupid way to do so. Extremely easy to cause bugs that will be a big pain to track down, unless you do almost nothing in the relevant functions or anything called by them.

Roan Kattouw

24 May 24 May

1:42 a.m.

Simetrical schreef:

...

On Fri, May 23, 2008 at 5:36 PM, DanTMan dan_the_man@telus.net wrote:

...
Heh... we could even be using things like Xml::span which __callStatic wraps to Xml::element( 'span', ... );

Not if that's like the current __get and __set, which I assume it will be. Those are broken beyond belief: if one __get is nested within another, to any depth, it *silently* returns null. Purportedly to avoid infinite loops, but it's a spectacularly stupid way to do so. Extremely easy to cause bugs that will be a big pain to track down, unless you do almost nothing in the relevant functions or anything called by them.

That's the point, we *do* do almost nothing in Xml::* functions. They're either wrappers to other Xml::* functions (in which case all we have to do is watch out that we use real functions inside Xml::* functions, not __callStatic-powered fake ones) or one- or two-line returns.

Roan Kattouw (Catrope)

Michael Daly

23 May 23 May

3:57 p.m.

Bryan Tong Minh wrote:

...

...
User::getCurrent();

It's would be part of the class actually being mentioned, and it specifies what you are trying to get, the user of the current transaction. Same would go for other things like Title::getCurrent();

It does not necessarily look nice, but keeping the current stuff in one class sounds like a better idea to me.

I think that it would be better to use something like:

Mediawiki::getUser();

Mediawiki::gettitle();

etc. The singleton knows its own members. Is there a reason to have more than one instance of Mediawiki at any time (i.e. is it ever not a singleton)?

Consider the complications of having each class have its own prime instance as well as other possible instances and knowing the difference. That would involve a lot of messy code (could be in a common base class, but nonetheless, each of these has to do its own thing).

As well, a class shouldn't know about it's instances (garbage collection schemes aside) but a class's instance should know about instances of its own members.

Mike

6037

Age (days ago)

6039

Last active (days ago)

wikitech-l@lists.wikimedia.org

27 comments

13 participants

tags (0)

participants (13)

Brion Vibber
Bryan Tong Minh
Chad
DanTMan
Edward Z. Yang
Jim Hu
Michael Daly
MinuteElectron
Nick Jenkins
Quasar Jarosz
Roan Kattouw
Simetrical
Tim Starling