Re: [Wikitech-l] PHP design basics

22 May 2008


      ...
...
> > It is nice from a self-documentation standpoint to put var declarations at 
> > the top of your classes. But understand that a var declaration takes up 
> > time and space when the object is initialised. If you leave it out, that 
> > overhead can be deferred, and maybe skipped altogether.
> I find this to be a very interesting viewpoint. How much time/space,
> exactly, is saved by moving variable declarations from the object
> declaration to, say, the constructor? I've always felt that the
> self-documentation ability derived from having explicit member variables
> is more important.
Yeah, I feel the same. Let me outline why. With any code, there are at least 3 dimensions of quality:
1) Performance. How quickly it does its job, and with what resources. Can be improved by
   having short code paths, minimizing memory use, minimizing compiled code size (so that more of the program fits
   into L1 or L2 cache), reducing disk access, using low-level languages with less runtime overhead (hand-optimized
   assembler in the extreme), various caches (opcode caches, object caches, HTTP caches, etc), and doing things only
   when you need them (lazy loading, just-in-time systems, etc), better algorithms, and so forth. Or performance can
   be improved by increasing the resources allocated (e.g. more database servers, more apache servers, more squids, 
   more RAM, faster disks, faster CPU, faster network, etc). I.e. do less, and/or do it with more grunt.
2) Maintenance. How easily and quickly other programmers can fix or add functionality to your code when you are away:
  - documentation. What is the overall purpose of the code, what problem is it trying to solve, 
    how are you trying to solve it, what are the main functions, what are their parameters.
  - making things obvious. E.g. understandable and short variable names, function names, and class names.
    Any bits that do something tricky or critical should be documented or explained.
  - making things short, and simple. Simpler and shorter things are easier to hold in your head and understand.
  - using a programming language and a style that is familiar to many people.
3) Functionality. How much the code does, how useful what it does is, and how closely its actual behaviour matches
   the expected behaviour, and how flexible and general the code is.
There may well be more dimensions and other aspects I haven't covered above, but it'll probably suffice.
I'd argue that most everything that committers are trying to do in MediaWiki is aimed at giving an improvement in 
one or more of the above dimensions. E.g. fix a bug = improved functionality. Add a feature = improved functionality.
Add some documentation = improved maintenance. Standardize an awkward non-standard file to use the same approach as
the rest of the code base, which makes it shorter and simpler = improved maintenance. And so forth, with
combinations of the above possible.
I'd also argue that anything that is an overall regression in the above dimensions should probably be reverted.
E.g. introduce a bug and make performance worse but add one line of documentation = revert.
Now, some of the cases outlined are a clear overall win (that is, they entail a significant improvement in one 
dimension with no regressions in another, or a very minor regression in another).
E.g. lazy loading probably makes the code a bit longer, and a tiny bit less clear, but improves performance a lot.
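To make the lazy loading trade-off concrete, here is a minimal sketch. The class, property, and method names are invented for illustration and are not taken from MediaWiki; the "expensive" load is a stand-in for something like a database query. Note that the property is still declared and documented, so the extra cost in clarity is only the null check.

```php
<?php
// Hypothetical example: PageStats and its members are invented for illustration.
class PageStats {
    /** @var array|null Word counts keyed by page title; null until first requested. */
    private $wordCounts = null;

    /** @var int How many times the expensive load has actually run. */
    public $loadCalls = 0;

    /** Return the word counts, computing them only on first use. */
    public function getWordCounts() {
        if ( $this->wordCounts === null ) {
            $this->wordCounts = $this->loadWordCounts();
        }
        return $this->wordCounts;
    }

    /** Stand-in for an expensive operation such as a database query. */
    private function loadWordCounts() {
        $this->loadCalls++;
        return array( 'Main Page' => 120, 'Sandbox' => 7 );
    }
}

$stats = new PageStats();
$stats->getWordCounts();
$stats->getWordCounts();
echo $stats->loadCalls, "\n"; // prints 1: the expensive load ran once, not twice
```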
However, not declaring class variables seems to me to be a significant overall loss. I for one have looked at MediaWiki 
code trying to work out where some variable in a huge class came from. It wasn't declared. It wasn't inherited from the parent
class (which was also very long). It wasn't inherited from the parent's parent class. Nope. It wasn't documented anywhere.
It was just used a few times, without explanation, and without declaration. And to understand what it did you had to read
the function that initialized it. That function was also not documented. And that function called another function which
you had to understand to understand what the first function did. That function was also not documented. Then that function
called a third function, which you have to understand to understand what the second function did to understand what the 
first function did to understand what the purpose of the class variable was. The whole process wasted about 20 minutes, and 
by the end of it, I was, to say the least, not very impressed. For a minimal gain in performance by not declaring a
variable (and for zero gain in performance by not having any documentation), the maintainability of that code was severely
reduced.
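Rather than guess at how big the gain is, it can be measured. The following is a rough sketch of a micro-benchmark, not a rigorous one: the three classes and helper functions are hypothetical, the object count is arbitrary, and the numbers will vary with PHP version and configuration, so no expected output is given.

```php
<?php
// Hypothetical classes for comparison; neither comes from MediaWiki.
class Declared {
    public $count = 0;
    public $title = null;
}

#[\AllowDynamicProperties] // avoids the PHP 8.2+ deprecation notice; a plain comment on older versions
class Undeclared {
    /** Create the properties on first assignment instead of declaring them. */
    public function init() {
        $this->count = 0;
        $this->title = null;
    }
}

function makeDeclared() { return new Declared(); }
function makeUndeclared() { $o = new Undeclared(); $o->init(); return $o; }
function makeUntouched() { return new Undeclared(); } // the "skipped altogether" case

/** Create $n objects via $factory and return array( seconds, bytes ). */
function measure( $factory, $n ) {
    $memBefore = memory_get_usage();
    $start = microtime( true );
    $objects = array();
    for ( $i = 0; $i < $n; $i++ ) {
        $objects[] = call_user_func( $factory );
    }
    $seconds = microtime( true ) - $start;
    // $objects is still alive here, so this includes the memory held by all $n objects.
    $bytes = memory_get_usage() - $memBefore;
    return array( $seconds, $bytes );
}

$n = 100000;
$results = array();
foreach ( array( 'makeDeclared', 'makeUndeclared', 'makeUntouched' ) as $factory ) {
    $results[$factory] = measure( $factory, $n );
    printf( "%-15s %.4f s, %d bytes\n", $factory, $results[$factory][0], $results[$factory][1] );
}
```

Dividing the differences by $n gives a per-object figure, which is the number to weigh against the 20 minutes above.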
So personally, I'm very much in favour of declaring variables (for the simple reason that the performance increase would
need to be f*ing huge to counterbalance the enormous reduction in maintainability). But if people _really_ don't want to do
this for performance reasons, then fair enough, but at the very least can they please consider documenting those variables,
with their scope, name, type, and purpose. E.g.
-------------------------------------
// This class does batch processing for [insert some reason here]
class whatever {
     // Local variables, not declared for performance reasons:
     // private $count int How many pages we have looked at thus far in the batch processing.
     // private $title Title The current page's title that we are currently working on for the batch processing.
     // ... etc
}
-------------------------------------
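For comparison, here is roughly what the declared-and-documented version of the same class could look like. This is only a sketch: the class name, the process() method, and the use of a plain string for the title (instead of a Title object) are assumptions made so the example runs on its own.

```php
<?php
/**
 * Batch processing for [insert some reason here].
 * Hypothetical example: BatchProcessor and process() are invented for illustration.
 */
class BatchProcessor {
    /** @var int How many pages we have looked at so far in this batch. */
    private $count = 0;

    /** @var string|null Title of the page currently being worked on; null before the batch starts. */
    private $title = null;

    /**
     * Work through a list of page titles.
     * @param array $titles Page titles, as plain strings in this sketch
     * @return int Number of pages processed so far
     */
    public function process( array $titles ) {
        foreach ( $titles as $title ) {
            $this->title = $title;
            $this->count++;
        }
        return $this->count;
    }
}

$batch = new BatchProcessor();
echo $batch->process( array( 'Main Page', 'Sandbox' ) ), "\n"; // prints 2
```

Anyone reading the class now sees each variable's scope, name, type, and purpose in one place, without having to trace the functions that set it.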
-- All the best,
Nick.
