WHY THE FUCK WON'T YOU STOP FUCKING SPAMMING ME MOTHERFUCKER GO TO HELL AND DIE
On Thu, May 22, 2008 at 4:51 PM, wikitech-l-request@lists.wikimedia.org wrote:
Send Wikitech-l mailing list submissions to wikitech-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/wikitech-l or, via email, send a message with subject or body 'help' to wikitech-l-request@lists.wikimedia.org
You can reach the person managing the list at wikitech-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikitech-l digest..."
Today's Topics:
- Re: [MediaWiki-CVS] SVN: [35156] trunk/phase3/includes/filerepo (Brion Vibber)
- Re: PHP design basics (Brion Vibber)
- Re: PHP design basics (Chad)
- Re: PHP design basics (Edward Z. Yang)
- Re: So... status of category intersections? (Roan Kattouw)
- Re: PHP design basics (Tim Starling)
- Re: PHP design basics (Nick Jenkins)
- Re: PHP design basics (Tim Starling)
Message: 1
Date: Thu, 22 May 2008 09:27:47 -0700
From: Brion Vibber brion@wikimedia.org
Subject: Re: [Wikitech-l] [MediaWiki-CVS] SVN: [35156] trunk/phase3/includes/filerepo
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: 48359F03.2050305@wikimedia.org
Content-Type: text/plain; charset=ISO-8859-1
Roan Kattouw wrote:
brion@svn.wikimedia.org wrote:
- Asks server for mime type, but doesn't get it back yet
That's due to a typo:
- function findFile( $title, $time = false ) {
    $info = $this->queryImage( array(
        'titles' => 'Image:' . $title->getText(),
        'prop' => 'imageinfo',
        'iiprop' => 'timestamp|user|comment|url|size|sha1|metadata|mimetype' ) );
There it is. You should use iiprop=mime, not iiprop=mimetype.
Ah, but I didn't get a mime back with my query with "mime" until I fixed the API's own typo -- it looked for "mimetype" but that didn't get past the input validation -- in r35185... :)
The rest of this stuff looks good. You seem to have fixed the canonical namespace thing. To what degree did you test/profile/benchmark this?
Pretty much not profiled or benchmarked at all. :)
Just poking at it a bit on my test comp. I wouldn't recommend it for general use at the moment.
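As a rough illustration of the corrected query discussed above, here is a minimal sketch that builds an imageinfo request with iiprop containing "mime" rather than "mimetype". The endpoint URL, the function name buildImageInfoUrl() and the file title are assumptions made for the example; only the parameter names and values come from the thread.

```php
<?php
// Hypothetical endpoint; substitute the api.php URL of the wiki being queried.
$apiBase = 'http://example.org/w/api.php';

/**
 * Build an imageinfo query URL for a file title (given without namespace prefix).
 * buildImageInfoUrl() is an illustrative helper, not part of MediaWiki.
 */
function buildImageInfoUrl( $apiBase, $titleText ) {
    $params = array(
        'action' => 'query',
        'format' => 'json',
        'titles' => 'Image:' . $titleText,
        'prop' => 'imageinfo',
        // Per the thread: 'mime' is the value to use, not 'mimetype'.
        'iiprop' => 'timestamp|user|comment|url|size|sha1|metadata|mime',
    );
    // http_build_query() URL-encodes the values, including the '|' separators.
    return $apiBase . '?' . http_build_query( $params );
}

$url = buildImageInfoUrl( $apiBase, 'Example.jpg' );
echo $url, "\n";
```

The sketch only builds the URL; fetching and decoding the response is left out.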
-- brion
Message: 2
Date: Thu, 22 May 2008 10:00:03 -0700
From: Brion Vibber brion@wikimedia.org
Subject: Re: [Wikitech-l] PHP design basics
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: 4835A693.90102@wikimedia.org
Content-Type: text/plain; charset=ISO-8859-1
DanTMan wrote:
extensions/ExtensionName/ExtensionName.php:
$wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic';
$wgExtensionFunctions[] = array( 'ExtensionName', 'ExtensionFunction' );
Oooh, avoid that last -- that'll load your class on every request, whether it's needed or not.
$wgExtensionFunctions is for initialization functions which will need to be run during Setup.php, after most of the basic infrastructure is up.
These days there's actually usually little if any need for such functions, since we've got the various hook arrays of all sorts which are designed for extensibility and lazy-loading; you can set your various information directly in the arrays in your config/loader code.
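A minimal sketch of the lazy-loading pattern described above, under stated assumptions. $wgHooks and $wgAutoloadClasses are real MediaWiki globals, but here they are simulated as plain arrays so the snippet runs standalone; runHook(), the autoloader closure, the file name and the hook body are hypothetical stand-ins for MediaWiki's own machinery.

```php
<?php
// Stand-ins for MediaWiki's globals so the sketch runs outside MediaWiki.
$wgHooks = array();
$wgAutoloadClasses = array();
$loaded = array(); // records which classes the autoloader was asked to load

// Simulated autoloader. Real MediaWiki would require the file named in
// $wgAutoloadClasses; here the class is declared inline instead.
spl_autoload_register( function ( $class ) use ( &$loaded, &$wgAutoloadClasses ) {
    if ( $class === 'ExtensionName' && isset( $wgAutoloadClasses[$class] ) ) {
        $loaded[] = $class;
        class ExtensionName {
            public static function LanguageGetMagic( &$magicWords, $langCode ) {
                $magicWords['example'] = array( 0, 'example' );
                return true;
            }
        }
    }
} );

// Hypothetical hook dispatcher: calls every callback registered under $name.
function runHook( array $hooks, $name, array $args ) {
    if ( !isset( $hooks[$name] ) ) {
        return;
    }
    foreach ( $hooks[$name] as $callback ) {
        call_user_func_array( $callback, $args );
    }
}

// Config/loader code: only strings are registered, so no class is loaded.
$wgAutoloadClasses['ExtensionName'] = 'ExtensionName.body.php'; // hypothetical file
$wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic';
$loadedAfterSetup = $loaded; // still empty at this point

// The class is pulled in only when the hook actually fires.
$magicWords = array();
runHook( $wgHooks, 'LanguageGetMagic', array( &$magicWords, 'en' ) );
$loadedAfterHook = $loaded;
```

Registering an entry in $wgExtensionFunctions instead would force the class to load during setup on every request, which is the cost the message warns about.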
-- brion vibber (brion @ wikimedia.org)
Message: 3
Date: Thu, 22 May 2008 13:43:36 -0400
From: Chad innocentkiller@gmail.com
Subject: Re: [Wikitech-l] PHP design basics
To: "Wikimedia developers" wikitech-l@lists.wikimedia.org
Message-ID: 5924f50a0805221043p4a19b3e9x7cbce95798278bf@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
All of this is great advice (especially for someone new to the code, like myself), and it would be very useful to have on MediaWiki.org as well. On that note, perhaps an update of the example extension in SVN might be in order, with some better documentation, so things like this are clear to everyone.
-Chad
Message: 4
Date: Thu, 22 May 2008 15:31:11 -0400
From: "Edward Z. Yang" edwardzyang@thewritingpot.com
Subject: Re: [Wikitech-l] PHP design basics
To: wikitech-l@lists.wikimedia.org
Message-ID: g14hlu$rei$1@ger.gmane.org
Content-Type: text/plain; charset=UTF-8
Tim Starling wrote:
It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.
I find this to be a very interesting viewpoint. How much time/space, exactly, is saved by moving variable declarations from the object declaration to, say, the constructor? I've always felt that the self-documentation ability derived from having explicit member variables is more important.
Message: 5
Date: Thu, 22 May 2008 23:53:23 +0200
From: Roan Kattouw roan.kattouw@home.nl
Subject: Re: [Wikitech-l] So... status of category intersections?
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: 4835EB53.8040507@home.nl
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Robert Stojnic wrote:
Let me briefly repeat what I said earlier about my experience with this category intersection thingy. Adding categories to lucene index is easy *IF* they are inside the article, e.g. try this:
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%2Binc...
This will give you the category intersection of "Living People" and "English comedy writers" in a fraction of a second.
That's the dirty way. I've gone ahead and written an alternative way of implementing category intersections using a fulltext search, which means you can run the most crazy intersections; in fact, you can search in an article's categories as if they were the page's contents. It's part of the AdvancedSearch extension which I'm paid to write, but it'll be easy to split off just the intersection functionality into another extension. The upside is that I also have a special page front end ready to go. I'll commit AdvancedSearch into SVN once I've worked out the bugs (provided there are any; it's close to midnight now so I don't really feel like testing stuff any more) and worked out stuff with my 'employer', which shouldn't take more than a few days.
On a technical level, the extension adds the categorysearch table (you need to run update.php to actually create the table), which is basically a rip-off from the searchindex table. It has a cs_page field referencing page_id, and keeps itself updated using the LinksUpdate and ArticleDeleteComplete hooks. There's also a maintenance script to populate the table from scratch.
What I found is that the hard part is keeping the index updated. If we want the fancy category intersection system discussed here before, we need an index that is frequently updated, that is integrated with the job queue, that understands templates, etc.
Understanding templates is no problem here, since the updater uses the parser's notion of which categories the page is in, and the populate script uses the categorylinks table.
Lucene is not that good with very frequent updates. The usual setting is to have an indexer, make snapshots of the index at regular intervals and then rsync it onto searchers. The whole process takes time, although for a category-only index it will probably be fast. I assume there would be at least a few tens of minutes of lag anyhow. Our current lucene framework could easily be used for index distribution and such.
I really don't have the faintest idea how Lucene works or how MediaWiki interfaces with it, but I do know that Lucene can handle the stuff we put into the searchindex table. Since the categorysearch table is no different, I think Lucene *should* be able to handle it pretty easily as well. Could someone who actually has a clue about all this reply?
Roan Kattouw (Catrope)
Message: 6
Date: Fri, 23 May 2008 09:45:41 +1000
From: Tim Starling tstarling@wikimedia.org
Subject: Re: [Wikitech-l] PHP design basics
To: wikitech-l@lists.wikimedia.org
Message-ID: g150j7$cak$1@ger.gmane.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Edward Z. Yang wrote:
Tim Starling wrote:
It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.
I find this to be a very interesting viewpoint. How much time/space, exactly, is saved by moving variable declarations from the object declaration to, say, the constructor? I've always felt that the self-documentation ability derived from having explicit member variables is more important.
Nothing is saved by moving them to the constructor, in fact it'll probably be slower. The idea is to defer variable initialisation until the variable is required. The saving, assuming the variable is never required, is on the order of a microsecond plus 84 bytes. So unless there's a lot of objects or it's in a tight loop, we're in the realm of micro-optimisation, and other considerations, such as personal style, are probably going to take precedence.
I wouldn't recommend leaving the var out when there is a need to initialise it to something simple, and you would need to add an isset():
var $x = 1;
...
$this->doStuff( $this->x );

versus

if ( !isset( $this->x ) ) {
    $this->x = 1;
}
$this->doStuff( $this->x );
The second one is slower.
But please don't forget my main point: an object is a hashtable, you can add and remove variables. Don't be such a stickler for self-documentation and the way things are "meant" to be done that you tie yourself in knots trying to avoid dynamic variable creation. You can always document with a comment instead.
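A rough sketch of the deferred initialisation described above, assuming a hypothetical class BatchProcessor with an undeclared member titleCache that is documented only by a comment. The class, method and member names are invented for illustration. One caveat for current runtimes: PHP 8.2 and later deprecate dynamic properties unless the class carries the AllowDynamicProperties attribute, which older PHP versions read as a plain comment line.

```php
<?php
#[\AllowDynamicProperties]
class BatchProcessor {
    // Not declared, so it costs nothing until first use:
    // $titleCache array Titles already seen in this batch (hypothetical member).

    public function getTitleCache() {
        // Created on demand; objects that never call this never pay for it.
        if ( !isset( $this->titleCache ) ) {
            $this->titleCache = array( 'Main Page' );
        }
        return $this->titleCache;
    }
}

$processor = new BatchProcessor();
// Property names present on the object before and after the first access.
$before = array_keys( get_object_vars( $processor ) );
$cache = $processor->getTitleCache();
$after = array_keys( get_object_vars( $processor ) );
```

Here $before is empty and $after contains only titleCache, which shows the member being added to the object's property table at first use rather than at construction.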
-- Tim Starling
Message: 7
Date: Fri, 23 May 2008 11:59:43 +1000
From: "Nick Jenkins" nickpj@gmail.com
Subject: Re: [Wikitech-l] PHP design basics
To: "Wikimedia developers" wikitech-l@lists.wikimedia.org
Message-ID: JNEIKFDPGFACDNNDIHNMGEICILAA.nickpj@gmail.com
Content-Type: text/plain; charset="us-ascii"
It is nice from a self-documentation standpoint to put var declarations at the top of your classes. But understand that a var declaration takes up time and space when the object is initialised. If you leave it out, that overhead can be deferred, and maybe skipped altogether.
I find this to be a very interesting viewpoint. How much time/space, exactly, is saved by moving variable declarations from the object declaration to, say, the constructor? I've always felt that the self-documentation ability derived from having explicit member variables is more important.
Yeah, I feel the same. Let me outline why. With any code, there are at least 3 dimensions of quality:
- Performance. How quickly it does its job, and with what resources. Can be improved by having short code paths, minimizing memory use, minimizing compiled code size (so that more of the program fits into L1 or L2 cache), reducing disk access, using low-level languages with less runtime overhead (hand-optimized assembler in the extreme), various caches (opcode caches, object caches, HTTP caches, etc), and doing things only when you need them (lazy loading, just-in-time systems, etc), better algorithms, and so forth. Or performance can be improved by increasing the resources allocated (e.g. more database servers, more apache servers, more squids, more RAM, faster disks, faster CPU, faster network, etc). I.e. do less, and/or do it with more grunt.
- Maintenance. How easily and quickly other programmers can fix or add functionality to your code when you are away:
  - documentation. What is the overall purpose of the code, what problem is it trying to solve, how are you trying to solve it, what are main functions, what are their parameters.
  - making things obvious. E.g. understandable and short variable names, function names, and class names. Any bits that do something tricky or critical should be documented or explained.
  - making things short, and simple. Simpler and shorter things are easier to hold in your head and understand.
  - using a programming language and a style that is familiar to many people.
- Functionality. How much the code does, how useful what it does is, and how closely its actual behaviour matches the expected behaviour, and how flexible and general the code is.
There may well be more dimensions and other aspects I haven't covered above, but it'll probably suffice.
I'd argue that most everything that committers are trying to do in MediaWiki is aimed at giving an improvement in one or more of the above dimensions. E.g. fix a bug = improved functionality. Add a feature = improved functionality. Add some documentation = improved maintenance. Standardize an awkward non-standard file to use the same approach as the rest of the code base, which makes it shorter and simpler = improved maintenance. And so forth, with combinations of the above possible.
I'd also argue that anything that is an overall regression in the above dimensions should probably be reverted. E.g. introduce a bug and make performance worse but add one line of documentation = revert.
Now, some of the cases outlined are a clear overall win (that is, they entail a significant improvement in one dimension with no regressions in another, or a very minor regression in another).
E.g. lazy loading probably makes the code a bit longer, and a tiny bit less clear, but improves performance a lot.
However, not declaring class variables seems to me to be a significant overall loss. I for one have looked at MediaWiki code trying to work out where some variable in a huge class came from. It wasn't declared. It wasn't inherited from the parent class (which was also very long). It wasn't inherited from the parent's parent class. Nope. It wasn't documented anywhere. It was just used a few times, without explanation, and without declaration. And to understand what it did you had to read the function that initialized it. That function was also not documented. And that function called another function which you had to understand to understand what the first function did. That function was also not documented. Then that function called a third function, which you have to understand to understand what the second function did to understand what the first function did to understand what the purpose of the class variable was. The whole process wasted about 20 minutes, and by the end of it, I was, to say the least, not very impressed. For a minimal gain in performance by not declaring a variable (and for zero gain in performance by not having any documentation), the maintainability of that code was severely reduced.
So personally, I'm very much in favour of declaring variables (for the simple reason that the performance increase would need to be f*ing huge to counterbalance the enormous reduction in maintainability). But if people _really_ don't want to do this for performance reasons, then fair enough, but at the very least can they please consider documenting those variables, with their scope, name, type, and purpose. E.g.
// This class does batch processing for [insert some reason here]
class whatever {
    // Local variables, not declared for performance reasons:
    // private $count int How many pages we have looked at thus far in the batch processing.
    // private $title Title The current page's title that we are currently working on for the batch processing.
    // ... etc
-- All the best, Nick.
Message: 8
Date: Fri, 23 May 2008 12:50:49 +1000
From: Tim Starling tstarling@wikimedia.org
Subject: Re: [Wikitech-l] PHP design basics
To: wikitech-l@lists.wikimedia.org
Message-ID: g15bea$3tc$1@ger.gmane.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Nick Jenkins wrote:
However, not declaring class variables seems to me to be a significant overall loss. I for one have looked at MediaWiki code trying to work out where some variable in a huge class came from. It wasn't declared. It wasn't inherited from the parent class (which was also very long). It wasn't inherited from the parent's parent class. Nope. It wasn't documented anywhere. It was just used a few times, without explanation, and without declaration. And to understand what it did you had to read the function that initialized it. That function was also not documented. And that function called another function which you had to understand to understand what the first function did. That function was also not documented. Then that function called a third function, which you have to understand to understand what the second function did to understand what the first function did to understand what the purpose of the class variable was. The whole process wasted about 20 minutes, and by the end of it, I was, to say the least, not very impressed. For a minimal gain in performance by not declaring a variable (and for zero gain in performance by not having any documentation), the maintainability of that code was severely reduced.
If you've got no clue whatsoever what is going on in a section of code, an uncommented var declaration is hardly going to do you any good. It seems to me your problem is with documentation, not var statements.
-- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
End of Wikitech-l Digest, Vol 58, Issue 36
wikitech-l@lists.wikimedia.org