WHY THE FUCK WON'T YOU STOP FUCKING SPAMMING ME MOTHERFUCKER GO TO HELL AND
DIE
On Thu, May 22, 2008 at 4:51 PM, <wikitech-l-request(a)lists.wikimedia.org> wrote:
Send Wikitech-l mailing list submissions to
wikitech-l(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
or, via email, send a message with subject or body 'help' to
wikitech-l-request(a)lists.wikimedia.org
You can reach the person managing the list at
wikitech-l-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wikitech-l digest..."
Today's Topics:
1. Re: [MediaWiki-CVS] SVN: [35156]
trunk/phase3/includes/filerepo (Brion Vibber)
2. Re: PHP design basics (Brion Vibber)
3. Re: PHP design basics (Chad)
4. Re: PHP design basics (Edward Z. Yang)
5. Re: So... status of category intersections? (Roan Kattouw)
6. Re: PHP design basics (Tim Starling)
7. Re: PHP design basics (Nick Jenkins)
8. Re: PHP design basics (Tim Starling)
----------------------------------------------------------------------
Message: 1
Date: Thu, 22 May 2008 09:27:47 -0700
From: Brion Vibber <brion(a)wikimedia.org>
Subject: Re: [Wikitech-l] [MediaWiki-CVS] SVN: [35156]
trunk/phase3/includes/filerepo
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID: <48359F03.2050305(a)wikimedia.org>
Content-Type: text/plain; charset=ISO-8859-1
Roan Kattouw wrote:
> brion(a)svn.wikimedia.org schreef:
>>   * Asks server for mime type, but doesn't get it back yet
> That's due to a typo:
>> +
>> +    function findFile( $title, $time = false ) {
>> +        $info = $this->queryImage( array(
>> +            'titles' => 'Image:' . $title->getText(),
>> +            'prop' => 'imageinfo',
>> +            'iiprop' => 'timestamp|user|comment|url|size|sha1|metadata|mimetype' ) );
> There it is. You should use iiprop=mime, not iiprop=mimetype.
Ah, but I didn't get a mime back with my query with "mime" until I fixed
the API's own typo -- it looked for "mimetype", but that didn't get past
the input validation -- in r35185... :)
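For reference, the corrected request can be written as a plain parameter array. This is a minimal sketch: buildImageInfoQuery() is a hypothetical helper, the endpoint URL is a placeholder, and action=query / format=json are standard MediaWiki API parameters that do not appear in the quoted diff. Only the titles, prop and iiprop values come from the thread, with "mime" in place of "mimetype".

```php
<?php
// Hypothetical helper: builds imageinfo query parameters using the
// property name the API accepts ("mime", not "mimetype").
function buildImageInfoQuery( $titleText ) {
    $props = array( 'timestamp', 'user', 'comment', 'url', 'size', 'sha1', 'metadata', 'mime' );
    return array(
        'action' => 'query',
        'format' => 'json',
        'titles' => 'Image:' . $titleText,
        'prop'   => 'imageinfo',
        'iiprop' => implode( '|', $props ), // pipe-separated property list
    );
}

$params = buildImageInfoQuery( 'Example.jpg' );
// http_build_query() percent-encodes '|' (%7C) and ':' (%3A) for the URL.
// The host below is a placeholder, not a real wiki.
$url = 'http://example.org/w/api.php?' . http_build_query( $params );
echo $url, "\n";
```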
> The rest of this stuff looks good. You seem to have fixed the canonical
> namespace thing. To what degree did you test/profile/benchmark this?
Pretty much not profiled or benchmarked at all. :)
Just poking at it a bit on my test comp. I wouldn't recommend it for
general use at the moment.
-- brion
------------------------------
Message: 2
Date: Thu, 22 May 2008 10:00:03 -0700
From: Brion Vibber <brion(a)wikimedia.org>
Subject: Re: [Wikitech-l] PHP design basics
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID: <4835A693.90102(a)wikimedia.org>
Content-Type: text/plain; charset=ISO-8859-1
DanTMan wrote:
> extensions/ExtensionName/ExtensionName.php:
> $wgHooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic';
> $wgExtensionFunctions[] = array( 'ExtensionName', 'ExtensionFunction' );
Oooh, avoid that last -- that'll load your class on every request,
whether it's needed or not.
$wgExtensionFunctions is for initialization functions which will need to
be run during Setup.php, after most of the basic infrastructure is up.
These days there's actually usually little if any need for such
functions, since we've got the various hook arrays of all sorts which
are designed for extensibility and lazy-loading; you can set your
various information directly in the arrays in your config/loader code.
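To illustrate the lazy-loading point outside MediaWiki, here is a standalone simulation. It assumes a tiny hook runner and an spl_autoload_register() loader as stand-ins for $wgHooks and MediaWiki's autoloader; the names $hooks, $autoloadClasses and runHooks() are hypothetical, and the real LanguageGetMagic hook signature is not reproduced (an ArrayObject is used so no by-reference arguments are needed). The extension class is loaded only when its hook fires, whereas an entry in $wgExtensionFunctions would have forced the load during setup on every request.

```php
<?php
// Hypothetical stand-ins for $wgHooks and the autoloader's class map;
// a simulation only, not MediaWiki's actual hook implementation.
$hooks = array();
$autoloadClasses = array();

// Write a stand-in extension body file to a temp location.
$bodyFile = tempnam( sys_get_temp_dir(), 'ext' );
file_put_contents( $bodyFile, '<?php
class ExtensionName {
    public static function LanguageGetMagic( ArrayObject $words ) {
        $words->append( "examplemagic" );
        return true;
    }
}
' );

// Loader file: record where the class lives and register the hook by
// name only. Nothing from the body file is loaded at this point.
$autoloadClasses['ExtensionName'] = $bodyFile;
$hooks['LanguageGetMagic'][] = 'ExtensionName::LanguageGetMagic';

// Load a class from the map the first time PHP needs it.
spl_autoload_register( function ( $class ) use ( $autoloadClasses ) {
    if ( isset( $autoloadClasses[$class] ) ) {
        require $autoloadClasses[$class];
    }
} );

// Call each handler registered for $name; stop if one returns false.
function runHooks( array $hooks, $name, array $args ) {
    if ( empty( $hooks[$name] ) ) {
        return true;
    }
    foreach ( $hooks[$name] as $callback ) {
        if ( call_user_func_array( $callback, $args ) === false ) {
            return false;
        }
    }
    return true;
}

$loadedBefore = class_exists( 'ExtensionName', false ); // not loaded yet
$words = new ArrayObject();
runHooks( $hooks, 'LanguageGetMagic', array( $words ) ); // triggers autoload
$loadedAfter = class_exists( 'ExtensionName', false );
unlink( $bodyFile );
```

A request that never runs the hook never reads the body file at all, which is the saving Brion describes.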
-- brion vibber (brion @ wikimedia.org)
------------------------------
Message: 3
Date: Thu, 22 May 2008 13:43:36 -0400
From: Chad <innocentkiller(a)gmail.com>
Subject: Re: [Wikitech-l] PHP design basics
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<5924f50a0805221043p4a19b3e9x7cbce95798278bf(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Thu, May 22, 2008 at 1:00 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
All of this is great advice (especially for someone new to the code, like
myself), and it would be of great use on MediaWiki.org as well. On that
note, perhaps an update of the example extension in SVN might be in order,
with some better documentation, so that things like this are clear to
everyone.
-Chad
------------------------------
Message: 4
Date: Thu, 22 May 2008 15:31:11 -0400
From: "Edward Z. Yang" <edwardzyang(a)thewritingpot.com>
Subject: Re: [Wikitech-l] PHP design basics
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <g14hlu$rei$1(a)ger.gmane.org>
Content-Type: text/plain; charset=UTF-8
Tim Starling wrote:
> It is nice from a self-documentation standpoint to put var declarations at
> the top of your classes. But understand that a var declaration takes up
> time and space when the object is initialised. If you leave it out, that
> overhead can be deferred, and maybe skipped altogether.
I find this to be a very interesting viewpoint. How much time/space,
exactly, is saved by moving variable declarations from the object
declaration to, say, the constructor? I've always felt that the
self-documentation ability derived from having explicit member variables
is more important.
------------------------------
Message: 5
Date: Thu, 22 May 2008 23:53:23 +0200
From: Roan Kattouw <roan.kattouw(a)home.nl>
Subject: Re: [Wikitech-l] So... status of category intersections?
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID: <4835EB53.8040507(a)home.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Robert Stojnic schreef:
> Let me briefly repeat what I said earlier about my experience with this
> category intersection thingy. Adding categories to the Lucene index is easy
> *IF* they are inside the article, e.g. try this:
> http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%2Bin…
> This will give you the category intersection of "Living People" and
> "English comedy writers" in a fraction of a second.
That's the dirty way. I've gone ahead and written an alternative way of
implementing category intersections using a fulltext search, which means
you can run the craziest intersections; in fact, you can search in an
article's categories as if they were the page's contents. It's part of
the AdvancedSearch extension which I'm paid to write, but it'll be easy
to split off just the intersection functionality into another extension.
The upside is that I also have a special page front end ready to go.
I'll commit AdvancedSearch into SVN once I've worked out the bugs
(provided there are any; it's close to midnight now so I don't really
feel like testing stuff any more) and worked out stuff with my
'employer', which shouldn't take more than a few days.
On a technical level, the extension adds the categorysearch table (you
need to run update.php to actually create the table), which is basically
a rip-off from the searchindex table. It has a cs_page field referencing
page_id, and keeps itself updated using the LinksUpdate and
ArticleDeleteComplete hooks. There's also a maintenance script to
populate the table from scratch.
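The core idea, treating each page's categories as searchable text and intersecting the lists of pages per category, can be sketched in memory. This is not the AdvancedSearch implementation, which stores rows in the categorysearch table and queries them with a fulltext search; the class and method names below are hypothetical, with updatePage() and deletePage() standing in for what the LinksUpdate and ArticleDeleteComplete hook handlers do.

```php
<?php
// Hypothetical in-memory inverted index: category name => set of page IDs.
class CategoryIndex {
    private $pagesByCategory = array();
    private $categoriesByPage = array();

    // Analogous to refreshing a page's entry after LinksUpdate.
    public function updatePage( $pageId, array $categories ) {
        $this->deletePage( $pageId );
        $this->categoriesByPage[$pageId] = $categories;
        foreach ( $categories as $cat ) {
            $this->pagesByCategory[$cat][$pageId] = true;
        }
    }

    // Analogous to dropping a page's entry on ArticleDeleteComplete.
    public function deletePage( $pageId ) {
        if ( !isset( $this->categoriesByPage[$pageId] ) ) {
            return;
        }
        foreach ( $this->categoriesByPage[$pageId] as $cat ) {
            unset( $this->pagesByCategory[$cat][$pageId] );
        }
        unset( $this->categoriesByPage[$pageId] );
    }

    // Sorted IDs of pages that are in every requested category.
    public function intersect( array $categories ) {
        $result = null;
        foreach ( $categories as $cat ) {
            $pages = isset( $this->pagesByCategory[$cat] ) ? $this->pagesByCategory[$cat] : array();
            $result = ( $result === null ) ? $pages : array_intersect_key( $result, $pages );
        }
        $ids = ( $result === null ) ? array() : array_keys( $result );
        sort( $ids );
        return $ids;
    }
}

$index = new CategoryIndex();
$index->updatePage( 1, array( 'Living people', 'English comedy writers' ) );
$index->updatePage( 2, array( 'Living people' ) );
$index->updatePage( 3, array( 'English comedy writers', 'Living people' ) );
$both = $index->intersect( array( 'Living people', 'English comedy writers' ) );
$index->deletePage( 3 );
$afterDelete = $index->intersect( array( 'Living people', 'English comedy writers' ) );
```

Keeping a reverse map (page => categories) is what lets an update or deletion touch only the affected entries instead of rebuilding the index.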
> What I found is that the hard part is keeping the index updated. If we want
> the fancy category intersection system discussed here before, we need an
> index that is frequently updated, that is integrated with the job queue,
> and that understands templates, etc.
Understanding templates is no problem here, since the updater uses the
parser's notion of which categories the page is in, and the populate
script uses the categorylinks table.
> Lucene is not that good with very frequent updates. The usual setup is to
> have an indexer, make snapshots of the index at regular intervals and then
> rsync them onto the searchers. The whole process takes time, although for a
> category-only index it will probably be fast. I assume there would be at
> least a few tens of minutes of lag anyhow. Our current Lucene framework
> could easily be used for index distribution and such.
I really don't have the faintest idea how Lucene works or how MediaWiki
interfaces with it, but I do know that Lucene can handle the stuff we
put into the searchindex table. Since the categorysearch table is no
different, I think Lucene *should* be able to handle it pretty easily as
well. Could someone who actually has a clue about all this reply?
Roan Kattouw (Catrope)
------------------------------
Message: 6
Date: Fri, 23 May 2008 09:45:41 +1000
From: Tim Starling <tstarling(a)wikimedia.org>
Subject: Re: [Wikitech-l] PHP design basics
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <g150j7$cak$1(a)ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Edward Z. Yang wrote:
> Tim Starling wrote:
>> It is nice from a self-documentation standpoint to put var declarations at
>> the top of your classes. But understand that a var declaration takes up
>> time and space when the object is initialised. If you leave it out, that
>> overhead can be deferred, and maybe skipped altogether.
> I find this to be a very interesting viewpoint. How much time/space,
> exactly, is saved by moving variable declarations from the object
> declaration to, say, the constructor? I've always felt that the
> self-documentation ability derived from having explicit member variables
> is more important.
Nothing is saved by moving them to the constructor, in fact it'll probably
be slower. The idea is to defer variable initialisation until the variable
is required. The saving, assuming the variable is never required, is on
the order of a microsecond plus 84 bytes. So unless there's a lot of
objects or it's in a tight loop, we're in the realm of micro-optimisation,
and other considerations, such as personal style, are probably going to
take precedence.
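The per-object cost can be checked on whatever PHP version you run with a rough measurement like the one below. It is a sketch that assumes memory_get_usage() deltas averaged over many live objects are a fair proxy for per-object cost; the class names are hypothetical. The 84-byte figure above dates from 2008-era PHP and the numbers differ between releases, so no expected output is given.

```php
<?php
// Hypothetical classes: one declares a member, one leaves it out.
class WithDeclaredVar {
    var $x = 1;
}
class WithoutVar {
}

// Average bytes added per live object of $class over $n instances.
// Includes the holding array's per-slot overhead, which is the same
// for both classes, so the difference is what matters.
function bytesPerObject( $class, $n = 10000 ) {
    $objects = array();
    $before = memory_get_usage();
    for ( $i = 0; $i < $n; $i++ ) {
        $objects[] = new $class();
    }
    $after = memory_get_usage();
    return ( $after - $before ) / $n;
}

$declared = bytesPerObject( 'WithDeclaredVar' );
$undeclared = bytesPerObject( 'WithoutVar' );
printf( "declared var: %.1f bytes/object\n", $declared );
printf( "no var:       %.1f bytes/object\n", $undeclared );
printf( "difference:   %.1f bytes/object\n", $declared - $undeclared );
```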
I wouldn't recommend leaving the var out when there is a need to
initialise it to something simple, and you would need to add an isset():
    var $x = 1;
    ...
    $this->doStuff( $this->x );
versus
    if ( !isset( $this->x ) ) {
        $this->x = 1;
    }
    $this->doStuff( $this->x );
The second one is slower.
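The two styles can be put side by side as runnable code. This is a sketch with hypothetical class names and a crude microtime() loop; timings vary by machine and PHP version, so treat them as indicative only. On PHP 8.2 and later, creating an undeclared member raises a deprecation notice unless the class carries the #[AllowDynamicProperties] attribute, which PHP 7 and earlier read as an ordinary comment.

```php
<?php
// Style 1: declared var with a default value.
class WithVar {
    var $x = 1;
    function run() {
        return $this->doStuff( $this->x );
    }
    function doStuff( $v ) {
        return $v * 2;
    }
}

// Style 2: no declaration; the member is created on first use.
#[AllowDynamicProperties]
class WithLazyVar {
    function run() {
        if ( !isset( $this->x ) ) {
            $this->x = 1;
        }
        return $this->doStuff( $this->x );
    }
    function doStuff( $v ) {
        return $v * 2;
    }
}

// Wall-clock seconds for $n calls to $object->run().
function timeRuns( $object, $n = 200000 ) {
    $start = microtime( true );
    for ( $i = 0; $i < $n; $i++ ) {
        $object->run();
    }
    return microtime( true ) - $start;
}

$a = new WithVar();
$b = new WithLazyVar();
printf( "declared: %.4fs, lazy isset(): %.4fs\n", timeRuns( $a ), timeRuns( $b ) );
```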
But please don't forget my main point: an object is a hashtable, you can
add and remove variables. Don't be such a stickler for self-documentation
and the way things are "meant" to be done that you tie yourself in knots
trying to avoid dynamic variable creation. You can always document with a
comment instead.
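The "an object is a hashtable" point can be seen directly: members can be added, listed and removed at run time. A minimal sketch with a hypothetical class that declares no members at all; the #[AllowDynamicProperties] attribute only matters on PHP 8.2 and later.

```php
<?php
// Hypothetical class with no declared members.
#[AllowDynamicProperties]
class Lookup {
    function lookup( $key ) {
        if ( !isset( $this->cache ) ) {
            $this->cache = array(); // member added on first use
        }
        if ( !isset( $this->cache[$key] ) ) {
            $this->cache[$key] = strtoupper( $key );
        }
        return $this->cache[$key];
    }
    function reset() {
        unset( $this->cache ); // member removed, like deleting a hash key
    }
}

$l = new Lookup();
$before = array_keys( get_object_vars( $l ) );  // no members yet
$value = $l->lookup( 'abc' );
$during = array_keys( get_object_vars( $l ) );  // 'cache' now exists
$l->reset();
$after = array_keys( get_object_vars( $l ) );   // gone again
```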
-- Tim Starling
------------------------------
Message: 7
Date: Fri, 23 May 2008 11:59:43 +1000
From: "Nick Jenkins" <nickpj(a)gmail.com>
Subject: Re: [Wikitech-l] PHP design basics
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
Message-ID: <JNEIKFDPGFACDNNDIHNMGEICILAA.nickpj(a)gmail.com>
Content-Type: text/plain; charset="us-ascii"
>> It is nice from a self-documentation standpoint to put var declarations at
>> the top of your classes. But understand that a var declaration takes up
>> time and space when the object is initialised. If you leave it out, that
>> overhead can be deferred, and maybe skipped altogether.
> I find this to be a very interesting viewpoint. How much time/space,
> exactly, is saved by moving variable declarations from the object
> declaration to, say, the constructor? I've always felt that the
> self-documentation ability derived from having explicit member variables
> is more important.
Yeah, I feel the same. Let me outline why. With any code, there are at
least three dimensions of quality:
1) Performance. How quickly it does its job, and with what resources. It can
be improved by having short code paths, minimizing memory use, minimizing
compiled code size (so that more of the program fits into L1 or L2 cache),
reducing disk access, using low-level languages with less runtime overhead
(hand-optimized assembler in the extreme), various caches (opcode caches,
object caches, HTTP caches, etc.), doing things only when you need them
(lazy loading, just-in-time systems, etc.), better algorithms, and so forth.
Or performance can be improved by increasing the resources allocated (e.g.
more database servers, more Apache servers, more Squids, more RAM, faster
disks, faster CPUs, faster network, etc.). I.e. do less, and/or do it with
more grunt.
2) Maintenance. How easily and quickly other programmers can fix or add
functionality to your code when you are away:
- documentation. What is the overall purpose of the code, what problem is it
trying to solve, how are you trying to solve it, what are the main
functions, and what are their parameters.
- making things obvious. E.g. understandable and short variable names,
function names, and class names. Any bits that do something tricky or
critical should be documented or explained.
- making things short and simple. Simpler and shorter things are easier to
hold in your head and understand.
- using a programming language and a style that are familiar to many people.
3) Functionality. How much the code does, how useful what it does is, how
closely its actual behaviour matches the expected behaviour, and how
flexible and general the code is.
There may well be more dimensions and other aspects I haven't covered
above, but this will probably suffice.
I'd argue that almost everything that committers are trying to do in
MediaWiki is aimed at improving one or more of the above dimensions. E.g.
fix a bug = improved functionality. Add a feature = improved functionality.
Add some documentation = improved maintenance. Standardize an awkward
non-standard file to use the same approach as the rest of the code base,
making it shorter and simpler = improved maintenance. And so forth, with
combinations of the above possible.
I'd also argue that anything that is an overall regression in the above
dimensions should probably be reverted. E.g. introduce a bug and make
performance worse but add one line of documentation = revert.
Now, some of the cases outlined are a clear overall win (that is, they
entail a significant improvement in one dimension with no regression, or
only a very minor regression, in another). E.g. lazy loading probably makes
the code a bit longer and a tiny bit less clear, but improves performance a
lot.
However, not declaring class variables seems to me to be a significant
overall loss. I for one have looked at MediaWiki code trying to work out
where some variable in a huge class came from. It wasn't declared. It
wasn't inherited from the parent class (which was also very long). It
wasn't inherited from the parent's parent class. Nope. It wasn't documented
anywhere. It was just used a few times, without explanation and without
declaration. To understand what it did, you had to read the function that
initialized it. That function was also not documented. And that function
called another function, which you had to understand to understand what the
first function did. That function was also not documented. Then that
function called a third function, which you had to understand to understand
what the second function did, to understand what the first function did, to
understand what the purpose of the class variable was. The whole process
wasted about 20 minutes, and by the end of it, I was, to say the least, not
very impressed. For a minimal gain in performance by not declaring a
variable (and for zero gain in performance by not having any
documentation), the maintainability of that code was severely reduced.
So personally, I'm very much in favour of declaring variables (for the
simple reason that the performance increase would need to be f*ing huge to
counterbalance the enormous reduction in maintainability). But if people
_really_ don't want to do this for performance reasons, then fair enough,
but at the very least could they please consider documenting those
variables, with their scope, name, type, and purpose. E.g.
-------------------------------------
// This class does batch processing for [insert some reason here]
class whatever {
    // Member variables, not declared for performance reasons:
    // private $count int   How many pages we have looked at thus far in the batch processing.
    // private $title Title The page title we are currently working on in the batch processing.
    // ... etc
}
-------------------------------------
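The same documentation can also be written as a phpDocumentor-style docblock, where @property tags give each undeclared member's type and purpose; phpDocumentor and many IDEs recognise @property, so the members stay discoverable even though the class never declares them. A sketch with hypothetical names: the example above types $title as MediaWiki's Title class, but a plain string stands in here so the code runs on its own.

```php
<?php
/**
 * Does batch processing for [insert some reason here].
 *
 * Members below are created on first use, not declared, for performance:
 * @property int    $count How many pages we have looked at so far in this batch.
 * @property string $title The title of the page currently being processed.
 */
#[AllowDynamicProperties]
class BatchProcessor {
    /**
     * Process each title and return how many were handled.
     * @param string[] $titles
     * @return int
     */
    function processBatch( array $titles ) {
        foreach ( $titles as $title ) {
            if ( !isset( $this->count ) ) {
                $this->count = 0; // first use: create the member
            }
            $this->title = $title;
            $this->count++;
        }
        return isset( $this->count ) ? $this->count : 0;
    }
}

$p = new BatchProcessor();
$handled = $p->processBatch( array( 'Main Page', 'Sandbox' ) );
```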
-- All the best,
Nick.
------------------------------
Message: 8
Date: Fri, 23 May 2008 12:50:49 +1000
From: Tim Starling <tstarling(a)wikimedia.org>
Subject: Re: [Wikitech-l] PHP design basics
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <g15bea$3tc$1(a)ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Nick Jenkins wrote:
> However, not declaring class variables seems to me to be a significant
> overall loss. I for one have looked at MediaWiki code trying to work
> out where some variable in a huge class came from. It wasn't declared.
> It wasn't inherited from the parent class (which was also very long).
> It wasn't inherited from the parent's parent class. Nope. It wasn't
> documented anywhere. It was just used a few times, without explanation,
> and without declaration. And to understand what it did you had to read
> the function that initialized it. That function was also not
> documented. And that function called another function which you had to
> understand to understand what the first function did. That function was
> also not documented. Then that function called a third function, which
> you have to understand to understand what the second function did to
> understand what the first function did to understand what the purpose
> of the class variable was. The whole process wasted about 20 minutes,
> and by the end of it, I was, to say the least, not very impressed. For
> a minimal gain in performance by not declaring a variable (and for zero
> gain in performance by not having any documentation), the
> maintainability of that code was severely reduced.
If you've got no clue whatsoever what is going on in a section of code, an
uncommented var declaration is hardly going to do you any good. It seems
to me your problem is with documentation, not var statements.
-- Tim Starling
------------------------------
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
End of Wikitech-l Digest, Vol 58, Issue 36
******************************************