Subject: [Wikitech-l] search in Spanish Wikipedia is not working
Fixed.
----- The following addresses had permanent fatal errors -----
wikidown@wikipedia.org
    (reason: 550 5.1.1 wikidown@wikipedia.org... User unknown)
Should be wikidown@bomis.com; this is now set in LocalSettings.php (it used to be hard-coded), and I've corrected it there.
Brion goes on vacation and everything starts to fall apart. First order of business of the Wikimedia Foundation is to set up a fund to clone Brion. :-)
Hey, that could be fun. :)
There must be something like
set-variable = max_connections=somebignumber
in mysql.conf.
At present my.cnf has:
  set-variable = max_connections=560
vs. the Apaches':
  MaxClients 175  (on pliny)
  MaxClients 200  (on larousse)
so we might have at most 375 apache processes attacking us at once. However, they might each take two mysql connections -- if the persistent connection is broken, it can't be closed (at least from PHP) short of killing the process, so it just opens a second non-persistent connection. And, in theory, we might see a handful more from SQL queries, which open another connection using a separate user for restricted permissions.
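(A minimal sketch of that double-connection pattern, for illustration only -- plain PHP mysql_* calls rather than the actual wiki code, with placeholder connection settings:)

  <?php
  // Sketch only: shows how one Apache child can end up holding two
  // MySQL connections at once.  Connection parameters are placeholders,
  // not our real settings.
  $server   = 'localhost';
  $user     = 'wikiuser';
  $password = 'secret';

  $conn = mysql_pconnect( $server, $user, $password );

  if ( !$conn || !mysql_ping( $conn ) ) {
      // A broken persistent link can't really be closed from PHP
      // (mysql_close() ignores persistent links), so we just shadow it
      // with an ordinary non-persistent connection; the dead one hangs
      // around until the child process is killed.
      $conn = mysql_connect( $server, $user, $password );
  }
  ?>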
We could probably do with lowering the max Apaches on pliny a bit and upping the max connections on MySQL a bit, just to keep that particular part from blowing up; however, if they are blowing up, that's going to be a symptom of something else...
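Rough headroom arithmetic with the numbers above (the extra SQL-query connections are a guess at "a handful"):

  175 (pliny) + 200 (larousse)        = 375 Apache children, worst case
  375 children x 2 connections each   = 750 possible MySQL connections
  + a handful for the SQL-query user  = ~760
  my.cnf max_connections              = 560

so on paper the 560 limit can be exceeded if every child is busy and doubled up.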
We do dynamic gzipping of pages on a rather large website (~3,000,000 dynamic hits daily). Our experience so far has been that the gzipping itself is actually rather fast compared to the page generation in PHP/Perl. The main problem with dynamic gzipping is that you have to build up the whole page in memory instead of sending out lines as they are generated (I don't know how the Wikipedia software currently works).
Currently the page is output in several chunks, but usually the majority of it is the wiki page itself, which is processed (over and over and over) and eventually output as one chunk. The other chunks are generally the headers and footers.
If we're generating a newly cachable page, we turn on complete page buffering and capture the buffer to save it to disk (gzipped and not).
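For what it's worth, here's a minimal sketch of that capture step using the stock PHP output-buffering and zlib functions; the cache path and compression level are made up for the example:

  <?php
  // Sketch only: buffer the whole page, then save it to the file cache
  // both plain and gzipped while still sending it to the client.
  $cacheFile = '/tmp/wikicache/Example_page.html';   // made-up path

  ob_start();

  // ... header chunks, the parsed wiki page, footer chunks ...

  $page = ob_get_contents();
  ob_end_flush();                        // deliver the page as usual

  $fh = fopen( $cacheFile, 'w' );
  fwrite( $fh, $page );
  fclose( $fh );

  $fh = fopen( $cacheFile . '.gz', 'w' );
  fwrite( $fh, gzencode( $page, 9 ) );   // compression level 9 is arbitrary
  fclose( $fh );
  ?>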
There are, of course, improvements that can be made to our parser...
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org