Is there anyone out there who has considered contributing to the
MediaWiki PHP script, but decided against it for reasons such as:
* Don't know PHP/MySQL
* Poor documentation, can't understand the code
* Wouldn't know where to start
* Don't have a test server
* Don't have CVS/server access, and you find contributing otherwise to
be tedious and frustrating
Or perhaps you've contributed on occasion but you're deterred from
contributing again for reasons such as those above?
If so, please speak up! What areas could we improve to make it easier to
If you prefer, you can contact me privately at tstarlingphysicsunimelbeduau.
-- Tim Starling.
What are our plans for the network architecture as soon as the new
server arrives (sometime next week)?
At that time, we hope that larousse and pliny will both be
successfully upgraded (though I believe that Jason has not yet
successfully gotten 4 gig of RAM to work, but 2 gig each and dual
Athlon 2800+ should be do-able) and equivalent to each other.
The DB server will be the DB server, that much we know for sure. :-)
Beyond that, I think that the easiest thing to do would be to have en
served by one machine, and everything else by the other machine.
Based on total article count, which is roughly comparable for en vs
rest-of-the-world, that seems good, but is it really? What about
In the longer term, the right way to do this is not to load balance by
domain names, but to load balance properly.
I have had very good success in the past using iptables and a
configuration that looks a lot like this picture:
Of course, I did this years ago, and the "poor man's" way -- I think
there are probably packages (like ultramonkey!) that are quick
The beauty of this kind of architecture is:
1. high availability -- if one webserving node falls over, traffic
automatically goes to the ones that are still up
2. easy expandability -- just add more webservers, at $2000 a crack
for 'good enough' machines, and install the software and there you go.
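The two properties listed above can be illustrated with a small sketch. This is not the iptables configuration mentioned in the message; it is a plain round-robin picker in Python, and the node names and the `mark_down` call (a stand-in for a real health check) are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Toy round-robin balancer that skips nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)   # all known webserving nodes
        self.down = set()          # nodes currently considered dead
        self._counter = count()    # increasing request counter

    def add_node(self, node):
        # Expandability: a new webserver simply joins the rotation.
        self.nodes.append(node)

    def mark_down(self, node):
        # Hypothetical stand-in for a failed health check.
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def pick(self):
        # Try each node at most once, starting from the next rotation slot.
        start = next(self._counter)
        for offset in range(len(self.nodes)):
            node = self.nodes[(start + offset) % len(self.nodes)]
            if node not in self.down:
                return node
        raise RuntimeError("no webserving nodes are up")
```

With two nodes, marking one down sends every subsequent `pick()` to the survivor, and `add_node` puts a new machine into rotation without touching the others.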
Anyhow, to really do something like this, we'd need one more machine,
but it need not be very powerful, since it's only going to be doing
Perhaps we could set up test.wikipedia.org as an ISP-style user script
directory, i.e. with open_basedir and disable_functions and all that. We
could then make a simple upload form which allows people to upload
scripts to it using a web interface.
I'm not suggesting someone else do this, I'm suggesting someone try to
talk me out of it, if there's any reason it won't work.
-- Tim Starling.
After enthusiastically accepting Jimbo's offer to provide me
with "developer access", I quickly soured on having any
It was as I feared: no requirements documents, nothing about
architecture or detailed design. The hardest-working, most
universally respected developer(s) kept basically saying
"just read the code".
So I've kept on the sidelines. I've been a cheerleader and
catalyst (for speed tweaks) and for many months a sort of
chief sysop (carrying out bans and promotions).
As for your request for suggestions, I think the best thing
would be to adopt some sort of "best practices" methodology.
I don't care if it's XP, CMM, or just a checklist of
common-sense guidelines from Steve McConnell's "Rapid Development".
Adopt some mutually agreeable (and useful!) guiding
principles, and follow them: that's what I suggest.
Highly Paid Professional Software Engineer
>Perhaps we could set up test.wikipedia.org ....
Wikipedia is not the only project that runs MediaWiki. Please use
test.wikimedia.org instead. Part of the problem with attracting users and
developers of MediaWiki has been the very strong perception that MediaWiki is
for encyclopedias only.
-- Daniel Mayer (aka mav)
>Wikipedia is not the only project that runs MediaWiki.
>Please use test.wikimedia.org instead. Part of the problem
>with attracting users and developers of MediaWiki has
>been the very strong perception that MediaWiki is for
I should have written "for Wikipedia only."
With the sudden reaching of our $6,000 goal, we have money to go
shopping, and I want to do it right away. Jason will be in San Diego
on Monday to finish the first round of upgrades, and I'd love it if he
had to drive back down there on Friday to install the new big db
We actually have $7,000 in Paypal which I can transfer to our regular
bank account, where we already have almost $2,500 of my money that I
put in to kick things off. So that's actual available funds of
$9,500, although of course we should NOT spend it all on the database
machine unless that's the best thing to do.
Here's what I have lined up at penguincomputing.com for $7,108.00.
Please comment. One thing totally left open here is how to partition
the RAID volume, probably Brion and others can give good advice about
the best way to do that.
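As a quick check of the funding figures quoted above, here is a minimal Python sketch. It treats "almost $2,500" as exactly 2,500, which is an assumption, and the function names are made up for illustration.

```python
def available_funds(paypal, bank):
    """Total money on hand across both accounts."""
    return paypal + bank


def remaining_after(available, quote):
    """What is left if the quoted machine is bought as configured."""
    return available - quote


available = available_funds(paypal=7000, bank=2500)  # bank figure is approximate
remaining = remaining_after(available, quote=7108.00)

print(f"available: ${available:,.2f}")
print(f"remaining after quote: ${remaining:,.2f}")
```

Under those assumptions the quote would leave roughly $2,392 unspent.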
One of the toughest decisions is which exact RAM to buy. 1GB pieces
are a lot cheaper, and there are 8 total slots available. So I
thought: buy 4 1GB pieces, and there's plenty of room for growth.
When 2GB pieces drop in price (which I'm sure they will, and quickly),
we have room for 8 gig more or 12 total. Or, if absolutely needed,
we could move these 1GB pieces to other machines in our future network,
and fill this server up to the full 16gig.
It would cost $778 more to have 2x2Gig = 4 versus what I have, which
is 4x1Gig = 4.
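The slot arithmetic above can be sketched in Python. The 2GB-per-slot ceiling is inferred from the 8 slots and the 16GB maximum in the message; the helper name is hypothetical.

```python
SLOTS = 8  # total memory slots on the board, per the message


def total_gb(modules):
    """Sum a list of module sizes in GB, refusing more modules than slots."""
    if len(modules) > SLOTS:
        raise ValueError("more modules than slots")
    return sum(modules)


initial = [1] * 4                               # buy 4 x 1GB now
later = initial + [2] * (SLOTS - len(initial))  # fill the free slots with 2GB parts
full = [2] * SLOTS                              # move the 1GB parts elsewhere

print(total_gb(initial), total_gb(later), total_gb(full))
```

This gives 4, 12 and 16 gig for the three stages described.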
For the RAID, I selected 4x36gig in a RAID 5 array, for a logical
drive capacity of 105 gig. I also selected 1 extra hot spare drive,
just for that much more added reliability. There are many other
possibilities for this, and I'm open to recommendations. My
impression is that with RAID 5, more drives means more performance,
but with enough RAM, we shouldn't be hitting the drives that hard
3U (5.25") Rackmount Chassis
Dual AMD Opteron 200 Series Processors
Up to 16GB of PC2100 ECC Reg. DDR RAM
Integrated Dual Channel ATA-100 Controller
Up to Seven Hot-swappable 3.5" Hard Drive Bays
Dual Integrated Gigabit NICs
Three Available PCI Slots
SuSE Linux Enterprise Server 8 for AMD64 Preload
Altus 3200 Documentation
Penguin Computing Three Year Warranty
Hot Swap Power Supply
Dual AMD Opteron 246 Processors
4GB Low Profile PC2700 ECC DDR (4 x 1GB)
Up to 6 Drives on 2 SCSI Channels in two 3-bay SCA Internal Enclosures
LSI 320-2: 2 Channel RAID 64MB w/battery backup
105.0 GB RAID 5 Volume (4+1) 36GB, 10,000RPM Low Profile SCA
52X IDE CD-ROM
Rackmount Ball-Bearing Rails (Rack Depth 26")
Preload ONLY, SuSE Enterprise Server 8 for AMD64
Standard Three Year Warranty
Total installed memory -> 4096 MB
Free memory slots -> 1 SIMM
Free memory slots -> 4 DIMM
1GB/sec Copper -> 2 RJ-45
Total SCA Drive Bays (Low Profile) -> 6 bays
Free SCA Drive Bays (Low Profile) -> 1 bays
Total 5.25" Drive Bays -> 6 bays
IDE Interface -> 4
RAID Channels -> 2
PCI / ISA Slots
PCI slots -> 7 total PCI slots
PCI slots -> 6 free PCI slots
Disk Storage Summary
RAID 5 Volume: (4+1) x 36GB, 10,000RPM Low Profile SCA, Logical Capacity: 105 GB

Mount Point    File System    Partition Size
unallocated                   105.0 GB (100.0)
/              ext3           6.00 GB
/boot          ext3           80 MB
/home          ext3           0 MB
/var           ext3           1.02 GB
swap           swap           2.04 GB

TOTALS Logical Capacity: 105.0 GB, Allocated Capacity: 0.0 GB (0.0 %)
Unallocated Capacity 105.0 GB (100.0)
I just doubled the speed of the PHP parser.
In my test page ([[Anime]], ~60 links with half broken), I cut the time
for replaceInternalLinks from 800ms to 350ms, and the time for
Article::view from 1310 to 610ms.
This was achieved by eliminating redundant calls to secureAndSplit,
using static variables for constants, and catering to PHP oddities such
as the fact that === is slower than ==.
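The first of those techniques, avoiding repeated work on the same input, can be sketched as follows. This is Python rather than the PHP that was actually changed, and `secure_and_split` here is a hypothetical stand-in, not the MediaWiki function; the caching approach is an illustration, not necessarily how the redundant calls were removed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def secure_and_split(title):
    """Hypothetical stand-in for an expensive title-normalising step."""
    namespace, _, name = title.rpartition(":")
    return namespace.strip(), name.strip().replace(" ", "_")


def render_links(titles):
    # Each distinct title is normalised once; repeats are served from the cache.
    return [secure_and_split(t) for t in titles]


links = render_links(["Anime", "Manga", "Anime", "Talk:Anime", "Anime"])
print(links)
print(secure_and_split.cache_info())
```

On a page where the same link target appears several times, only the distinct titles pay the normalisation cost.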
Okay, you may shower me with praise now.
-- Tim Starling.
In fact the question is no longer a choice between Debian, RedHat and SuSE,
but only between RedHat and SuSE, since Debian does not really fit on Athlon
64 for the moment. So this software, which could be an interesting feature
for Wikipedia, could make a difference, couldn't it?
Well, Taw, I appreciate that you remember my request. Some people at
wikipedia indicated that being able to use Japanese in TeX would be good.
I personally do not know much about how good, or how crucial, it is. But I
have seen Japanese Wikipedians mention it multiple times, and one time I
asked if I should make a request to others (outside Japanese Wikipedia).
Some Wikipedians responded that that would be nice. So I believe it is a
meaningful thing for math and other TeX-intensive articles.
Oh. Yes, that's right.
>Also note that this machine will be running the database only, not the
>web servers, so the TeX packages don't make any difference here.
>-- brion vibber (brion @ pobox.com)