Hi,
So we have a problem. The problem is the architecture. The architecture is being worked on. Release 1.5 has an improvement in the database design. The improvements made possible by better software and hardware are used up by the relentless growth of our consumer base. Every bit of capacity is used as it becomes available. This is also what makes it so interesting: where do we get the money to pay for all the nice toys? (He or she who dies with the most toys wins.)
When you compare what is done on the MediaWiki software with what is done in a commercial environment, the first thing that strikes you is the budgets involved. The big achievement of MediaWiki is what it enables for what budget. The price/performance ratio is staggering. Yes, it comes with occasional downtime. We do not have an SLA, we run on beta software, we have volunteers making the impossible a daily occurrence, and it does work on a best-effort basis. Is it not great?
When you consider that some stuff should be deleted in order to get the extra bit of oomph out of our systems, you have to realise that we do not produce a "classic" encyclopedia. What we have with Wikipedia is different; it is all the weird and wonderful stuff we have that makes us different. Some have argued that we do not need all these other Wikipedias because all the knowledge should be included in the one, the English one. I congratulate en:wikipedia on its 500,000th article. At the same time, the biggest growth is in the other projects. Removing a few articles, the so-called fancruft, will not make a dent in a pack of butter, as the growth is elsewhere and it will continue to grow exponentially for some time to come.
So what will the solution be? Your guess is as good as mine (I have also been in steamy rooms with high-flyers). But I strongly believe that a way will be found, as this is one of the most amazing projects to work on. It does change the concepts of how you do business; it is a completely different ecology from the typical commercial world. (By the way, a professional is someone who is paid to do a job.)
Thanks,
GerardM
Alex J. Avriette wrote:
On Thu, 17 Mar 2005 10:04:00 +0000, David Gerard
<dgerard(a)gmail.com> wrote:
Indeed. keats appears to be starting from a
personal distaste and
then claiming this will be the destruction of Wikipedia. I note a
curious lack of substantiating, ahh, numbers. keats, do you have any?
First off, you can call me "Alex". That is my name. If you want to argue on IRC, call me keats or whatever you choose to call me. I am not John Keats, nor am I a character in a book. A nick is chosen on IRC to be distinct from others. It could be waerth, jwales, TimStarling, or whatever. But we are discussing this on a public mailing list, where my name is quite apparent, and I have signed my emails as such. Let's be adults about this.
Second, I do not have any numbers. I said that I felt that we had some problems. I proposed some solutions and gave my analysis of said problems. The first step in solving a problem is to identify the problem. Then you go and find what might be possible solutions (more disk on the master db, Squids on FreeBSD, Postgres on the backend, multiply redundant masters, colos on different continents, a BigIP or two), and you /test them/. The fact that I haven't gone out and built my own Wikipedia cluster and tested every solution I offered is hardly a fair criticism. I have at my disposal two PowerBooks, two iBooks, and a single 2x 866 MHz Linux firewall. I don't have the resources to do all this testing. Nobody just cut me a check for $86,000, either. I am offering my time and my expertise, and even some money. But the testing has to be done by the Foundation.
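To make the database part of that list concrete: here is a minimal sketch of hanging a read slave off the one master under MySQL 4.x, the simplest first step toward redundancy. The hostname and account below are hypothetical, and it assumes a replication user has already been granted REPLICATION SLAVE on the master.

  # my.cnf on the master (excerpt): enable the binary log so
  # slaves can follow the write stream
  [mysqld]
  server-id = 1
  log-bin   = mysql-bin

  # my.cnf on each slave: just needs a unique server-id
  [mysqld]
  server-id = 2

  -- On the slave, point it at the master and start replicating
  -- (host, user, and password are placeholders)
  CHANGE MASTER TO
    MASTER_HOST     = 'db-master.example.org',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
  START SLAVE;

Reads can then be farmed out across as many slaves as you can rack, but every write still funnels through the single master. That is exactly the bottleneck I am on about.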
Fundamentally, taking out the word
"fancruft": keats appears to be
claiming that the mere fact of having 500k articles is unsustainable
in MediaWiki. Is this the case? Do we stop all article creation now?
If not, what do we do? Ration them?
Do you not see that we are having weekly outages? So we hit 500k articles and lo, it holds together. Where do we go from here, with our one master database server? Somebody goes out and drops $100k on an 8-way Opteron 848. Somebody drops a further $100k on disk. Problem solved until we hit 100M articles. Right, because every single Power Ranger should have their own page, and every single villain, and every single Care Bear, and every character from Charmed, and every dicdef that never makes it into Wiktionary, and so on and so forth.
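To put rough numbers on that (illustrative, not measurements): suppose the article count keeps doubling about once a year.

  2x the hardware   buys ~1 year of headroom   (log2(2)  = 1)
  10x the hardware  buys ~3.3 years            (log2(10) ~ 3.3)

Against exponential growth, a bigger box is a treadmill. Only an architecture whose capacity grows with the number of machines, rather than the size of any one machine, keeps pace.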
You can't just say "let's just stick with what we're doing, it works for now, and we'll just grow the architecture we've got by throwing cubic dollars at it until it works properly." You're wasting my money and the Foundation's by doing so. Fix the architecture and you reduce the cost of operation.
I mean, have you actually ever designed anything near as complicated as Wikipedia? Have you ever been in a board room with the program manager, project manager, VP, eight developers, and two sysadmins when you all realize at the same time that the architecture you've got just won't scale to the point you need it to?
I've been there. I can draw you two distinct pictures on the whiteboard I mentioned in my previous email. One will be called "horribly fucked" and the other will be called "ideal". We can get from "horribly fucked" to "ideal" (that's what I do for a living), but we have to stop calling each other names and start figuring out what we can do to fix the problem. Test. Benchmark. I mean, start doing what it takes. Right now we're doing nothing. We're putting out fires.
Adding more servers is only going to help you until you reach some new
unsustainable point. The current architecture is (let me spell it out
real big for you)
U N S C A L A B L E
period.