Hoi,
So we have a problem. The problem is the architecture, and the architecture is being worked on: release 1.5 brings an improved database design. But the improvements made possible by better software and hardware are used up by the relentless growth of our user base; every bit of capacity is consumed as it becomes available. This is also what makes it so interesting: where do we get the money to pay for all the nice toys? (He/she who dies with the most toys, wins.)
When you compare what is done with the MediaWiki software against what is done in a commercial environment, the first thing that strikes you is the budgets involved. The big achievement of MediaWiki is what it enables on what budget; the price/performance ratio is staggering. Yes, it comes with occasional downtime. We do not have an SLA, we run on beta software, we have volunteers making the impossible a daily occurrence, and it does work on a best-effort basis. Is it not great?
When you consider that some stuff should be deleted in order to get an extra bit of oomph out of our systems, you have to realise that we do not produce a "classic" encyclopedia. What we have with Wikipedia is different; it is all the weird and wonderful stuff we have that makes us different. Some have argued that we do not need all these other Wikipedias because all the knowledge should be included in the one, the English one. I congratulate en:wikipedia on its 500,000th article. At the same time, the biggest growth is in the other projects. Removing a few articles, the so-called fancruft, will not make a dent in a pack of butter, as the growth is elsewhere and it will continue to grow exponentially for some time to come.
So what will the solution be? Your guess is as good as mine (I have also been in steamy rooms with high-flyers). But I strongly believe that a way will be found, as this is one of the most amazing projects to work on. It changes the concepts of how you do business; it is a completely different ecology from the typical commercial world. (By the way, a professional is someone who is paid to do a job.)
Thanks, GerardM
Alex J. Avriette wrote:
On Thu, 17 Mar 2005 10:04:00 +0000, David Gerard dgerard@gmail.com wrote:
Indeed. keats appears to be starting from a personal distaste and then claiming this will be the destruction of Wikipedia. I note a curious lack of substantiating, ahh, numbers. keats, do you have any?
First off, you can call me "Alex". That is my name. If you want to argue on IRC, call me keats or whatever you choose to call me. I am not John Keats, nor am I a character in a book. A nick is chosen on IRC to be distinct from others. It could be waerth, jwales, TimStarling, or whatever. But we are discussing this on a public mailing list, where my name is quite apparent, and I have signed my emails as such. Let's be adults about this.
Second, I do not have any numbers. I said that I felt that we had some problems. I proposed some solutions and gave my analysis of said problems. The first step in solving a problem is to identify it. Then you go and find possible solutions (more disk on the master db, squids on FreeBSD, postgres on the backend, multiply redundant masters, colos on different continents, a BigIP or two), and you /test them/. The fact that I haven't gone out and built my own Wikipedia cluster and tested every solution I offered is hardly a fair criticism. I have at my disposal two PowerBooks, two iBooks, and a single 2x 866 MHz Linux firewall. I don't have the resources to do all this testing. Nobody just cut me a check for $86,000, either. I am offering my time and my expertise, and even some money. But the testing has to be done by the foundation.
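To make "test them" concrete, here is the kind of throwaway harness I mean. This is only a rough Python sketch -- the test host and page list are made up, and it has nothing to do with the live servers -- but it is enough to compare candidate setups if you change one variable at a time:

    # Rough load-test sketch: time page fetches against a TEST wiki
    # while varying one thing (squid in front or not, mysql vs.
    # postgres backend, etc.). Host and pages below are placeholders.
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    BASE = "http://testwiki.example.org/wiki/"   # hypothetical test box
    PAGES = ["Main_Page", "Special:Recentchanges", "Albert_Einstein"]

    def fetch(page):
        """Time one full page fetch, body included."""
        start = time.time()
        with urllib.request.urlopen(BASE + page) as resp:
            resp.read()
        return time.time() - start

    def run(concurrency=10, rounds=50):
        """Hammer the test wiki and report latency percentiles."""
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            times = sorted(pool.map(fetch, PAGES * rounds))
        print("requests: %d  median: %.3fs  p90: %.3fs  worst: %.3fs"
              % (len(times), times[len(times) // 2],
                 times[int(len(times) * 0.9)], times[-1]))

    if __name__ == "__main__":
        run()

Run that against each configuration and you have numbers instead of name-calling.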
Fundamentally, taking out the word "fancruft": keats appears to be claiming that the mere fact of having 500k articles is unsustainable in MediaWiki. Is this the case? Do we stop all article creation now? If not, what do we do? Ration them?
Do you not see that we are having weekly outages? So we hit 500k articles and lo, it holds together. Where do we go from here, with our one master database server? Somebody goes out and drops $100k on an 8-way Opteron 848. Somebody drops a further $100k on disk. Problem solved until we hit 100M articles. Right, because every single Power Ranger should have their own page, and every single villain, and every single Care Bear, and every character from Charmed, and every dicdef that never makes it into the Wiktionary, and so on and so forth.
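And to be clear about why a bigger master box only buys time: reads can be fanned out, writes cannot. Here is a toy Python sketch of read/write splitting -- the hostnames are hypothetical and the table name is just flavor from the old cur/old schema, not our actual routing code:

    # Toy sketch of read/write splitting, the standard way to take
    # load off a single master. Hostnames are made up; "pick_server"
    # just returns where a query WOULD go.
    import random

    MASTER   = "db-master.example.org"    # every write lands here
    REPLICAS = ["db-slave1.example.org",  # reads spread across these
                "db-slave2.example.org",
                "db-slave3.example.org"]

    def pick_server(sql):
        """Route writes to the master, reads to a random replica."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        is_write = verb in ("INSERT", "UPDATE", "DELETE", "REPLACE")
        return MASTER if is_write else random.choice(REPLICAS)

    # Reads scale out almost linearly as you add replicas...
    print(pick_server("SELECT * FROM cur WHERE cur_title='Main_Page'"))
    # ...but every write still serializes through the one master.
    print(pick_server("INSERT INTO cur (cur_title) VALUES ('X')"))

Every replica you add absorbs more SELECTs, but every write still funnels through the one master. Write volume, not article count, is the wall, and no $100k box removes it.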
You can't just say "let's just stick with what we're doing, it works for now, and we'll just grow the architecture we've got by throwing cubic dollars at it until it works properly."
You're wasting my money and the foundation's money by doing so. Fix the architecture and you reduce the cost of operation.
I mean, have you ever actually designed anything near as complicated as Wikipedia? Have you ever been in a board room with the program manager, project manager, VP, eight developers, and two sysadmins when you all realize at the same time that the architecture you've got just won't scale to the point you need it to?
I've been there. I can draw you two distinct pictures on the whiteboard I mentioned in my previous email. One will be called "horribly fucked" and the other will be called "ideal". We can get from "horribly fucked" to "ideal" (that's what I do for a living), but we have to stop calling each other names and start figuring out what we can do to fix the problem. Test. Benchmark. I mean, start doing what it takes. Right now we're doing nothing. We're putting out fires. Adding more servers is only going to help you until you reach some new unsustainable point. The current architecture is (let me spell it out real big for you)
U N S C A L A B L E
period.