Chat about Wikipedia performance?

List overview All Threads
Download

newer

older

Apache updates

Mail server box - since Jason is...

erik_moeller＠gmx.de

28 Apr 2003 28 Apr '03

11:37 p.m.

How about organizing a chat this week about the ongoing Wikipedia performance crisis and how to solve it? Talking to people can provide additional motivation for getting things done, and help us organize our priorities. It might also reduce some frustration. If we do this, all the relevant people should be present:

- Jimbo - Jason - Brion - Lee - Magnus - ...

It might be best to meet on the weekend, so that work does not interfere. My suggestion would be Saturday, 20:00 UTC.

What do you think?

Regards,

Erik

Show replies by date

Lee Daniel Crocker

28 Apr 28 Apr

11:47 p.m.

...

(Erik Moeller erik_moeller@gmx.de): How about organizing a chat this week about the ongoing Wikipedia performance crisis and how to solve it? Talking to people can provide additional motivation for getting things done, and help us organize our priorities. It might also reduce some frustration. If we do this, all the relevant people should be present:

Jimbo

Jason

Brion

Lee

Magnus

...

It might be best to meet on the weekend, so that work does not interfere. My suggestion would be Saturday, 20:00 UTC.

What do you think?

I'll have some more performance numbers by then, and I'm happy to participate in whatever the group wants to do. But personally, I've never been big on online chats, and I don't see that this one could accomplish anything that wouldn't be better accomplished here on wikitech-l.

-- Lee Daniel Crocker lee@piclab.com http://www.piclab.com/lee/ "All inventions or works of authorship original to me, herein and past, are placed irrevocably in the public domain, and may be used or modified for any purpose, without permission, attribution, or notification."--LDC

erik_moeller＠gmx.de

29 Apr 29 Apr

12:14 a.m.

...

I'll have some more performance numbers by then, and I'm happy to participate in whatever the group wants to do. But personally, I've never been big on online chats, and I don't see that this one could accomplish anything that wouldn't be better accomplished here on wikitech-l.

Cool. The problem with mailing list discussion is that they can die quickly, for many reasons, which can delay things unnecessarily. I've seen many situations where a mailing list was used to report a serious problem, but the post (in spite of hundreds of members) was ignored.

We all know that the performance issue is one of our most pressing problems right now -- many people can't use the site anymore, and the international Wikipedians are getting a bit irritated. So I think the best way to address this *on time* is to sit down (virtually) and go through an agenda.

Regards,

Erik

Brion Vibber

1:34 a.m.

On Mon, 2003-04-28 at 14:14, Erik Moeller wrote:

...

The problem with mailing list discussion is that they can die quickly, for many reasons, which can delay things unnecessarily. I've seen many situations where a mailing list was used to report a serious problem, but the post (in spite of hundreds of members) was ignored.

Reporting a serious problem is all well and good, but isn't the same as _fixing_ it.

...

We all know that the performance issue is one of our most pressing problems right now -- many people can't use the site anymore, and the international Wikipedians are getting a bit irritated. So I think the best way to address this *on time* is to sit down (virtually) and go through an agenda.

We don't need to sit and chat. We need *code* and we need a second server to divide the "must-do-fast" web work the and "chug-chug-chug" database labor.

Here are some things you can work on if you've got time to spend on Wikipedia coding:

* Page viewing is still kinda inefficient. Rendering everything on every view is not so good... Caching can save both processing time in conversion to HTML, and in various database accesses (checking link tables, etc) with its associated potential locking overhead.

We need to either be able to cache the HTML of entire pages (followed by insertion of user-specific data/links or simple options through style sheet selection or string replacement) or to cache just the generated HTML of the wiki pages for insertion into the page structure (plus associated data, like interlanuage links, need to be accessible without parsing the page).

We need to tell which pages are or aren't cacheable (not a diff, not a special page, not a history revision, not a user with really weird display options -- or on the other hand, maybe we _could_ cache those, if only we can distinguish them), we need to be able to generate and save the cached material appropriately, we need to make sure it's invalidated properly, and we need to be able to do mass invalidation when, for instance, the software is upgraded. Cached pages may be kept in files, rather than the database.

I should point out that while there are several possible choices here, any of them is better than what we're running now. We need living, running _code_, which can then be improved upon later.

* The page saving code is rather inefficient, particularly with how it deals with the link tables (and potentially buggy -- sometimes pages end up with their link table entries missing, possibly due to the system timing out between the main save chunk and the link table update). If someone would like to work on this, it would be very welcome. Nothing that needs to be _discussed_, it just needs to be _done_ and changes checked in.

* Various special pages are so slow they've been disabled. Most of them could be made much more efficient with better queries and/or by maintaining summary tables. Some remaining ones are also pretty inefficient, like the Watchlist. Someone needs to look into these and make the necessary adjustments to the code. Nothing to _chat_ about; if you know how to make them more efficient, please rewrite them and check in the _code_.

* Can MySQL 4 handle fulltext searches better under load? Is boolean mode faster or slower? Someone needs to test this (Lee has a test rig with mysql4 already, but as far as I know hasn't tested the fulltext search with boolean mode yet), and if it's good news, we need to make an upgrade a high priority. Not much to _chat_ about, it just needs to get _done_.

* Alternately, would a completely separate search system (not using MySQL) be more efficient? Or even just running searches on a dedicated box with a replicated database to keep it from bogging down the main db? Which leads us back to hardware...

For the server; I don't know what's going on here. What I do know is that Jimbo posted this to wikitech-l in February:

-----Forwarded Message-----

From: Jimmy Wales jwales@bomis.com To: wikitech-l@wikipedia.org Subject: [Wikitech-l] Hardware inventory Date: 07 Feb 2003 02:56:57 -0800

Jason and I are taking stock of our hardware, and I'm going to find a secondary machine to devote exclusively to doing apache for wikipedia, i.e. with no other websites on it or anything. I'll loan the machine to the Wikipedia Foundation until the Foundation has money to buy a new machine later on this year.

We'll keep the MYSQL where it is, on the powerful machine. The new machine will be no slouch, either.

Today is Friday, and I think we'll have to wait for Jason to take a trip to San Diego next week sometime (or the week following) to get this all setup. (The machine I have in mind is actually in need of minor repair right now.)

By having this new machine be exclusively wikipedia, I can give the developers access to it, which is a good thing.

This will *not* involve a "failover to read-only" mechanism, I guess, but then, it's still going to be a major improvement -- such a mechanism is really a band-aid on a fundamental problem, anyway.

------

Lots of people think it's a good thing to set up mirror servers all over the Internet. It's really not that simple. There are issues of organizational trust with user data, issues with network latency, etc. Some things should be decentralized, some things should be centralized.

--- end forwarded message ---

and this to wikipedia-l in March:

-----Forwarded Message-----

From: Jimmy Wales jwales@bomis.com To: wikipedia-l@wikipedia.org, wikien-l@wikipedia.org Subject: [Wikipedia-l] Off today Date: 19 Mar 2003 04:47:52 -0800

My wife and little girl are feeling ill today with a cold, so I'm going to be taking off work to help out. I'm already a little behind in wikipedia email, so I'll probably be slow for a few days as I dig out.

We're getting a new (second) machine for wikipedia -- the parts have been ordered and are being shipped to Jason, and then at some point soon, he'll drive down to San Diego to install everything.

--Jimbo

--- end forwarded message ---

I e-mailed Jimbo and Jason the other day about this; I haven't heard back from Jimbo, and Jason still doesn't know anything concrete about the new server.

Jimbo, we really need some news on this front. If parts and/or a whole machine really *is* on order and can be set up in the near future, we need to know that. If it's *not*, then it may be time to pass around the plate and have interested parties make sure one does get ordered, as had begun to be discussed prior to the March 19 announcement.

-- brion vibber (brion @ pobox.com)

Lee Daniel Crocker

1:45 a.m.

...

We don't need to sit and chat. We need *code* and we need a second server to divide the "must-do-fast" web work the and "chug-chug-chug" database labor.

The code issue is mostly a matter of focus: one or two developers is probably sufficient to keep the codebase up to date, but neither Brion nor I are focused on that right now.

So while Brion has issued a call for coders, that could be answered in other ways: for example, if a good admin stepped up to take some of admin tasks Brion is currently swamped with, he might be more free to code (assuming he's interested, which is not a given either). I've chosen to focus more on long-term goals because like Brion I was expecting hardware to bail us out in the short term. If that's going to be delayed, then I can put off things like testing file systems and focus on caching and tuning.

Nick Reinking

7:44 p.m.

On Mon, Apr 28, 2003 at 05:45:18PM -0500, Lee Daniel Crocker wrote:

...

...
We don't need to sit and chat. We need *code* and we need a second server to divide the "must-do-fast" web work the and "chug-chug-chug" database labor.

The code issue is mostly a matter of focus: one or two developers is probably sufficient to keep the codebase up to date, but neither Brion nor I are focused on that right now.

So while Brion has issued a call for coders, that could be answered in other ways: for example, if a good admin stepped up to take some of admin tasks Brion is currently swamped with, he might be more free to code (assuming he's interested, which is not a given either). I've chosen to focus more on long-term goals because like Brion I was expecting hardware to bail us out in the short term. If that's going to be delayed, then I can put off things like testing file systems and focus on caching and tuning.

I'm certainly willing to help out here. I'm not in SoCal, but I should be able to help out with most administrivial tasks. I'm going to be able to help out a much more with tuning at a file system/OS/Apache level than I will be at a PHP/SQL level.

-- Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN

John R. Owens

30 Apr 30 Apr

2:19 a.m.

On Tue, 29 Apr 2003, Nick Reinking wrote:

...

Date: Tue, 29 Apr 2003 11:44:45 -0500 From: Nick Reinking nick@twoevils.org Subject: Re: [Wikitech-l] Chat about Wikipedia performance?

On Mon, Apr 28, 2003 at 05:45:18PM -0500, Lee Daniel Crocker wrote:

...
The code issue is mostly a matter of focus: one or two developers is probably sufficient to keep the codebase up to date, but neither Brion nor I are focused on that right now.

So while Brion has issued a call for coders, that could be answered in other ways: for example, if a good admin stepped up to take some of admin tasks Brion is currently swamped with, he might be more free to code (assuming he's interested, which is not a given either). I've chosen to focus more on long-term goals because like Brion I was expecting hardware to bail us out in the short term. If that's going to be delayed, then I can put off things like testing file systems and focus on caching and tuning.

I'm certainly willing to help out here. I'm not in SoCal, but I should be able to help out with most administrivial tasks. I'm going to be able to help out a much more with tuning at a file system/OS/Apache level than I will be at a PHP/SQL level.

Since I've just joined on to the tech list, might as well introduce myself in the tech context. My specialty is administration of routers and WANs, and along the way I've come to know general Linux quite well, HTML & HTTP, Apache, Perl & CGI, & MySQL. A few other things that I don't think would be very relevant, but you never know, would be DNS (Bind, of course), Sendmail, and TCP/IP details. The only thing holding me back from being a useful coder so far seems to be that I don't know beans about PHP, but I could certainly be similarly helpful in that "relief pitcher" kind of way.

-- John R. Owens http://www.ghiapet.homeip.net/ Sleep is a common substitute for adequate caffeine intake. --John Owens

erik_moeller＠gmx.de

29 Apr 29 Apr

2:46 a.m.

Brion, you're missing my point. I agree with you entirely that things need to "get done". My suggestion to have a public discussion was to find out which things we can get done reasonably quickly (because, realistically, we all have other things to do) with substantial impact; to figure out the server situation, which features should be disabled, who might contribute which piece of code etc. If we can sort these things out in the next few days via mail, fine. I'm no IRC junkie. But we need to implement at least some reasonable emergency fixes, and think about a mid term strategy.

As for code, this is one thing I'd like to talk about: If we have the Nupedia Foundation set up, we can collect donations. It would be stupid not to use some of that money for funding development. I don't care who is funded, but I think this could greatly speed things up. If we can't get the NF set up reasonably quickly, we should collect donations regardless, tax-deductible or not.

...

Page viewing is still kinda inefficient. Rendering everything on every

view is not so good...

Why? It's just PHP stuff. Our bottleneck is the database server. Fetching stuff from CUR and converting it into HTML is not an issue. 20 pass parser? Add another zero. Until I see evidence that this has any impact on performance, I don't care. Turn off link checking and all pages are rendered lightning fast.

What would be useful is to maintain a persistent (over several sessions) index of all existing and non existing pages in memory for the link checking. A file on a ramdisk maybe? I think it would be worth giving it a try at least, and not a lot of work.

...

We need to tell which pages are or aren't cacheable (not a diff, not a special page, not a history revision, not a user with really weird display options -- or on the other hand, maybe we _could_ cache those, if only we can distinguish them), we need to be able to generate and save the cached material appropriately, we need to make sure it's invalidated properly, and we need to be able to do mass invalidation when, for instance, the software is upgraded. Cached pages may be kept in files, rather than the database.

Wasted effort, IMHO. Cache improvements have added little measurable performance benefits, and there are many, many different situations to test here (different browsers, different browser cache settings etc.). Meanwhile, our real bottlenecks (search, special pages, out of control queries) remain in place.

...

The page saving code is rather inefficient, particularly with how it

deals with the link tables (and potentially buggy -- sometimes pages end up with their link table entries missing, possibly due to the system timing out between the main save chunk and the link table update). If someone would like to work on this, it would be very welcome. Nothing that needs to be _discussed_, it just needs to be _done_ and changes checked in.

I doubt that a *relatively* rare activity like that makes much of an impact, but I'll be happy to be proven wrong. Bugs are annoying, but I'm writing this for one reason: we need to make Wikipedia usable again on a regular basis. There are countless small problems that need to be fixed. This is not the issue here.

...

Various special pages are so slow they've been disabled. Most of them

could be made much more efficient with better queries and/or by maintaining summary tables. Some remaining ones are also pretty inefficient, like the Watchlist. Someone needs to look into these and make the necessary adjustments to the code.

Caching special pages seems like a reasonable approach. Watchlists could definitely be improved, haven't seen a good way to do this yet, though. It could be done on page save, but with a much-watched page, this again would add severe drain, with possibly no overall benefit. Improve the SQL and indexes? Maybe, but I'm no SQL guru.

...

Can MySQL 4 handle fulltext searches better under load? Is boolean

mode faster or slower? Someone needs to test this (Lee has a test rig with mysql4 already, but as far as I know hasn't tested the fulltext search with boolean mode yet), and if it's good news, we need to make an upgrade a high priority.

Sounds good to me. If safe enough, we should update in any case; it is my understanding that MySQL4 has support for subqueries which could, if we know what we're doing, potentially be used to write significantly more effective queries.

Regards,

Erik

Brion Vibber

4:05 a.m.

On Mon, 2003-04-28 at 16:46, Erik Moeller wrote:

...

Brion, you're missing my point. I agree with you entirely that things need to "get done". My suggestion to have a public discussion was to find out which things we can get done reasonably quickly (because, realistically, we all have other things to do) with substantial impact; to figure out the server situation, which features should be disabled, who might contribute which piece of code etc. If we can sort these things out in the next few days via mail, fine. I'm no IRC junkie. But we need to implement at least some reasonable emergency fixes, and think about a mid term strategy.

Well maybe, but my experience with using online chats like this is: * Everyone sits around for several hours babbling, waiting for the other folks to show up and complaining about the problems they're having logging in. * By the end, someone has scribbled up a page with a work plan, which everyone ignores in the future. * During all this time, they _could_ have been doing something productive instead...

...

As for code, this is one thing I'd like to talk about: If we have the Nupedia Foundation set up...

The status of the non-profit is indeed another thing Jimbo could shed some light on...

...

...

Page viewing is still kinda inefficient. Rendering everything on every

view is not so good...

Why? It's just PHP stuff.

Obviously not, since that PHP stuff needs data from the database to work. :) Buying milk at the grocery store is more convenient than keeping and milking cows at home not because the milking process is more time consuming than grabbing a bottle from the fridge, but because maintaining the cow is a huge effort and milk is only available from the cow under certain conditions. Or, um, something like that.

...

Our bottleneck is the database server. Fetching stuff from CUR and converting it into HTML is not an issue. 20 pass parser? Add another zero. Until I see evidence that this has any impact on performance, I don't care. Turn off link checking and all pages are rendered lightning fast.

And that would be a pretty piss-poor wiki, wouldn't it? :)

...

What would be useful is to maintain a persistent (over several sessions) index of all existing and non existing pages in memory for the link checking. A file on a ramdisk maybe? I think it would be worth giving it a try at least, and not a lot of work.

Sure, it _might_ help. Code it up and see!

...

...

The page saving code is rather inefficient, particularly with how it

deals with the link tables...

I doubt that a *relatively* rare activity like that makes much of an impact, but I'll be happy to be proven wrong.

Slow saving impacts everyone who tries to edit articles; four edits per minute may be _relatively_ rare compared to page views, but we're still running thousands of edits per day and it's a fundamental part of what a wiki is. It's absolutely vital that editing be both swift and bug-free, and if we can reduce the opportunities for saving to get hung up, so much the better.

...

Caching special pages seems like a reasonable approach.

Unfortunately that doesn't really solve the problem any more than replacing the search with a link to Google solves the search problem.

If updating these cached pages is so slow and db-intensive that it takes the 'pedia offline for fifteen-twenty minutes (which it does), then nobody's going to want to update the caches (last updated April 9...) and they become outdated and useless.

It works as a temporary crutch in place of "blank page -- this feature has been disabled, please find a less popular web site to play on", but it's not a solution.

...

Watchlists could definitely be improved, haven't seen a good way to do this yet, though. It could be done on page save, but with a much-watched page, this again would add severe drain, with possibly no overall benefit. Improve the SQL and indexes? Maybe, but I'm no SQL guru.

Which Himalayan mountain do we have to climb to find one? :)

-- brion vibber (brion @ pobox.com)

erik_moeller＠gmx.de

5:21 a.m.

...

Well maybe, but my experience with using online chats like this is:

Everyone sits around for several hours babbling, waiting for the other

folks to show up and complaining about the problems they're having logging in.

By the end, someone has scribbled up a page with a work plan, which

everyone ignores in the future.

During all this time, they _could_ have been doing something

productive instead...

Depends on the moderator. No moderation=unpredictable, bad moderator=bad result, good moderator=possibly good result. Just like in real life meetings.

...

Obviously not, since that PHP stuff needs data from the database to work. :)

Duh. But if our database is so slow that it can't even answer simple SELECTs, we can't do anything useful, cache or no cache. And if it isn't, then we should concentrate on the queries which aren't simple. The linkcache might still be one of those bottlenecks (simply because of the sheer number of queries involved), I haven't checked your latest changes to that code.

...

...
Our bottleneck is the database server. Fetching stuff from CUR and converting it into HTML is not an issue. 20 pass parser? Add another zero. Until I see evidence that this has any impact on performance, I don't care. Turn off link checking and all pages are rendered lightning fast.

...

And that would be a pretty piss-poor wiki, wouldn't it? :)

Yes, but this is really one of the more expensive wiki features that also limits all caching options severely. Impossible to work without it, but apparently hard to implement in a scalable fashion.

...

...
What would be useful is to maintain a persistent (over several sessions) index of all existing and non existing pages in memory for the link checking. A file on a ramdisk maybe? I think it would be worth giving it a try at least, and not a lot of work.

...

Sure, it _might_ help. Code it up and see!

I might. I'll have to see if it makes any difference on the relatively small de database which I'm currently using locally. It would have to be optional -- setting up the software is already difficult enough.

...

Slow saving impacts everyone who tries to edit articles; four edits per minute may be _relatively_ rare compared to page views, but we're still running thousands of edits per day and it's a fundamental part of what a wiki is. It's absolutely vital that editing be both swift and bug-free, and if we can reduce the opportunities for saving to get hung up, so much the better.

Yeah yeah yeah. I still think we should care more about the real showstoppers. But hey, you can always _code it_. (Finally an opportunity to strike back ;-)

...

If updating these cached pages is so slow and db-intensive that it takes the 'pedia offline for fifteen-twenty minutes (which it does), then nobody's going to want to update the caches (last updated April 9...) and they become outdated and useless.

If this downtime is unacceptable, we might indeed have to think about a query only server with somewhat delayed data availability. This could be a replacement for the sysops, too. Mirroring the Wikipedia database files (raw) should be no issue with a SCSI system, or a low priority copy process.

...

...
Watchlists could definitely be improved, haven't seen a good way to do this yet, though. It could be done on page save, but with a much-watched page, this again would add severe drain, with possibly no overall benefit. Improve the SQL and indexes? Maybe, but I'm no SQL guru.

...

Which Himalayan mountain do we have to climb to find one? :)

Maybe we should stop looking in the Himalayan mountains and start searching the lowlands .. In other words: Don't search those who will do it for society or for the glory. Just hand over the cash and be done with it.

Regards,

Erik

Brion Vibber

6:16 a.m.

On Mon, 2003-04-28 at 19:21, Erik Moeller wrote: [on a persistent link-existence table]

...

I might. I'll have to see if it makes any difference on the relatively small de database which I'm currently using locally. It would have to be optional -- setting up the software is already difficult enough.

I don't know whether you've already looked into this, but PHP does seem to have some support for shared memory:

http://www.php.net/manual/en/ref.sem.php or http://www.php.net/manual/en/ref.shmop.php

These seem to require enabling compile-time options for PHP.

It's also possible to create an in-memory-only table in MySQL (type=HEAP), which may be able to bypass other MySQL slownesses (but it may not, I haven't tested it).

...

...
Slow saving impacts everyone who tries to edit articles; four edits per minute may be _relatively_ rare compared to page views, but we're still running thousands of edits per day and it's a fundamental part of what a wiki is. It's absolutely vital that editing be both swift and bug-free, and if we can reduce the opportunities for saving to get hung up, so much the better.

Yeah yeah yeah. I still think we should care more about the real showstoppers. But hey, you can always _code it_. (Finally an opportunity to strike back ;-)

Touché. :) My point is just that we need to keep that critical path clean and smooth -- and working. (I would consider not differentiating live from broken links, or getting frequent failures on page save to be fatal flaws, whereas not having a working search or orphans function is just danged annoying.)

...

If this downtime is unacceptable, we might indeed have to think about a query only server with somewhat delayed data availability. This could be a replacement for the sysops, too. Mirroring the Wikipedia database files (raw) should be no issue with a SCSI system, or a low priority copy process.

Sure, MySQL's database replication can provide for keeping a synched db on another server. (Which, too, could provide for some emergency fail-over in case the main machine croaks.)

The wiki would just need a config option to query the replicated server for certain slow/nonessential operations (search, various special pages, sysop queries) and leave the main db server free to take care of the business of showing and saving pages and logging in users.

However this is all academic until we have reason to believe that a second server will be available to us in the near future.

...

Maybe we should stop looking in the Himalayan mountains and start searching the lowlands .. In other words: Don't search those who will do it for society or for the glory. Just hand over the cash and be done with it.

A lovely idea, but there _isn't_ any cash as of yet, nor a non-profit foundation to formally solicit donations with which to fund programmers. Until this gets done, or unless someone wants to fund people more directly, all we've got is volunteer developers, who are only rarely unemployed database gurus who can spend all day working on Wikipedia. :)

-- brion vibber (brion @ pobox.com)

Guillaume Blanchard

7:10 a.m.

New subject: Accent in mispeelings page

Hi, The "Pages with misspellings" don't care the accent common misspellings we add in "Wikipédia:List of common misspellings". "e", "é", "è" and "ê" are interpret all as same. It's useful in search page, but very bad for misspellings detection. Do "search" & "misspellings" using the same algorithm ? Can we solve it easly ?

Aoineko

Krzysztof Kowalczyk

10:08 a.m.

New subject: Bug in install.php script

Hi,

There's a typo in install.php script. It has: copydirectory( "./stylesheets", $wgStyleSheetsDirectory ); while it should have: copydirectory( "./stylesheets", $wgStyleSheetDirectory );

The end result is that *.css files are copied to "/" instead of "${IP}/style"

Regards,

Krzysztof Kowalczyk

Index: install.php =================================================================== RCS file: /cvsroot/wikipedia/phase3/install.php,v retrieving revision 1.4 diff -u -r1.4 install.php --- install.php 28 Apr 2003 18:14:48 -0000 1.4 +++ install.php 29 Apr 2003 07:03:51 -0000 @@ -53,7 +53,7 @@ copyfile( ".", "texvc.phtml", $IP );

copydirectory( "./includes", $IP ); -copydirectory( "./stylesheets", $wgStyleSheetsDirectory ); +copydirectory( "./stylesheets", $wgStyleSheetDirectory );

copyfile( "./images", "wiki.png", $wgUploadDirectory ); copyfile( "./languages", "Language.php", $IP );

Kurt Jansson

6:20 a.m.

Brion Vibber schrieb:

...

On Mon, 2003-04-28 at 16:46, Erik Moeller wrote:

...
Improve the SQL and indexes? Maybe, but I'm no SQL guru.

Which Himalayan mountain do we have to climb to find one? :)

Wasn't there someone on this list a while ago who has written a book about MySQL? Or am I fantasising about this?

Kurt

Kurt Jansson

4:04 a.m.

Brion Vibber schrieb:

...

Various special pages are so slow they've been disabled. Most of them

could be made much more efficient with better queries and/or by maintaining summary tables. Some remaining ones are also pretty inefficient, like the Watchlist. Someone needs to look into these and make the necessary adjustments to the code.

Could we set the set the length of the watchlist to 50 or something like that per default, and not making it dependent of the length you choose in the preferences for the RecentChanges? (I for example have set it to 150, because otherwise the "show changes since ..." is cut off too early. On the English Wp I'd have to set it even higher.)

...

Jimbo, we really need some news on this front. If parts and/or a whole machine really *is* on order and can be set up in the near future, we need to know that. If it's *not*, then it may be time to pass around the plate and have interested parties make sure one does get ordered, as had begun to be discussed prior to the March 19 announcement.

Even with a second server, and the software and database being faster, how long will it take until this again isn't enough because articles, editors and visitors should be growing exponentially in theory. There will be the foundation, and maybe we'll get some money through it and can buy new hardware, but will it be sufficient? And for how long? And aren't there other important things we could spend the money on, if an other free project or a university would host us for free? But maybe I cherish an illusion here.

Kurt

Brion Vibber

3:54 a.m.

On Mon, 2003-04-28 at 18:04, Kurt Jansson wrote:

...

Could we set the set the length of the watchlist to 50 or something like that per default, and not making it dependent of the length you choose in the preferences for the RecentChanges?

Well, that wouldn't help for performance as the data goes through a temporary table. Basically the DB's grabbing your *entire* watchlist, then only sending the most recent X items to the wiki for formatting in a list.

...

Even with a second server, and the software and database being faster, how long will it take until this again isn't enough because articles, editors and visitors should be growing exponentially in theory. There will be the foundation, and maybe we'll get some money through it and can buy new hardware, but will it be sufficient? And for how long?

How long will the internet be able to deal with all those new users? Won't we run out of IP addresses if IPv6 never rolls out? When will the sun burn out, leaving the earth a lifeless ball of coal?? :) Hopefully, we'll be able to keep up.

...

And aren't there other important things we could spend the money on, if an other free project or a university would host us for free? But maybe I cherish an illusion here.

Do feel free to ask other free projects and universities if they'd be interested in supporting the project...

-- brion vibber (brion @ pobox.com)

Kurt Jansson

6:02 a.m.

Brion Vibber schrieb:

...

On Mon, 2003-04-28 at 18:04, Kurt Jansson wrote:

...
Could we set the set the length of the watchlist to 50 or something like that per default, and not making it dependent of the length you choose in the preferences for the RecentChanges?

Well, that wouldn't help for performance as the data goes through a temporary table. Basically the DB's grabbing your *entire* watchlist, then only sending the most recent X items to the wiki for formatting in a list.

I see. I hadn't thought this through completely. (Maybe a link in the Watchlist for easy removal of articles would help people to keep their watchlist small and tidy.)

...

...
Even with a second server, and the software and database being faster, how long will it take until this again isn't enough because articles, editors and visitors should be growing exponentially in theory. There will be the foundation, and maybe we'll get some money through it and can buy new hardware, but will it be sufficient? And for how long?

How long will the internet be able to deal with all those new users? Won't we run out of IP addresses if IPv6 never rolls out? When will the sun burn out, leaving the earth a lifeless ball of coal?? :) Hopefully, we'll be able to keep up.

Okay, I'll remind you in a year or two :-)

...

Do feel free to ask other free projects and universities if they'd be interested in supporting the project...

I'll do. Could you describe what our requirements are? (Sorry, I'm not very experienced with this server stuff. I'm just trying to install Debian with a friends help.) - I'll go tromping (right word?) in the three universities in Berlin then.

Kurt

Jason Richey

12:32 a.m.

If everyone decides that this is an important thing to do, I am willing to attend. I'd prefer not to, though. I cherish my weekends...

Jason

Erik Moeller wrote:

...

How about organizing a chat this week about the ongoing Wikipedia performance crisis and how to solve it? Talking to people can provide additional motivation for getting things done, and help us organize our priorities. It might also reduce some frustration. If we do this, all the relevant people should be present:

Jimbo

Jason

Brion

Lee

Magnus

...

It might be best to meet on the weekend, so that work does not interfere. My suggestion would be Saturday, 20:00 UTC.

What do you think?

Regards,

Erik _______________________________________________ Wikitech-l mailing list Wikitech-l@wikipedia.org http://www.wikipedia.org/mailman/listinfo/wikitech-l

-- "Jason C. Richey" jasonr@bomis.com

7924

Age (days ago)

7925

Last active (days ago)

wikitech-l@lists.wikimedia.org

17 comments

9 participants

tags (0)

participants (9)

Brion Vibber
erik_moeller＠gmx.de
Guillaume Blanchard
Jason Richey
John R. Owens
Krzysztof Kowalczyk
Kurt Jansson
Lee Daniel Crocker
Nick Reinking