Wikitech-l November 2002

wikitech-l@lists.wikimedia.org

46 participants
134 discussions

Long term plans for scalability
by Nick Hill 26 Nov '02

I believe Wikipedia is being held back, in terms of how many people can use it and how it can grow, by architectural constraints. The current architecture, in which one machine takes the entire burden of all searches, updates and web page delivery, inherently limits the rate at which Wikipedia can grow.

In order for Wikipedia to grow, it needs an architecture which can easily devolve work to other servers. A main database is still required to enforce administrative policy and maintain database consistency. Work to improve the speed of the database and reduce lag will, in the long run, be of only very limited benefit; it will perhaps reduce the amount of lag users experience for a few days or weeks.

A method of easily implementing mirror servers with live, real-time updates is required. Each mirror server should cater for all the functionality users expect from Wikipedia, except for form submissions of updates, which should be forwarded to the master wiki server. The main database server should be released from the burden of serving web pages and concentrate on running administrative code and on processing and posting database updates.

The update system can be achieved by either:

1) The main server creating SQL files of incremental changes to be emailed to mirror servers, signed with a key pair and sequentially numbered to ensure they are automatically processed in order. This way, the server can run asynchronously with the mirrors, which is better for the reliability of the server. The server will not need to wait for connection responses from the mirrors, and updates will be cached in the mail system in the event that a mirror server is unavailable. (The main server will then only need to create one email per update. The mail system infrastructure will take care of sending the data to each mirror. In fact, a system such as pipermail, used on this list, would solve the problem wonderfully. Mirror admins simply subscribe to the list to get all updates sent to their machine, and can manually download updates they are missing from the list!)

Or 2) The master server opening a connection directly to the SQL daemon on each remote machine. In this case the server will need to track which updates each mirror has and has not received, and will need to wait for time-outs on non-operational mirrors. (This system may open exploits on the server via the SQL interface.)
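
As a rough illustration of option 1, the sketch below shows how a mirror might apply sequentially numbered updates strictly in order, holding back any that arrive early and reporting the numbers it still lacks. It is only a sketch under stated assumptions: the class `MirrorUpdateQueue`, its methods and the sample update strings are hypothetical, a plain list stands in for the mirror database, and signature checking is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorUpdateQueue:
    """Applies numbered update batches strictly in sequence order (hypothetical)."""

    next_seq: int = 1                             # next update number expected
    pending: dict = field(default_factory=dict)   # early arrivals, keyed by number
    applied: list = field(default_factory=list)   # stand-in for the mirror database

    def receive(self, seq, sql):
        """Accept one update; return the sequence numbers applied as a result."""
        if seq < self.next_seq or seq in self.pending:
            return []                             # duplicate delivery: ignore it
        self.pending[seq] = sql
        done = []
        # Drain every update that is now contiguous with what was already applied.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            done.append(self.next_seq)
            self.next_seq += 1
        return done

    def missing(self):
        """Numbers a mirror admin would still need to fetch from the list archive."""
        if not self.pending:
            return []
        return [n for n in range(self.next_seq, max(self.pending))
                if n not in self.pending]


queue = MirrorUpdateQueue()
print(queue.receive(2, "second change"))  # [] (held back until update 1 arrives)
print(queue.missing())                    # [1]
print(queue.receive(1, "first change"))   # [1, 2]
```

Because each mirror only needs the next expected number and a buffer of early arrivals, the main server does not have to track per-mirror state in this scheme; that bookkeeping is what option 2 would move onto the master.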

Periodic restarts (was: [Wikitech-l] database connections)
by Poor, Edmund W 26 Nov '02

The English Wikipedia slowed down again today. From 15:28 to 16:13 (US Eastern Standard Time), only 10 edits got through.

Until we figure out what's causing the slowdown, what kind of first-aid measures can we apply?

Would restarting any of the servers periodically help?

Suppose it takes X minutes to restart a server, and that the "lag" problem puts Wikipedia out of operation for Y minutes -- where X << Y. Maybe then this would be a good "band-aid" for the problem.

Ed Poor

RE: Periodic restarts (was: [Wikitech-l] database connections)
by Poor, Edmund W 26 Nov '02

> > Would restarting any of the servers periodically help?
> >
> > Suppose it takes X minutes to restart a server, and
> > that the "lag" problem puts Wikipedia out of operation
> > for Y minutes -- where X << Y. Maybe then this would
> > be a good "band-aid" for the problem.
>
> It takes all of a couple seconds to restart apache; I've set it to do a
> "graceful" restart every half hour during US daytime, which more or less
> coincides with our peak hours.
>
> -- brion vibber (brion @ pobox.com)

Does restarting apache also untangle the gridlock of database queries?

Ed Poor

French Wikipedia is too slow
by Poor, Edmund W 26 Nov '02

Anthere wrote:
> It is now more than a week that the French Wikipedia
> is basically not usable in the evening. All the usual
> evening participants don't participate any more.
> *This* is a *problem* to me.

Brion, would you please take a look at the French Wikipedia database? Is it tangled up with long-running queries? Is it slowed down by persistent connections?

Maybe it just needs a reboot. Jimbo talked recently about setting up a cron job to reboot it once an hour during certain hours of the day.

Ed Poor

-----Original Message-----
From: Anthere [mailto:anthere5@yahoo.com]
Sent: Monday, November 25, 2002 4:44 PM
To: wikipedia-l(a)wikipedia.org
Subject: Re: [Wikipedia-l] Blocked: 217.5.141.103

--- Lars Aronsson <lars(a)aronsson.se> wrote:
> Pierre Abbat wrote:
> > I just blocked 217.5.141.103. Some sysop on the German Wiki please do the
> > same. The block should be removed in a few hours, or sooner if he gets the
> > message.
>
> I'm curious, is this how we want Wikipedia to work? No trial? No
> reasons stated? Just "block him now!", really? I find this
> problematic. Am I the only one?

Hu, no. You are not.

Actually, what I did was to delete these pages... He did not put many of these, I had no way to know who he was, nor to communicate with him. I did not want him to create endless empty pages, unlikely to be filled before long (architecture stuff, we have no architect around!). I thought maybe he would notice... maybe that is a wrong move, dunno...

There's one thing I know though. It is now more than a week that the French Wikipedia is basically not usable in the evening. All the usual evening participants don't participate any more. *This* is a *problem* to me.

Re: The v-word
by Toby Bartels 25 Nov '02

Jonathan Walther wrote:
>I doubt anyone
>here has the time to look at every single edit that was done to Lir's
>pages and see all the tiny changes that were made; the fact that Lir's
>page is long makes the job much harder than it otherwise would be.

And I said that it was easy to do. But Jon's right -- it can be very tricky. In fact, *Clutch* was the first to link the article on good taste, not Lir:
http://www.wikipedia.org/w/wiki.phtml?title=User:Lir&diff=437516&oldid=4375…
If Lir linked it anytime before then, then she'd taken it out. However, Lir *did* choose to let it stay in, although she changed the format of the link.

I think that it's necessary that ordinary administrators be given the power to block signed-in users for vandalism. I'm not arguing for further *authority*, such as to ban people like Helga and Lir who weren't engaged in vandalism as such but eventually proved impossible to work with -- only Jimbo has the authority to ban such people. I mean that logged-in users should be bannable for the *same* acts of vandalism that anonymous users can be banned for. All that this requires on the technical end is that administrators be able to see the IP numbers of signed-in users.

And, yes, we need to clarify what rights users have to their user page. Until the episode with Lir, users were given quite a free rein, up to the point that advertising was explicitly allowed. While I've said that Clutch's edits to [[User:Lir]] were vandalism, I do think that this needs to be clarified.

-- Toby

My vacation, dealing with Lir
by Jimmy Wales 25 Nov '02

I will be going with my wife and daughter to visit my parents for the Thanksgiving holidays. We will be leaving here (Florida) early on Tuesday morning, and returning the following Monday. My parents do have broadband access, so I will be able to get online some, but I won't be giving my usual daily attention to the mailing list.

I give a green light to sysops and developers to continue and extend the ban on Lir to new IPs, usernames, and the like, as necessary. If this person continues to post in violation of the ban, then I'll see what I can do to prevent this vandalism at the source. This includes, if necessary, removing Lir from the mailing list in the event of more smartass defensive remarks.

Try to avoid mistakes. :-) And try not to argue about it on the list. I request that we all just take a little break from fussing about Lir, Lir's user page, etc.

I just protected Lir's user page, to give a break from people feeling a need to fight about it. If Lir decides not to continue at Wikipedia in the future, we can just delete the page permanently, so it really doesn't matter either way what is done with it now.

database connections
by elian 25 Nov '02

Hello,

While learning PHP I came across the following warning in the PHP manual and remembered one frequent error message in Wikipedia (maximum number of database connections exceeded...):

There are a couple of additional caveats to keep in mind when using persistent connections. One is that when using table locking on a persistent connection, if the script for whatever reason cannot release the lock, then subsequent scripts using the same connection will block indefinitely and may require that you either restart the httpd server or the database server. Another is that when using transactions, a transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does. In either case, you can use register_shutdown_function() to register a simple cleanup function to unlock your tables or roll back your transactions. Better yet, avoid the problem entirely by not using persistent connections in scripts which use table locks or transactions (you can still use them elsewhere).

Do we actually use persistent connections and is this problem affecting Wikipedia?

greetings,
elian

--
Sex is hereditary. If your parents never had it, chances are you won't either.
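
The caveat quoted from the PHP manual can be illustrated outside PHP. The following is a minimal Python sketch, not Wikipedia's code: the standard `sqlite3` module stands in for MySQL, one long-lived connection stands in for a persistent connection, and a `finally` block plays the role that a function registered with `register_shutdown_function()` would play in PHP. The helper names `run_script` and `failing_edit` and the two-column `cur` table are assumptions made for the example.

```python
import sqlite3

# One long-lived connection reused by every "script"; it stands in for a
# persistent connection (hypothetical setup, not the real Wikipedia schema).
persistent = sqlite3.connect(":memory:")
persistent.execute("CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_text TEXT)")
persistent.commit()


def run_script(conn, work):
    """Run one request's work, always rolling back anything left uncommitted."""
    try:
        work(conn)
    finally:
        # Cleanup step: without it, the open transaction would carry over
        # to whichever script used this connection next.
        if conn.in_transaction:
            conn.rollback()


def failing_edit(conn):
    """Simulates a script that dies after writing but before committing."""
    conn.execute("INSERT INTO cur (cur_text) VALUES ('half-finished edit')")
    raise RuntimeError("script died before COMMIT")


try:
    run_script(persistent, failing_edit)
except RuntimeError:
    pass

# The next script on the same connection starts clean.
leftover = persistent.execute("SELECT COUNT(*) FROM cur").fetchone()[0]
print(leftover, persistent.in_transaction)  # 0 False
```

If the rollback in `run_script` is removed, the half-finished insert stays pending on the shared connection, which is the carry-over the manual warns about.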

Hungarian Wikipedia Software
by Erik Moeller 25 Nov '02

Hi,

can we upgrade hu.wikipedia.org to the new software? Zoltan Simon, who contacted wiki-en a few days ago and with whom I have been corresponding since, has expressed interest in starting the Hungarian Wikipedia (which is currently empty). I think it would make more sense to upgrade the software before telling him how to work on it.

Regards,
Erik

--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de

Table types and locking
by Brion Vibber 25 Nov '02

A few days back someone suggested that we could switch the tables from MyISAM to InnoDB to help avoid slowdowns where the whole system is waiting on a table lock to be released, without having to rewrite as much of our code as we would to change to a different DBMS such as postgres. But the fulltext indexing we use to drive the search engine only works on MyISAM tables.

However... with MySQL not all tables in a database have to be of the same type. There are already separate columns for the fulltext index thanks to the need to strip out markup; why not move them out to a separate table, keyed to the cur_id, and make it MyISAM while the other tables move to InnoDB? It'd only be written to on page edit and deletion, and only read on search and mispeelings, so there'd be less contention on it than there is on the much-abused cur table.

And for extra bonus points, the current revisions download dump wouldn't be weighted down by the stripped copy of every article's text.

We may have to recompile MySQL to enable InnoDB, but relatively little of our code should have to change to take advantage of it.

-- brion vibber (brion @ pobox.com)
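
The table split described above can be sketched roughly as follows. This is an illustration only, written in Python with the standard `sqlite3` module: SQLite has no storage engines, so the MyISAM/InnoDB distinction appears only in comments, and a `LIKE` match stands in for the fulltext index. The table name `searchindex`, its columns `si_id` and `si_text`, the `cur_title` and `cur_text` columns, and the helpers `strip_markup`, `save_page` and `search` are assumptions for the example, not the actual schema or code.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Main article table; in the proposal this would become InnoDB.
    CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT, cur_text TEXT);
    -- Search-only side table keyed to cur_id; in the proposal this stays MyISAM.
    CREATE TABLE searchindex (si_id INTEGER PRIMARY KEY, si_text TEXT);
""")


def strip_markup(wikitext):
    """Very rough stand-in for the real markup stripping."""
    return re.sub(r"[\[\]'=]", "", wikitext).lower()


def save_page(title, wikitext):
    """Page edit: writes the article row and refreshes its search row."""
    with db:  # commits on success, rolls back on error
        row = db.execute(
            "SELECT cur_id FROM cur WHERE cur_title = ?", (title,)
        ).fetchone()
        if row is None:
            cur_id = db.execute(
                "INSERT INTO cur (cur_title, cur_text) VALUES (?, ?)",
                (title, wikitext),
            ).lastrowid
        else:
            cur_id = row[0]
            db.execute(
                "UPDATE cur SET cur_text = ? WHERE cur_id = ?", (wikitext, cur_id)
            )
        db.execute(
            "INSERT OR REPLACE INTO searchindex (si_id, si_text) VALUES (?, ?)",
            (cur_id, strip_markup(wikitext)),
        )
    return cur_id


def search(term):
    """Search filters on the side table and joins to cur only for titles."""
    rows = db.execute(
        "SELECT cur_title FROM searchindex JOIN cur ON cur_id = si_id "
        "WHERE si_text LIKE ? ORDER BY cur_title",
        (f"%{term.lower()}%",),
    ).fetchall()
    return [r[0] for r in rows]


save_page("Apache", "The [[Apache]] ''web server''")
save_page("MySQL", "A [[database]] server")
print(search("web server"))  # ['Apache']
```

In this layout ordinary page views never touch `searchindex`, and a dump of `cur` alone no longer carries the stripped copy of each article's text.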

How to log in as admin
by Carl Youngblood 25 Nov '02

I just installed Wikipedia on my webserver and can't for the life of me figure out how to log in as an admin. I downloaded the latest version from CVS.

Thanks,
Carl Youngblood
