I believe architectural constraints are holding Wikipedia back, limiting
both how many people can use it and how it can grow.
The current architecture of one machine taking the entire burden of all
searches, updates and web page delivery inherently limits the rate at which
Wikipedia can grow.
In order for Wikipedia to grow, it needs an architecture which can easily
devolve work to other servers. A main database is still required to enforce
administrative policy and maintain database consistency.
Work to improve the speed of the database and reduce lag will, in the long
run, only be of very limited benefit: at best it may reduce the amount of
lag users experience for a few days or weeks.
A method of easily implementing mirror servers with live, real-time updates
is required. Each mirror server should cater for all the functionality
users expect from Wikipedia except for taking care of form submissions of
updates, which should be forwarded to the master wiki server.
The main database server should be released from the burden of serving web
pages and concentrate on running administrative code, processing and
posting database updates.
The update system can be achieved by either:
1) the main server creating SQL files of incremental changes to
be emailed to mirror servers, signed with a key pair and sequentially
numbered to ensure they are automatically processed in order. This
way, the server can run asynchronously with the mirrors, which is
better for the reliability of the server. The server will not need to
wait for connection responses from the mirrors, and updates will be
cached in the mail system in the event that a mirror server is
unavailable. (The main server will then only need to create one
email per update. The mail system infrastructure will take care of
sending the data to each mirror. In fact, a system such as pipermail,
used on this list, would solve the problem wonderfully. Mirror admins
simply subscribe to the list to get all updates sent to their machine
and can manually download updates they are missing from the list!)
2) the master server opening a connection directly to the SQL daemon
on each remote machine. In this case the server will need to track which
updates each mirror has and has not received, and will need to wait for
time-outs on non-operational mirrors. (This system may also open exploits
on the server via the SQL interface.)
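Option 1 can be sketched from the mirror's side roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the names Mirror, sign and SHARED_KEY are made up for the example, sqlite3 stands in for the mirror's SQL daemon, and an HMAC with a shared secret stands in for the key-pair signature (the Python standard library has no public-key signing). The point is only that sequence numbers let a mirror apply updates in order, hold back ones that arrive early, and work out which ones it still needs to fetch from the list archive.

```python
import hashlib
import hmac
import sqlite3

# Assumption: a shared secret stands in for the master's signing key pair.
SHARED_KEY = b"demo-key"


def sign(seq: int, sql: str, key: bytes = SHARED_KEY) -> str:
    """Return a hex signature covering the sequence number and the SQL text."""
    message = f"{seq}\n{sql}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class Mirror:
    """Applies updates strictly in sequence order, buffering early arrivals."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")  # stand-in for the mirror database
        self.next_seq = 1                      # next update number we may apply
        self.pending = {}                      # seq -> sql, received out of order

    def receive(self, seq: int, sql: str, signature: str) -> bool:
        """Accept one update; return False if the signature check fails."""
        if not hmac.compare_digest(sign(seq, sql), signature):
            return False
        if seq >= self.next_seq:               # already-applied numbers are ignored
            self.pending[seq] = sql
        # Apply every update that is now contiguous with what we already have.
        while self.next_seq in self.pending:
            self.db.executescript(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return True

    def missing(self) -> list:
        """Sequence numbers that must be fetched before pending ones can run."""
        if not self.pending:
            return []
        return [n for n in range(self.next_seq, max(self.pending))
                if n not in self.pending]
```

Because the mirror never applies update N+1 before update N, the master does not need to know which mirrors are up; a mirror that was offline simply sees a gap in the numbering and asks for the missing files.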
The English Wikipedia slowed down again today.
From 15:28 to 16:13 (US Eastern Standard Time) only 10 edits got through.
Until we figure out what's causing the slowdown, what kind of First Aid measures can we apply?
Would restarting any of the servers periodically help?
Suppose it takes X minutes to restart a server, and that the "lag" problem puts Wikipedia out of operation for Y minutes -- where X << Y. Maybe then this would be a good "band-aid" for the problem.
> > Would restarting any of the servers periodically help?
> > Suppose it takes X minutes to restart a server, and
> > that the "lag" problem puts Wikipedia out of operation
> > for Y minutes -- where X << Y. Maybe then this would
> > be a good "band-aid" for the problem.
> It takes all of a couple seconds to restart apache; I've set it to do a
> "graceful" restart every half hour during US daytime, which more or less
> coincides with our peak hours.
> -- brion vibber (brion @ pobox.com)
Does restarting apache also untangle the gridlock of database queries?
> It is now more than a week that the French Wikipedia
> is basically not usable in the evening. All the usual
> evening participants don't participate any more.
> *This* is a *problem* to me.
Brion, would you please take a look at the French Wikipedia database? Is it tangled up with long-running queries? Is it slowed down by persistent connections?
Maybe it just needs a reboot. Jimbo talked recently about setting up a cron job to reboot it once an hour during certain hours of the day.
From: Anthere [mailto:firstname.lastname@example.org]
Sent: Monday, November 25, 2002 4:44 PM
Subject: Re: [Wikipedia-l] Blocked: 188.8.131.52
--- Lars Aronsson <lars(a)aronsson.se> wrote:
> Pierre Abbat wrote:
> > I just blocked 184.108.40.206. Some sysop on the
> German Wiki please do the
> > same. The block should be removed in a few hours,
> or sooner if he gets the
> > message.
> I'm curious, is this how we want Wikipedia to work?
> No trial? No
> reasons stated? Just "block him now!", really? I
> find this
> problematic. Am I the only one?
Hu, no. You are not.
Actually, what I did was to delete these pages...
He did not put many of these, I had no way to know who
he was, nor to communicate with him. I did not want
him to create endless empty pages, unlikely to be
filled before long (architecture stuff, we have no
architect around !)
I thought maybe he would notice...maybe is that a
wrong move, dunno...
There's one thing I know though.
It is now more than a week that the French Wikipedia
is basically not usable in the evening. All the usual
evening participants don't participate any more.
*This* is a *problem* to me.
Jonathan Walther wrote:
>I doubt anyone
>here has the time to look at every single edit that was done to Lir's
>pages and see all the tiny changes that were made; the fact that Lir's
>page is long makes the job much harder than it otherwise would be.
And I said that it was easy to do.
But Jon's right -- it can be very tricky.
In fact, *Clutch* was the first to link the article on good taste, not Lir:
If Lir linked it anytime before then, then she'd taken it out.
However, Lir *did* choose to let it stay in,
although she changed the format of the link.
I think that it's necessary that ordinary administrators
be given the power to block signed in users for vandalism.
I'm not arguing for further *authority*,
such as to ban people like Helga and Lir
that weren't engaged in vandalism as such
but eventually proved impossible to work with --
only Jimbo has the authority to ban such people.
I mean that logged in users should be bannable for
the *same* acts of vandalism that anonymous users can be banned for.
All that this requires on the technical end
is that administrators be able to see the IP numbers
of signed in users.
And, yes, we need to clarify what rights users have to their user page.
Until the episode with Lir, users were given quite a free rein,
up to the point that advertising was explicitly allowed.
While I've said that Clutch's edits to [[User:Lir]] were vandalism,
I do think that this needs to be clarified.
I will be going with my wife and daughter to visit my parents for the
Thanksgiving holidays. We will be leaving here (Florida) on Tuesday
morning early, and returning the following Monday. My parents do have
broadband access, so I will be able to get online some, but I won't be
giving my usual daily attention to the mailing list.
I give a greenlight to sysops and developers to continue and extend
the ban on Lir to new i.p.'s, usernames, and the like, as necessary.
If this person continues to post in violation of the ban, then I'll
see what I can do to prevent this vandalism at the source. This
includes, if necessary, removing Lir from the mailing list in the
event of more smartass defensive remarks.
Try to avoid mistakes. :-) And try not to argue about it on the list.
I request that we all just take a little break from fussing about Lir,
Lir's user page, etc. I just protected Lir's user page, to give a
break from people feeling a need to fight about it. If Lir decides
not to continue at wikipedia in the future, we can just delete the
page permanently, so it really doesn't matter either way what is done
with it now.
While learning PHP I came across the following warning in the php manual
and remembered one frequent error message in wikipedia (maximum number of
database connections exceeded...):
There are a couple of additional caveats to keep in mind when using
persistent connections. One is that when using table locking on a
persistent connection, if the script for whatever reason cannot release
the lock, then subsequent scripts using the same connection will block
indefinitely and may require that you either restart the httpd server or
the database server. Another is that when using transactions, a
transaction block will also carry over to the next script which uses that
connection if script execution ends before the transaction block does. In
either case, you can use register_shutdown_function() to register a
simple cleanup function to unlock your tables or roll back your
transactions. Better yet, avoid the problem entirely by not using
persistent connections in scripts which use table locks or transactions
(you can still use them elsewhere).
Do we actually use persistent connections and is this problem affecting
Sex is hereditary. If your parents never had it,
chances are you won't either.
Can we upgrade hu.wikipedia.org to the new software? Zoltan Simon,
who contacted wiki-en a few days ago and with whom I have been
corresponding since, has expressed interest in starting the Hungarian
Wikipedia (which is currently empty). I think it would make more sense
to upgrade the software before telling him how to work on it.
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
A few days back someone suggested that we could switch the tables from
MyISAM to InnoDB to help avoid slowdowns where the whole system is
waiting on a table lock to be released, without having to rewrite as
much of our code as we would to change to a different dbms such as
postgres. But, the fulltext indexing we use to drive the search engine
only works on MyISAM tables.
However... with MySQL not all tables in a database have to be of the
same type. There are already separate columns for the fulltext index
thanks to the need to strip out markup; why not move them out to a
separate table, keyed to the cur_id, and make it MyISAM while the other
tables move to InnoDB?
It'd only be written to on page edit and deletion, and only read on
search and mispeelings, so there'd be less contention on it than there
is on the much-abused cur table. And for extra bonus points, the current
revisions download dump wouldn't be weighted down by the stripped copy
of every article's text.
We may have to recompile MySQL to enable InnoDB, but relatively little
of our code should have to change to take advantage of it.
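A rough sketch of the split, to show how little the calling code has to care. Assumptions are loud here: this is Python with sqlite3 standing in for MySQL, so there are no table types at all; in MySQL each CREATE TABLE would name its type (InnoDB for cur, MyISAM for the search table) and search would use the fulltext index rather than LIKE. The table name searchindex, its columns, the cur_title and cur_text columns and all function names are hypothetical, and strip_markup is a toy.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Would become InnoDB: the heavily used table of current revisions.
CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT, cur_text TEXT);
-- Would stay MyISAM: only the stripped text, keyed to cur_id.
CREATE TABLE searchindex (si_cur_id INTEGER PRIMARY KEY, si_text TEXT);
""")


def strip_markup(text):
    """Toy stand-in for markup stripping: drops [[ ]] and runs of quotes."""
    return re.sub(r"\[\[|\]\]|'{2,}", "", text)


def save_page(db, cur_id, title, text):
    """Page edit: write the article, then refresh its search row."""
    with db:
        db.execute("INSERT OR REPLACE INTO cur VALUES (?, ?, ?)",
                   (cur_id, title, text))
        db.execute("INSERT OR REPLACE INTO searchindex VALUES (?, ?)",
                   (cur_id, strip_markup(text)))


def delete_page(db, cur_id):
    """Page deletion: remove the article and its search row."""
    with db:
        db.execute("DELETE FROM cur WHERE cur_id = ?", (cur_id,))
        db.execute("DELETE FROM searchindex WHERE si_cur_id = ?", (cur_id,))


def search(db, word):
    """Search reads only the search table, never cur."""
    rows = db.execute("SELECT si_cur_id FROM searchindex WHERE si_text LIKE ?",
                      (f"%{word}%",))
    return [row[0] for row in rows]
```

Only save_page and delete_page write to the search table and only search reads it, which is the access pattern described above; ordinary page views would touch cur alone.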
-- brion vibber (brion @ pobox.com)