If it's not too much trouble, could we do the watchlist without string comparisons? I bet a table like this would speed things up:
user_id  article_id
-------  ----------
    284         298
    284        1598
    284        6503
    284        1364
    284        3305
Then, we could link the tables, like:
SELECT * FROM cur, watch
WHERE cur_id = watch_article_id AND watch_user_id = 284
(Again, I'm not sure of the syntax: I made up a non-existent table!)
Would this work?
How hard would it be to implement?
How much would it help?
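A minimal sketch of the idea, run against an in-memory SQLite database rather than the real server. The `watch` table and its `watch_user_id` / `watch_article_id` columns follow the made-up names in the query above; the `cur_title` column and the sample rows are assumptions for illustration only.

```python
import sqlite3

def build_demo_db():
    """Create the hypothetical cur and watch tables with a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT);
        CREATE TABLE watch (
            watch_user_id    INTEGER NOT NULL,
            watch_article_id INTEGER NOT NULL REFERENCES cur(cur_id),
            PRIMARY KEY (watch_user_id, watch_article_id)
        );
    """)
    db.executemany("INSERT INTO cur VALUES (?, ?)",
                   [(298, "Apple"), (1598, "Banana"), (42, "Cherry")])
    db.executemany("INSERT INTO watch VALUES (?, ?)",
                   [(284, 298), (284, 1598), (7, 42)])
    return db

def watched_titles(db, user_id):
    """Join on integer ids only; no title strings are compared."""
    rows = db.execute(
        "SELECT cur_title FROM cur JOIN watch ON cur_id = watch_article_id "
        "WHERE watch_user_id = ? ORDER BY cur_id", (user_id,))
    return [title for (title,) in rows]
```

The composite primary key on `watch` doubles as an index on the user id, which is what lets the lookup avoid scanning every watch row.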
Ed Poor
I'm still working on my long-term redesign of the Wiki software (planned
to be an Apache module written in C, with PostgreSQL as the backend).
I notice that two-letter language codes followed by a colon prefix articles in
the Wikipedias for other languages.
Are these namespaces, or are they parsed specially by the PHP code and
redirected to a different URL, which is a totally different Wikipedia in
its own right?
I guess that's another table to create:
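CREATE TABLE wikipedias (
    language_code char(2),  -- two-letter code, e.g. 'eo'
    url_prefix text
);
-- PostgreSQL string literals take single quotes; double quotes name identifiers.
INSERT INTO wikipedias VALUES ('eo', 'http://eo.wikipedia.org/wiki/');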
Of course this table would be accessed only once, when the Apache module
gets initialized. It would make it easier to add web-based editing too.
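A rough sketch of the "read once at start-up" idea, using Python and SQLite purely for illustration (the planned module would be C with PostgreSQL). The helper names `load_prefixes` and `resolve_link` are hypothetical, and whether language prefixes really are handled this way is exactly the open question above.

```python
import sqlite3

def demo_db():
    """Build the wikipedias table with the single example row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE wikipedias ("
               "language_code CHAR(2) PRIMARY KEY, url_prefix TEXT NOT NULL)")
    db.execute("INSERT INTO wikipedias VALUES "
               "('eo', 'http://eo.wikipedia.org/wiki/')")
    return db

def load_prefixes(db):
    """Read the whole table once into a dict, as at module initialisation."""
    return dict(db.execute("SELECT language_code, url_prefix FROM wikipedias"))

def resolve_link(title, prefixes):
    """Return a full URL for 'xx:Title' links to a known language, else None."""
    code, sep, rest = title.partition(":")
    if sep and rest and code in prefixes:
        return prefixes[code] + rest
    return None
```

Anything before the colon that is not in the table (an ordinary namespace such as Talk, say) falls through and returns None, so the caller can treat it as a local title.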
Jonathan
Would it not be worthwhile to slightly change the way the watch list works?
It is a very important tool on the en.wiki:
1) to follow up on articles one is interested in
This can hardly be done with Recent Changes now; there are too many article modifications every day.
But in the watch list feature, what is really interesting - imho - is seeing the most recently modified watched articles.
2) to a lesser extent, to find again articles one wants to return to one day
but this can also be done with the search, or by adding links on personal pages
My watch list is hideously big. Each time I click on watch list, my next move is to click on the *stop loading* button after a few seconds. Just display the last 3/7 days of watched articles. 99% of the time, that's *all* that is needed. Why should one wait for the whole watch list to be downloaded? Why should one impose an extra, useless burden on the server?
How much server time does the watch list query take anyway?
How much client time does it take to load 400 KB of data, when only 50 KB are really needed?
Why should a persistent and repetitive human habit not be automated?
I'd like to see at the top of the watch list something like
- display the articles I follow which were modified in the past 3 days
- display the articles I follow which were modified in the past 14 days
- display all the articles I follow
With a default at 3 days for example.
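The three options could be served by one query with an optional cutoff. This is only a sketch under assumptions: the `watch` table is the hypothetical one proposed earlier on the list, `cur_timestamp` is an assumed last-modified column stored as ISO text, and SQLite stands in for the real database.

```python
import sqlite3
from datetime import datetime, timedelta

def demo_db(now):
    """Three watched articles modified 1, 10 and 40 days before `now`."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT,
                          cur_timestamp TEXT);
        CREATE TABLE watch (watch_user_id INTEGER, watch_article_id INTEGER);
    """)
    for cur_id, title, age_days in [(1, "Fresh", 1), (2, "Older", 10), (3, "Stale", 40)]:
        stamp = (now - timedelta(days=age_days)).isoformat()
        db.execute("INSERT INTO cur VALUES (?, ?, ?)", (cur_id, title, stamp))
        db.execute("INSERT INTO watch VALUES (?, ?)", (284, cur_id))
    return db

def recent_watchlist(db, user_id, now, days=3):
    """Watched articles changed in the last `days` days; days=None means all."""
    sql = ("SELECT cur_title FROM cur JOIN watch ON cur_id = watch_article_id "
           "WHERE watch_user_id = ?")
    params = [user_id]
    if days is not None:
        # ISO timestamps of equal format compare correctly as plain text.
        sql += " AND cur_timestamp >= ?"
        params.append((now - timedelta(days=days)).isoformat())
    sql += " ORDER BY cur_timestamp DESC"
    return [title for (title,) in db.execute(sql, params)]
```

With the cutoff in the WHERE clause, both the rows the server has to fetch and the page the client has to download shrink together.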
It is not that this is *really* needed by humans, though it would be nice. But could it help performance?
I have just written a simple TeX validator and parser.
To make it useful I need a list of all the TeX constructs we support.
Anything that contains a function not on the list will be considered
illegal.
Here is a short list of the obvious ones (classified by arity):
0 alpha beta gamma theta tau vartheta
0 pi upsilon varpi phi delta kappa
0 rho varphi epsilon lambda varrho chi
0 varepsilon mu sigma psi zeta nu
0 varsigma omega eta
0 Gamma Lambda Sigma Psi Delta Upsilon Omega Theta Pi Phi
1 sqrt
2 frac
Do you have a more complete list?
The parser should have no problem with a couple of thousand,
so don't hesitate.
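To show what an arity-based whitelist check might look like, here is a small Python sketch. It is not the validator described above: the `ARITY` table holds only a few names from the list, and it assumes every argument is written in braces (so `\sqrt2` would be rejected, which real TeX accepts).

```python
import re

# Tiny sample of the whitelist: command name -> number of braced arguments.
ARITY = {"alpha": 0, "beta": 0, "pi": 0, "Gamma": 0, "sqrt": 1, "frac": 2}

def validate(tex, arity=ARITY):
    """Return True if every command is whitelisted and braces balance."""
    # A token is a \command or any single non-space character.
    tokens = re.findall(r"\\[A-Za-z]+|\S", tex)
    pos = 0

    def parse_until(closer):
        nonlocal pos
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "}":
                return closer == "}"      # stray "}" at top level is illegal
            if tok == "{":
                if not parse_until("}"):
                    return False
            elif tok.startswith("\\"):
                name = tok[1:]
                if name not in arity:     # unknown command: illegal
                    return False
                for _ in range(arity[name]):
                    if pos >= len(tokens) or tokens[pos] != "{":
                        return False      # missing argument
                    pos += 1
                    if not parse_until("}"):
                        return False
        return closer is None             # input ended: fine only at top level

    return parse_until(None)
```

For example, `validate(r"\frac{\alpha}{\sqrt{2}}")` is accepted, while `validate(r"\frac{1}")` fails because the second argument is missing.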
I believe Wikipedia is being held back in terms of how many people can use
it and how it can grow, through architectural constraints.
The current architecture of one machine taking the entire burden of all
searches, updates and web page delivery inherently limits the rate at which
Wikipedia can grow.
In order for Wikipedia to grow, it needs an architecture which can easily
devolve work to other servers. A main database is still required to enforce
administrative policy and maintain database consistency.
Work to improve the speed of the database and reduce lag will, in the long
run, only be of very limited benefit and, perhaps, reduce the amount of lag
users experience for a few days or weeks.
A method of easily implementing mirror servers with live, real-time updates
is required. Each mirror server should cater for all the functionality
users expect from Wikipedia except for taking care of form submissions of
updates, which should be forwarded to the master wiki server.
The main database server should be released from the burden of serving web
pages and concentrate on running administrative code, processing and
posting database updates.
The update system can be achieved by either:
1) the main server creating SQL files of incremental changes to
be emailed to mirror servers, signed with a key pair and sequentially
numbered to ensure they are automatically processed in order. This
way, the server can run asynchronously with the mirrors, which is
better for the reliability of the server. The server will not need to
wait for connection responses from the mirrors, and updates will be
cached in the mail system in the event that a mirror server is
unavailable. (The main server will then only need to create one
email per update. The mail system infrastructure will take care of
sending the data to each mirror. In fact, a system such as pipermail,
used on this list, would solve the problem wonderfully. Mirror admins
simply subscribe to the list to get all updates sent to their machines
and can manually download updates they are missing from the list!)
Or
2) the master server opening a connection directly to the SQL daemon
on each remote machine. In this case the server will need to track which
updates each mirror has and has not received, and will need to wait for
time-outs on non-operational mirrors. (This system may also expose the
server to exploits via the SQL interface.)
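As an illustration of option 1, the mirror side of "sequentially numbered, processed in order" might look like the Python sketch below. The `Mirror` class and its method names are hypothetical, signature checking is left out entirely, and `apply_sql` is a stand-in for whatever actually runs a batch against the local database.

```python
class Mirror:
    """Applies numbered update batches strictly in sequence order."""

    def __init__(self, apply_sql):
        self.apply_sql = apply_sql  # callback that runs one batch of SQL
        self.next_seq = 1           # sequence number expected next
        self.pending = {}           # batches that arrived ahead of their turn

    def receive(self, seq, sql):
        """Accept one emailed update; mail may arrive late, early or twice."""
        if seq < self.next_seq:
            return                  # duplicate delivery, already applied
        self.pending[seq] = sql
        # Drain every batch that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            self.apply_sql(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self):
        """Sequence numbers an admin would fetch by hand from the archive."""
        if not self.pending:
            return []
        return [n for n in range(self.next_seq, max(self.pending))
                if n not in self.pending]
```

Because early batches are only buffered, a mirror that was offline simply catches up once the gap is filled, and the master never has to wait for it.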
The English Wikipedia slowed down again today.
From 15:28 to 16:13 (US Eastern Standard Time) only 10 edits got through.
Until we figure out what's causing the slowdown, what kind of First Aid measures can we apply?
Would restarting any of the servers periodically help?
Suppose it takes X minutes to restart a server, and that the "lag" problem puts Wikipedia out of operation for Y minutes -- where X << Y. Maybe then this would be a good "band-aid" for the problem.
Ed Poor
> > Would restarting any of the servers periodically help?
> >
> > Suppose it takes X minutes to restart a server, and
> > that the "lag" problem puts Wikipedia out of operation
> > for Y minutes -- where X << Y. Maybe then this would
> > be a good "band-aid" for the problem.
>
> It takes all of a couple seconds to restart apache; I've set it to do a
> "graceful" restart every half hour during US daytime, which more or less
> coincides with our peak hours.
>
> -- brion vibber (brion @ pobox.com)
Does restarting Apache also untangle the gridlock of database queries?
Ed Poor
Anthere wrote:
> It is now more than a week that the French wikipedia
> is basically not usable in the evening. All the usual
> evening participants don't participate any more.
> *This* is a *problem* to me.
Brion, would you please take a look at the French Wikipedia database? Is it tangled up with long-running queries? Is it slowed down by persistent connections?
Maybe it just needs a reboot. Jimbo talked recently about setting up a cron job to reboot it once an hour during certain hours of the day.
Ed Poor
-----Original Message-----
From: Anthere [mailto:anthere5@yahoo.com]
Sent: Monday, November 25, 2002 4:44 PM
To: wikipedia-l(a)wikipedia.org
Subject: Re: [Wikipedia-l] Blocked: 217.5.141.103
--- Lars Aronsson <lars(a)aronsson.se> wrote:
> Pierre Abbat wrote:
> > I just blocked 217.5.141.103. Some sysop on the German Wiki please do the
> > same. The block should be removed in a few hours, or sooner if he gets the
> > message.
>
> I'm curious, is this how we want Wikipedia to work? No trial? No
> reasons stated? Just "block him now!", really? I find this
> problematic. Am I the only one?
Uh, no. You are not.
Actually, what I did was to delete these pages...
He did not create many of them, and I had no way to know who
he was, nor to communicate with him. I did not want
him to create endless empty pages, unlikely to be
filled any time soon (architecture stuff, and we have no
architect around!)
I thought maybe he would notice... maybe that was the
wrong move, dunno...
There's one thing I know though.
It is now more than a week that the French wikipedia
is basically not usable in the evening. All the usual
evening participants don't participate any more.
*This* is a *problem* to me.
Jonathan Walther wrote:
>I doubt anyone
>here has the time to look at every single edit that was done to Lirs
>pages and see all the tiny changes that were made; the fact that Lirs
>page is long makes the job much harder than it otherwise would be.
And I said that it was easy to do.
But Jon's right -- it can be very tricky.
In fact, *Clutch* was the first to link the article on good taste, not Lir:
http://www.wikipedia.org/w/wiki.phtml?title=User:Lir&diff=437516&oldid=4375…
If Lir linked it anytime before then, then she'd taken it out.
However, Lir *did* choose to let it stay in,
although she changed the format of the link.
I think that it's necessary that ordinary administrators
be given the power to block signed in users for vandalism.
I'm not arguing for further *authority*,
such as to ban people like Helga and Lir
that weren't engaged in vandalism as such
but eventually proved impossible to work with --
only Jimbo has the authority to ban such people.
I mean that logged in users should be bannable for
the *same* acts of vandalism that anonymous users can be banned for.
All that this requires on the technical end
is that administrators be able to see the IP numbers
of signed in users.
And, yes, we need to clarify what rights users have to their user page.
Until the episode with Lir, users were given quite a free rein,
up to the point that advertising was explicitly allowed.
While I've said that Clutch's edits to [[User:Lir]] were vandalism,
I do think that this needs to be clarified.
-- Toby
I will be going with my wife and daughter to visit my parents for the
Thanksgiving holidays. We will be leaving here (Florida) on Tuesday
morning early, and returning the following Monday. My parents do have
broadband access, so I will be able to get online some, but I won't be
giving my usual daily attention to the mailing list.
I give a green light to sysops and developers to continue and extend
the ban on Lir to new IPs, usernames, and the like, as necessary.
If this person continues to post in violation of the ban, then I'll
see what I can do to prevent this vandalism at the source. This
includes, if necessary, removing Lir from the mailing list in the
event of more smartass defensive remarks.
Try to avoid mistakes. :-) And try not to argue about it on the list.
I request that we all just take a little break from fussing about Lir,
Lir's user page, etc. I just protected Lir's user page, to give a
break from people feeling a need to fight about it. If Lir decides
not to continue at wikipedia in the future, we can just delete the
page permanently, so it really doesn't matter either way what is done
with it now.