On Jan 15, 2004, at 01:46, Peter Gervai wrote:
I suspect the problem is that when User checks her watchlist,
MediaWiki gets the list and searches through all the articles to
see the recent changes. Right?
There are two ways the db may search the watchlist.
It can look through all the items in your watchlist, check their
timestamps etc., then sort the result and take only the recent items.
This is efficient for small watchlists.
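A minimal Python sketch of the first approach, assuming an in-memory dict `last_edit` that maps page titles to integer timestamps. The function and data names are hypothetical and this is not MediaWiki's actual query; it only shows why the cost tracks the watchlist size rather than site activity.

```python
from typing import Dict, List, Set, Tuple


def recent_by_watchlist(watchlist: Set[str], last_edit: Dict[str, int],
                        cutoff: int, limit: int) -> List[Tuple[str, int]]:
    """Walk the user's watchlist, look up each page's last-edit time,
    keep those at or after the cutoff, and return the newest first.

    Cost grows with the size of the watchlist, not with site activity.
    """
    hits = [(title, last_edit[title]) for title in watchlist
            if title in last_edit and last_edit[title] >= cutoff]
    hits.sort(key=lambda item: item[1], reverse=True)  # newest first
    return hits[:limit]


# Hypothetical data: page title -> timestamp of its latest edit.
last_edit = {"A": 100, "B": 200, "C": 50, "D": 300}
print(recent_by_watchlist({"A", "B", "C", "X"}, last_edit, cutoff=60, limit=10))
# [('B', 200), ('A', 100)]
```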
Or, it can look through the most recently changed pages to see which
are on your watchlist. If you have a really big watchlist this can be
more efficient, but at the rate pages are edited on en.wikipedia.org
it only helps if the cutoff is quite short indeed.
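A minimal Python sketch of the second approach, assuming a hypothetical site-wide change log given as a list of (title, timestamp) pairs sorted newest first, and assuming each watched page should appear once, at its newest edit. Its cost grows with the number of edits since the cutoff, which is why a busy wiki makes it attractive only for short cutoffs.

```python
from typing import List, Set, Tuple


def recent_by_changes(watchlist: Set[str], recent_changes: List[Tuple[str, int]],
                      cutoff: int, limit: int) -> List[Tuple[str, int]]:
    """Walk the site-wide recent-changes list, newest first, and keep the
    latest edit of each page that is on the watchlist.

    Cost grows with the number of edits since the cutoff, so on a busy
    wiki it is only cheap when the cutoff is short.
    """
    hits: List[Tuple[str, int]] = []
    seen: Set[str] = set()
    for title, ts in recent_changes:  # assumed sorted newest first
        if ts < cutoff:
            break  # every remaining entry is older than the cutoff
        if title in watchlist and title not in seen:
            seen.add(title)  # show each page once, at its newest edit
            hits.append((title, ts))
            if len(hits) == limit:
                break
    return hits


# Hypothetical site-wide change log, newest first: (page title, timestamp).
changes = [("D", 300), ("B", 200), ("B", 150), ("A", 100), ("C", 50)]
print(recent_by_changes({"A", "B", "C"}, changes, cutoff=60, limit=10))
# [('B', 200), ('A', 100)]
```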
I suspect there are 150000+ articles to check against 10-500 entries
on the watchlist.
The longest watchlists are in the range of a couple thousand, IIRC. A
few hundred is not atypical for power users. Most people have
relatively few.
I do not know how many watchlist entries there are (brion?), but I
suspect at least an order of magnitude fewer. Let's say 10000 entries.
Actually, it's on about the same order as the total number of pages.
de.wikipedia has 60419 watchlist entries; en.wikipedia has 263770.
However, any given page has at most a couple hundred people watching it.
The most-watched page on de.wikipedia is the Hauptseite with 115
watchers; on en.wikipedia it's Wikipedia:Village pump with 288. [Aside:
Jesus Christ is less popular than World War II at 59 vs 60 watchers,
but He still beats out George W. Bush by two Wikipedians!]
What if the watchlist db entries had:
  - the user id
  - the watched article #
  - the article's last change (date, submitter, comment)
And every time an article gets updated, it updates all the _watchlist_
entries for itself.
Updating up to a couple hundred rows on page save may or may not be a
worse performance drain than the current system. If it works, it might
be worth it for the faster reads.
Tests I did some time ago were inconclusive about read improvements,
but the tables may not have been properly indexed for the join to cur
that fetches the revision data.
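A minimal sketch of the proposed denormalized watchlist, using Python's built-in sqlite3 purely for illustration. The table layout, column names and functions are hypothetical, not MediaWiki's real schema: a page save copies its latest-change metadata into every watcher's row (one write per watcher), and a watchlist read becomes a single indexed query with no join.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """Hypothetical denormalized watchlist: each row carries a copy of the
    watched page's latest-change metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE watchlist (
            wl_user        INTEGER NOT NULL,
            wl_page        INTEGER NOT NULL,
            last_changed   INTEGER,  -- timestamp of the page's latest edit
            last_submitter TEXT,
            last_comment   TEXT,
            PRIMARY KEY (wl_user, wl_page)
        );
        -- lets a page save find every watcher of that page
        CREATE INDEX wl_page_idx ON watchlist (wl_page);
        -- lets a watchlist read scan one user's rows in time order
        CREATE INDEX wl_user_time_idx ON watchlist (wl_user, last_changed);
    """)
    return conn


def on_page_save(conn: sqlite3.Connection, page_id: int, ts: int,
                 submitter: str, comment: str) -> int:
    """Copy the new edit's metadata into every watchlist row for the page.
    Returns the number of rows written (one per watcher)."""
    cur = conn.execute(
        "UPDATE watchlist SET last_changed = ?, last_submitter = ?, "
        "last_comment = ? WHERE wl_page = ?",
        (ts, submitter, comment, page_id))
    conn.commit()
    return cur.rowcount


def read_watchlist(conn: sqlite3.Connection, user_id: int, cutoff: int,
                   limit: int = 50) -> List[Tuple[int, int, str, str]]:
    """Read side: one indexed query, no join and no scan of recent changes."""
    return conn.execute(
        "SELECT wl_page, last_changed, last_submitter, last_comment "
        "FROM watchlist WHERE wl_user = ? AND last_changed >= ? "
        "ORDER BY last_changed DESC LIMIT ?",
        (user_id, cutoff, limit)).fetchall()


conn = make_db()
conn.executemany("INSERT INTO watchlist (wl_user, wl_page) VALUES (?, ?)",
                 [(1, 10), (2, 10), (1, 11)])
print(on_page_save(conn, 10, 500, "Alice", "fix typo"))  # 2
print(read_watchlist(conn, 1, cutoff=0))  # [(10, 500, 'Alice', 'fix typo')]
```

The trade-off discussed above shows up directly in the return value of `on_page_save`: a page with a couple hundred watchers costs a couple hundred row writes on every save.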
Also, a slight complication: currently the watchlist uses only a single
entry for each page and its talk page. To add timestamps to sort on,
we'd have to double the number of rows involved.
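A small Python sketch of why timestamps double the row count, assuming the MediaWiki convention that a subject namespace is even and its talk namespace is the next odd number. The row tuples and function names are hypothetical: today one row stands for both the page and its talk page, but since the two change independently, each would need its own row to carry its own timestamp.

```python
from typing import List, Optional, Tuple

# Assumed namespace convention (as in MediaWiki): a subject namespace is even
# and its talk namespace is the next odd number.


def subject_ns(ns: int) -> int:
    return ns & ~1  # clear the low bit: 1 -> 0, 3 -> 2


def talk_ns(ns: int) -> int:
    return ns | 1  # set the low bit: 0 -> 1, 2 -> 3


def rows_today(user: int, ns: int, title: str) -> List[Tuple[int, int, str]]:
    """One row covers both the page and its talk page."""
    return [(user, subject_ns(ns), title)]


def rows_with_timestamps(user: int, ns: int, title: str
                         ) -> List[Tuple[int, int, str, Optional[int]]]:
    """Page and talk page change independently, so each needs its own row
    (and its own timestamp) to sort on."""
    return [(user, subject_ns(ns), title, None),
            (user, talk_ns(ns), title, None)]


print(rows_today(7, 1, "Foo"))            # [(7, 0, 'Foo')]
print(rows_with_timestamps(7, 1, "Foo"))  # [(7, 0, 'Foo', None), (7, 1, 'Foo', None)]
```

If every one of the 263770 en.wikipedia entries quoted above were split this way, the table would grow to roughly 527540 rows.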
-- brion vibber (brion @ pobox.com)