On Dec 3, 2004, at 3:28 AM, Nick Pisarro wrote:
But while this all works great, I did discover a bit of a "gotcha". I
noticed that the number of 'recentchanges' records in my 'wikidb' is a
small fraction of the number in 'cur' and 'old'. It seems that after
every 1-1,000 changes the RC table is "pruned" to hold only the last
week's changes. Hmmmm. I found this disturbing for several reasons: 1)
The Recent Changes page lets you "see" up to 30 days of recent
changes--won't work.
Well, it often will work on a lightly used wiki. On a heavily used
wiki, though, certainly no. But then you'd never see 30 days' worth of
edits anyway on a really busy wiki, since the numeric limit will be hit
long before.
2) As this could be the definitive log of *all* changes on a wiki, I
would think you should *never* want to prune this.
The recentchanges table is purely an optimization hack. Originally,
Special:Recentchanges pulled data directly from the cur and old tables,
but this was very inefficient. First, at the time we were on MySQL 3.x,
which couldn't optimize descending sorts using an index (e.g., on the
timestamp field). This meant that hits had to pull a bunch of rows from
cur and old and sort them each in a temporary table. More generally,
slogging cur and old rows around meant an extra burden because they
carry the full article text.
Putting edit notifications into a separate, smaller table meant that they
could be pulled out much more quickly, with less copying and sorting
and merging of large records.
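
To make the difference concrete, here is a rough sketch in Python with
SQLite standing in for MySQL. The schema is a heavily simplified
assumption for illustration (three columns per table, not the real
table definitions), and the helper names are made up. It shows the same
"latest edits" list produced two ways: by merging and sorting cur and
old, and by reading the small recentchanges table directly.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified, assumed schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cur (cur_title TEXT, cur_text TEXT, cur_timestamp TEXT);
        CREATE TABLE old (old_title TEXT, old_text TEXT, old_timestamp TEXT);
        CREATE TABLE recentchanges (rc_title TEXT, rc_timestamp TEXT);
        CREATE INDEX rc_ts ON recentchanges (rc_timestamp);
    """)
    return db


def record_edit(db, title, text, timestamp):
    """Save an edit: archive the current revision, store the new one,
    and add a small notification row to recentchanges."""
    row = db.execute(
        "SELECT cur_title, cur_text, cur_timestamp FROM cur WHERE cur_title = ?",
        (title,),
    ).fetchone()
    if row is not None:
        db.execute("INSERT INTO old VALUES (?, ?, ?)", row)
        db.execute("DELETE FROM cur WHERE cur_title = ?", (title,))
    db.execute("INSERT INTO cur VALUES (?, ?, ?)", (title, text, timestamp))
    db.execute("INSERT INTO recentchanges VALUES (?, ?)", (title, timestamp))


def recent_from_history(db, limit):
    """The original approach: merge the two text-carrying tables and sort."""
    return db.execute(
        """
        SELECT title, ts FROM (
            SELECT cur_title AS title, cur_timestamp AS ts FROM cur
            UNION ALL
            SELECT old_title AS title, old_timestamp AS ts FROM old
        ) ORDER BY ts DESC LIMIT ?
        """,
        (limit,),
    ).fetchall()


def recent_from_rc(db, limit):
    """The optimized approach: read the small notification table only."""
    return db.execute(
        "SELECT rc_title, rc_timestamp FROM recentchanges "
        "ORDER BY rc_timestamp DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Both functions return the same rows as long as recentchanges has not
been pruned past the requested range; the second one simply never
touches the article text.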
We later added the inverse_timestamp fields to cur and old to provide
for descending sort optimization for other features (such as
Special:Contributions and page history), and MySQL 4.x I believe can do
a descending sort efficiently, so the table is not as important
anymore. In the next major revision we'll also be splitting the edit
history bits from the page text parts of cur/old, so pulling directly
from the revisions database will be much more efficient than pulling
from cur and old now would be.
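
The inverse_timestamp idea can be sketched in a few lines of Python.
This assumes timestamps are 14-digit strings (YYYYMMDDHHMMSS) and that
the inverse is formed by replacing each digit d with 9 - d; both
details are assumptions for the example, and the function names are
made up. The point is that an ordinary ascending sort on the inverted
value yields newest-first order, so an ascending index is enough.

```python
# Translation table mapping each digit d to 9 - d.
_INVERT = str.maketrans("0123456789", "9876543210")


def invert_timestamp(ts: str) -> str:
    """Return the digit-wise nines' complement of a timestamp string."""
    return ts.translate(_INVERT)


def newest_first(timestamps):
    """Order timestamps newest first using only an ascending sort,
    keyed on the inverted value."""
    return sorted(timestamps, key=invert_timestamp)
```

Applying invert_timestamp twice gives back the original string, so the
inverted column carries no extra information; it exists only so that an
ascending index scan walks the rows in descending time order.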
We have extended the recentchanges table a little bit, specifically with
regard to its 'ephemeral' nature: "bot" edits are specially marked to
be temporarily hidden from view, and we store IP addresses of editing
users temporarily (they are not logged permanently, but having the
information for recent edits assists in tracking down vandalism
problems). None of this is permanent -- the permanent record of
editing is in the cur and old tables.
3) At the very least, the age to which it is pruned should be a
variable in DefaultSettings.php so it could be adjusted on a wiki by
wiki basis.
Perhaps.
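
For illustration only, a configurable prune might look something like
the following Python sketch. The RC_MAX_AGE setting, the function name,
and the two-column table are all hypothetical; nothing here describes
an existing option. It deletes recentchanges rows older than the
configured age and leaves cur and old alone.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical per-wiki setting: how long rows stay in recentchanges.
RC_MAX_AGE = timedelta(days=7)

# Assumed timestamp format: 14-digit YYYYMMDDHHMMSS strings, which
# compare correctly as plain text.
TS_FORMAT = "%Y%m%d%H%M%S"


def prune_recentchanges(db, now, max_age=RC_MAX_AGE):
    """Delete recentchanges rows older than max_age; return how many
    rows were removed. The permanent history tables are not touched."""
    cutoff = (now - max_age).strftime(TS_FORMAT)
    cursor = db.execute(
        "DELETE FROM recentchanges WHERE rc_timestamp < ?", (cutoff,)
    )
    return cursor.rowcount
```

A wiki that wanted the full 30 days visible would pass
max_age=timedelta(days=30) instead of relying on the default.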
-- brion vibber (brion @ pobox.com)