2009/6/3 sl contrib <sl.contrib(a)googlemail.com>:
Hi Roan,
Would it somehow be possible to build an intermediate solution? E.g. would it
be feasible to build a dedicated
action=query&prop=allchanges&start=...&end=...
that just solved that problem?
For revisions, possibly. It wouldn't include log events, though.
I've had a go at modifying the code for allpages.
Basically if this is made conditional:
$this->addWhereFld('page_namespace', $params['namespace']);
then all pages can be searched (irrespective of namespace). Has this got a
massive impact on efficiency?
Yes, for queries with certain oft-used parameters, this'll harm efficiency a
lot.
The maximum number of entries returned is limited anyway, and it shouldn't
really matter which namespace they come from. (Of course some things like
apfrom no longer work as expected, but for my use case, it would be OK for
them to be disabled.)
Not only do they no longer work as expected, they also cause inefficiency.
You then introduce new parameters: startid, endid, start, end (for start/end
of revid, or start/end of last touched), and amend the query:
if (isset($params['start'])) {
    $this->addWhere('page_touched>=' . $params['start']);
}
Finally you need something like:
$this->addOption('ORDER BY', 'page_touched');
and
$this->setContinueEnumParameter('start',
    $this->keyToTitle($row->page_latest));
Since there's no index on page_latest, sorting and paging on it the way you
do is inefficient. In particular, the ORDER BY page_latest part causes a
filesort of the entire page table, which has over 10 million entries on
English Wikipedia.
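
The cost of ordering and paging on an unindexed column can be seen in a small sketch. It uses SQLite as a stand-in for MySQL, so the planner output differs: SQLite reports "USE TEMP B-TREE FOR ORDER BY" where MySQL would report a filesort. The cut-down page table and the index name page_latest_idx are assumptions made for illustration, not the real MediaWiki schema.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")

# Simplified stand-in for the page table (assumption: only a few columns).
conn.execute(
    "CREATE TABLE page ("
    " page_id INTEGER PRIMARY KEY,"
    " page_namespace INTEGER NOT NULL,"
    " page_title TEXT NOT NULL,"
    " page_latest INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_latest)"
    " VALUES (?, ?, ?)",
    [(i % 4, "Page_%d" % i, (i * 7919) % 10007) for i in range(5000)],
)

# The paging pattern under discussion: filter and sort on page_latest.
PAGING_SQL = (
    "SELECT page_id, page_latest FROM page"
    " WHERE page_latest >= ? ORDER BY page_latest LIMIT 10"
)

# No index on page_latest: full scan plus a sort step.
plan_without_index = query_plan(conn, PAGING_SQL, (100,))

# Hypothetical index on page_latest: rows come back already ordered.
conn.execute("CREATE INDEX page_latest_idx ON page (page_latest)")
plan_with_index = query_plan(conn, PAGING_SQL, (100,))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the plan scans the whole table and sorts it before the LIMIT can apply; with it, the query seeks into the index and stops after ten rows.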
With those changes (and a few conditionals) 'allpages' can produce a list of
pages that were touched between two dates, or a set of pages that have new
revisions between two revision numbers. Not sure yet whether last touched
will work as well as the revision timestamp, but at least from the revision
number you could easily update an offline set of wiki pages.
Do you think this looks good so far? Should I post the code somewhere so
that people can have a look?
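
A rough sketch of how a client might use such a list to keep an offline copy current. The shape of the input (title, revid pairs) and the idea of feeding the highest revid back in as the next startid are assumptions for illustration; startid is only the parameter proposed above, not an existing API parameter.

```python
def apply_changes(mirror, changes):
    """Record newer revisions in the mirror.

    mirror  -- dict mapping page title to the revid held offline
    changes -- iterable of (title, revid) pairs (assumed response shape)

    Returns the titles that need re-fetching and the highest revid seen.
    """
    stale = []
    highest = max(mirror.values(), default=0)
    for title, revid in changes:
        # Unknown pages count as revid 0, so they are always fetched.
        if revid > mirror.get(title, 0):
            mirror[title] = revid
            stale.append(title)
        highest = max(highest, revid)
    return stale, highest


# Offline copy and a made-up batch of changes.
mirror = {"Main_Page": 120, "Help:Contents": 95}
changes = [("Main_Page", 130), ("Help:Contents", 95), ("Sandbox", 131)]

stale, highest = apply_changes(mirror, changes)
print(stale)        # titles whose text should be downloaded again
print(highest + 1)  # candidate value for the next startid
```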
This'll probably work (albeit breaking a few things such as apfrom, as
you mentioned), but due to the inefficient queries involved, it won't
make it into the MediaWiki core.
Roan Kattouw (Catrope)