2009/6/3 sl contrib <sl.contrib@googlemail.com>:
Hi Roan,
Would it somehow be possible to build an intermediate solution? E.g. would it be feasible to build a dedicated action=query&prop=allchanges&start=...&end=... that just solved that problem?
For revisions, possibly. It wouldn't include log events, though.
I've had a go at modifying the code for allpages. Basically, if this line is made conditional: $this->addWhereFld('page_namespace', $params['namespace']);
then all pages can be searched (irrespective of namespace). Has this got a massive impact on efficiency?
Yes, for queries with certain oft-used parameters, this'll harm efficiency a lot.
The maximum number of entries returned is limited anyway, and it shouldn't really matter which namespace they come from. (Of course some things like apfrom no longer work as expected, but for my use case it would be OK for them to be disabled.)
Not only do they no longer work as expected, they also cause inefficiency.
You then introduce new parameters: startid, endid, start, end (for start/end of revid, or start/end of last touched), and amend the query: if (isset ($params['start'])) { $this->addWhere('page_touched>=' . $params['start']); }
Finally you need something like: $this->addOption('ORDER BY', 'page_touched'); and $this->setContinueEnumParameter('start', $this->keyToTitle($row->page_latest));
Since there's no index on page_latest, sorting and paging on it the way you do is inefficient. In particular, the ORDER BY page_latest part causes a filesort of the entire page table, which has over 10 million entries on English Wikipedia.
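
To make the indexing point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table is a heavily simplified, assumed version of the page table (only four columns, with a name_title index on page_namespace and page_title), and the helper query_plan is hypothetical. SQLite reports a "temp B-tree" sort where MySQL would report a filesort, so this only illustrates the general effect of ordering by an unindexed column, not MediaWiki's actual query plans.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail text of EXPLAIN QUERY PLAN for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Assumed, simplified schema: the real page table has more columns and indexes.
conn.execute(
    """
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title     TEXT    NOT NULL,
        page_latest    INTEGER NOT NULL
    )
    """
)
conn.execute(
    "CREATE UNIQUE INDEX name_title ON page (page_namespace, page_title)"
)
conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_latest) VALUES (?, ?, ?)",
    [(i % 4, "Title_%d" % i, 1000 - i) for i in range(1000)],
)

# Filtering on namespace and ordering by title can be served by the index.
indexed_plan = query_plan(
    conn,
    "SELECT page_title FROM page WHERE page_namespace = 0 "
    "ORDER BY page_title LIMIT 10",
)

# Ordering by page_latest has no index to follow, so every row must be sorted.
unindexed_plan = query_plan(
    conn,
    "SELECT page_title FROM page ORDER BY page_latest LIMIT 10",
)

print("with namespace filter:", indexed_plan)
print("order by page_latest: ", unindexed_plan)
```

In the second plan the LIMIT does not help: the database still has to read and sort all rows before it can return the first ten, which is the cost that grows with the size of the table.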
With those changes (and a few conditionals) 'allpages' can produce a list of pages that were touched between two dates, or a set of pages that have new revisions between two revision numbers. Not sure yet whether last touched will work as well as the revision timestamp, but at least from the revision number you could easily update an offline set of wiki pages. Do you think this looks good so far? Should I post the code somewhere so that people can have a look?
This'll probably work (albeit breaking a few things such as apfrom, as you mentioned), but due to the inefficient queries involved, it won't make it into the MediaWiki core.
Roan Kattouw (Catrope)