[Mediawiki-api] [Mediawiki-api-announce] BREAKING CHANGE: Queries may return fewer results than the limit and still set a query-continue

Brad Jorsch b-jorsch at northwestern.edu
Fri Feb 6 19:34:35 UTC 2009


On Fri, Feb 06, 2009 at 01:08:02PM +0100, Roan Kattouw wrote:
> Roan Kattouw schreef:
> > In r46845 [1], the issue raised in bug 11430 [2] a year and a half ago 
> > was finally addressed: when the API was asked to produce huge amounts of 
> > data (for instance the content of 500 revisions at 280 KB each), it 
> > would run out of memory trying to store and process it. To prevent this 
> > from happening, the amount of data the API can return is now limited. 

I see a possible bug: if the size of the first result is over the size
limit, you'll never be able to get the result at all. For example,
create a page larger than $wgAPIMaxResultSize and try to fetch the
revision contents.

Yes, the 8 MiB default is unlikely to be hit by a single revision, but
what if someone lowers it to 250 KiB and tries to fetch those 280 KB
revisions? Or what if someone uploads an image that has a ridiculous
amount of metadata?
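Clients can at least detect this failure mode. Here's a minimal sketch
(fetch_all and query_fn are illustrative names, not part of the API): if
a continuation request yields no new results and hands back the same
continue value, the client is stuck in a loop and should bail out.

```python
# Hypothetical client-side pagination guard. query_fn(cont) stands in
# for one API request; it returns (results, next_cont), with next_cont
# set to None when there is nothing left to fetch.
def fetch_all(query_fn):
    results, cont = query_fn(None)
    while cont is not None:
        batch, next_cont = query_fn(cont)
        if not batch and next_cont == cont:
            # No progress: a single result may exceed the size limit.
            raise RuntimeError("continuation is not advancing")
        results += batch
        cont = next_cont
    return results
```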

> Something I forgot to mention here is that continue parameters have been 
> added to most modules that didn't have one: iicontinue for 
> prop=imageinfo, rvcontinue for prop=revisions, cicontinue for 
> prop=categoryinfo, incontinue for prop=info and amfrom for 
> meta=allmessages; meta=siteinfo abuses siprop as query-continue parameter.

Looking at r46923, I see a number of bugs, including one or two existing
bugs that turned up when I was looking at all these continue parameters.

rvcontinue doesn't work; it seems $continue is never extracted from
$params.

amfrom is broken if ammessages is not sorted in whatever order PHP uses
for string comparison. For example, I tried ammessages=aboutsite|aboutpage
with a ridiculously low $wgAPIMaxResultSize on a local installation, and
it returned aboutsite with amfrom=aboutpage. When I then did
ammessages=aboutsite|aboutpage&amfrom=aboutpage, it returned aboutsite
and amfrom=aboutpage again.
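To illustrate the failure, here is a Python sketch of what I assume the
skip rule is (not the actual PHP code): if continuation simply skips
every requested message whose name sorts lexically before amfrom, an
unsorted request list never makes progress.

```python
# Presumed (flawed) continuation rule: resume by skipping requested
# messages that compare lexically below amfrom, regardless of the order
# they were requested in.
def fetch_batch(requested, amfrom=None, limit=1):
    """Return (results, next_amfrom); next_amfrom is None when done."""
    results = []
    for name in requested:
        if amfrom is not None and name < amfrom:
            continue  # lexical comparison only -- this is the bug
        if len(results) == limit:
            return results, name  # continue from here next time
        results.append(name)
    return results, None

messages = ["aboutsite", "aboutpage"]  # not in lexical order
# First call returns aboutsite with amfrom=aboutpage; resuming from
# aboutpage returns aboutsite again, since "aboutsite" > "aboutpage".
```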

alfrom should probably be used instead of alcontinue if alunique is
specified.

drcontinue must be used instead of drstart in "all" mode.

prop=duplicatefiles may have an issue similar to the allmessages problem
mentioned above: the results don't appear to be sorted by i2.img_name,
which could cause repeats or skips if the database returns them in any
order other than the one it would use if i2.img_name were included in
the ORDER BY clause.

All of these continues that are based on a count (which includes the
existing eloffset, euoffset, and sroffset BTW) are potentially broken.
There are two issues:

1. If the query isn't strictly ordered, the database is under no
   obligation to return the rows in the same order the second time as it
   did the first. That means that titles=A|B|C&prop=info could return A
   and B with incontinue=2, but titles=A|B|C&prop=info&incontinue=2
   could then return A or B again instead of C if the database orders
   the rows differently for whatever reason.

2. Even if the results are sorted, something like
   titles=A|B|C|D|E&prop=info&incontinue=2 could skip C if A or B were
   deleted between the first and second queries, or repeat B if "A2"
   were inserted.
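The second failure mode is easy to demonstrate with a few lines of
Python (page() is just a stand-in for the API's offset handling, not
real code):

```python
# Stand-in for count-based continuation: slice at the stored offset.
def page(rows, offset, limit):
    return rows[offset:offset + limit]

rows = ["A", "B", "C", "D", "E"]
first = page(rows, 0, 2)        # ["A", "B"]; continue at offset=2

del rows[0]                      # "A" deleted between the two requests
second = page(rows, 2, 2)        # ["D", "E"] -- "C" silently skipped

rows = ["A", "A2", "B", "C", "D", "E"]  # "A2" inserted instead
second = page(rows, 2, 2)        # ["B", "C"] -- "B" repeated
```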


