On Fri, Feb 06, 2009 at 01:08:02PM +0100, Roan Kattouw wrote:
Roan Kattouw schreef:
> In r46845 [1], the issue raised in bug 11430 [2] a year and a half ago
> was finally addressed: when the API was asked to produce huge amounts of
> data (for instance the content of 500 revisions at 280 KB each), it
> would run out of memory trying to store and process it. To prevent this
> from happening, the amount of data the API can return is now limited.
I see a possible bug: if the size of the first result is over the size
limit, you'll never be able to get the result at all. For example,
create a page larger than $wgAPIMaxResultSize and try to fetch the
revision contents.
Yes, the 8 MiB default is unlikely to be hit by a single revision, but
what if someone lowers it to 250 KiB and tries to fetch those 280 KB
revisions? Or what if someone uploads an image that has a ridiculous
amount of metadata?
Something I forgot to mention here is that continue
parameters have been
added to most modules that didn't have one: iicontinue for
prop=imageinfo, rvcontinue for prop=revisions, cicontinue for
prop=categoryinfo, incontinue for prop=info and amfrom for
meta=allmessages; meta=siteinfo abuses siprop as query-continue parameter.
Looking at r46923, I see a number of bugs, including one or two existing
bugs that turned up when I was looking at all these continue parameters.
rvcontinue doesn't work, it seems that $continue is never extracted from
$params.
amfrom is broken if ammessages is not sorted in whatever order PHP uses
for string comparison. For example, I tried ammessages=aboutsite|aboutpage
with a ridiculously low $wgAPIMaxResultSize on a local installation, and
it returned aboutsite with amfrom=aboutpage. When I then did
ammessages=aboutsite|aboutpage&amfrom=aboutpage, it returned aboutsite
and amfrom=aboutpage again.
alfrom should probably be used instead of alcontinue if alunique is
specified.
drcontinue must be used instead of drstart in "all" mode.
prop=duplicatefiles may have an issue similar to the allmessages problem
mentioned above, it looks like the results aren't sorted by i2.img_name
which could cause repeats or skips if database decides to return them in
any order other than would be given if i2.img_name were included in the
ORDER BY clause.
All of these continues that are based on a count (which includes the
existing eloffset, euoffset, and sroffset BTW) are potentially broken.
There are two issues:
1. If the query isn't strictly ordered, the database is under no
obligation to return the rows in the same order the second time as it
did the first. That means that titles=A|B|C&prop=info could return A
and B with incontinue=2, but titles=A|B|C&prop=info&incontinue=2
could then return A or B again instead of C if the database orders
the rows differently for whatever reason.
2. Even if the results are sorted, something like
titles=A|B|C|D|E&prop=info&incontinue=2 could skip C if A or B were
deleted between the first and second queries, or repeat B if "A2"
were inserted.