On Fri, Feb 06, 2009 at 01:08:02PM +0100, Roan Kattouw wrote:
Roan Kattouw wrote:
In r46845 [1], the issue raised in bug 11430 [2] a year and a half ago was finally addressed: when the API was asked to produce huge amounts of data (for instance the content of 500 revisions at 280 KB each), it would run out of memory trying to store and process it. To prevent this from happening, the amount of data the API can return is now limited.
I see a possible bug: if the size of the first result is over the size limit, you'll never be able to get the result at all. For example, create a page larger than $wgAPIMaxResultSize and try to fetch the revision contents.
Yes, the 8 MiB default is unlikely to be hit by a single revision, but what if someone lowers it to 250 KiB and tries to fetch those 280 KB revisions? Or what if someone uploads an image that has a ridiculous amount of metadata?
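To make the failure mode concrete, here is a toy size check in Python (not the actual ApiResult code; the limit and sizes are made up): a naive "would this push us over the limit?" test rejects an over-sized first item, so the client can never make progress, while always accepting at least one item avoids the dead end.

    MAX_RESULT_SIZE = 250 * 1024   # imagine a lowered $wgAPIMaxResultSize

    def add_items_naive(items):
        out, size = [], 0
        for item in items:
            if size + len(item) > MAX_RESULT_SIZE:
                break          # a single 280 KB revision already trips this
            out.append(item)
            size += len(item)
        return out             # empty, and retrying gives the same empty answer

    def add_items_always_one(items):
        out, size = [], 0
        for item in items:
            if out and size + len(item) > MAX_RESULT_SIZE:
                break          # over the limit, but only after the first item
            out.append(item)
            size += len(item)
        return out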
Something I forgot to mention here is that continue parameters have been added to most modules that didn't have one: iicontinue for prop=imageinfo, rvcontinue for prop=revisions, cicontinue for prop=categoryinfo, incontinue for prop=info, and amfrom for meta=allmessages; meta=siteinfo abuses siprop as its query-continue parameter.
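For what it's worth, here's a minimal client-side sketch in Python of how one of these gets used (the endpoint URL is hypothetical, and JSON output with the usual query-continue element is assumed); it just folds the returned continue value back into the next request until none is returned:

    import requests

    API = "https://example.org/w/api.php"   # hypothetical wiki
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "Main Page",
        "rvprop": "content",
        "rvlimit": "max",
        "format": "json",
    }

    while True:
        data = requests.get(API, params=params).json()
        for page in data["query"]["pages"].values():
            for rev in page.get("revisions", []):
                pass   # process the revision content here
        cont = data.get("query-continue", {}).get("revisions")
        if not cont:
            break
        params.update(cont)   # carries rvcontinue into the next request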
Looking at r46923, I see a number of bugs, including one or two pre-existing ones that turned up while I was going through all these continue parameters.
rvcontinue doesn't work; it seems $continue is never extracted from $params.
amfrom is broken if ammessages is not sorted in whatever order PHP uses for string comparison. For example, I tried ammessages=aboutsite|aboutpage with a ridiculously low $wgAPIMaxResultSize on a local installation, and it returned aboutsite with amfrom=aboutpage. When I then did ammessages=aboutsite|aboutpage&amfrom=aboutpage, it returned aboutsite and amfrom=aboutpage again.
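To show why the sort order matters, here's a toy Python version of the resume-by-name logic (not the actual allmessages code, just one plausible reading of it) that reproduces the behavior above when the message list isn't sorted:

    def fetch(messages, amfrom=None, limit=1):
        out, cont = [], None
        skipping = amfrom is not None
        for name in messages:          # iteration order matters here
            if skipping:
                if name < amfrom:      # string comparison, as the resume point
                    continue
                skipping = False
            if len(out) >= limit:
                cont = name            # "continue from here next time"
                break
            out.append(name)
        return out, cont

    print(fetch(["aboutsite", "aboutpage"]))
    # (['aboutsite'], 'aboutpage')
    print(fetch(["aboutsite", "aboutpage"], amfrom="aboutpage"))
    # (['aboutsite'], 'aboutpage') -- the same result again, as observed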
alfrom should probably be used instead of alcontinue if alunique is specified.
drcontinue must be used instead of drstart in "all" mode.
prop=duplicatefiles may have an issue similar to the allmessages problem mentioned above: the results don't appear to be sorted by i2.img_name, which could cause repeats or skips if the database returns them in any order other than the one it would use if i2.img_name were included in the ORDER BY clause.
All of these continues that are based on a count (which includes the existing eloffset, euoffset, and sroffset, BTW) are potentially broken. There are two issues, with a quick sketch after the list:
1. If the query isn't strictly ordered, the database is under no obligation to return the rows in the same order the second time as it did the first. That means that titles=A|B|C&prop=info could return A and B with incontinue=2, but titles=A|B|C&prop=info&incontinue=2 could then return A or B again instead of C if the database orders the rows differently for whatever reason.
2. Even if the results are sorted, something like titles=A|B|C|D|E&prop=info&incontinue=2 could skip C if A or B were deleted between the first and second queries, or repeat B if "A2" were inserted.
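Here's a short Python sketch of the second failure mode, and of why continuing from the last sorted key instead of from a row count sidesteps it (the function names are mine, not MediaWiki's):

    def page_by_offset(rows, offset, limit):
        # Relies on the rows coming back in the same order every time and on
        # nothing being inserted or deleted between requests.
        return rows[offset:offset + limit], offset + limit

    def page_by_key(rows, after_key, limit):
        # Continues strictly after the last key seen, so deletions before that
        # key can't cause a skip and insertions can't cause a repeat.
        remaining = sorted(r for r in rows if after_key is None or r > after_key)
        batch = remaining[:limit]
        return batch, (batch[-1] if len(remaining) > limit else None)

    rows = ["A", "B", "C", "D", "E"]
    batch, cont = page_by_offset(rows, 0, 2)   # ['A', 'B'], continue at offset 2
    rows.remove("A")                           # a row disappears between requests
    print(page_by_offset(rows, cont, 2))       # (['D', 'E'], 4) -- 'C' was skipped

    rows = ["A", "B", "C", "D", "E"]
    batch, cont = page_by_key(rows, None, 2)   # ['A', 'B'], continue after 'B'
    rows.remove("A")
    print(page_by_key(rows, cont, 2))          # (['C', 'D'], 'D') -- nothing lost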