On Mon, Feb 09, 2009 at 02:27:15PM +0100, Roan Kattouw wrote:
In the comment above the default value there's a big warning saying you shouldn't set $wgAPIMaxResultSize to something lower than the maximum revision size. Wiki admins who do that anyway are asking for trouble.
Ok, as long as it's documented that should be good enough.
The metadata thing is also a non-issue, because it's stored in a database field that holds at most 4 MB.
It looks like img_metadata is a MEDIUMBLOB in MySQL, which can hold up to 2**24 bytes (16 MiB) according to MySQL's documentation. OTOH, in PostgreSQL it uses a BYTEA, which will go up to 512 MiB in 8.1 and there is no reason that maximum couldn't be increased in later versions. I don't see any code limiting the size of the stored data on the MediaWiki side of things, either.
Of course, I could be missing something.
rvcontinue doesn't work, it seems that $continue is never extracted from $params.
Fixed in r47036; forgot to update the patch I applied for the extract() crusade I did a while ago.
Looks good to me.
amfrom is broken if ammessages is not sorted in whatever order PHP uses for string comparison.
Fixed in r47036 by checking for inequality (!=) rather than less-than (<), using the fact that the order of message retrieval is pretty much constant (unless of course extensions are installed/removed between two requests).
Now amfrom is broken differently: meta=allmessages&amfrom=aboutpage returns only aboutpage instead of everything starting with aboutpage.
There may be a better way, but one easy way to do it would be like this: $skip = !is_null($params['from']); foreach( $messages_target as $message ) { if($message === $params['from']) $skip = false; if(!$skip) $messages[$message] = wfMsg( $message ); }
alfrom should probably be used instead of alcontinue if alunique is specified.
drcontinue must be used instead of drstart in "all" mode.
Fixed in r47036; forgot to update for the recent paging fixes in alllinks and the revamp of deletedrevs.
Looks good to me.
prop=duplicatefiles may have an issue similar to the allmessages problem mentioned above, it looks like the results aren't sorted by i2.img_name which could cause repeats or skips if database decides to return them in any order other than would be given if i2.img_name were included in the ORDER BY clause.
I'll look into this; sorting by i2.img_name as well is less efficient according to a quick EXPLAIN, but I'm not sure if it's too inefficient (will ask a database guru). If memory serves, this problem shouldn't occur on MySQL databases, though, because they sort stuff by the primary key (img_name) by default IIRC.
According to MySQL's documentation, you're right if InnoDB tables are being used and the query is doing a table scan (or presumably using the primary key for an index scan); if it uses some other index, this may not hold true. If MyISAM is used, I believe it's insertion order, although deletes, OPTIMIZE TABLE, ALTER TABLE ORDER BY, and probably other stuff screw that up.
If MySQL really is sorting by the i2.img_name by default, theoretically making that explicit shouldn't make a difference. But knowing MySQL, it wouldn't surprise me that it does.
All of these continues that are based on a count (which includes the existing eloffset, euoffset, and sroffset BTW) are potentially broken. There are two issues:
If the query isn't strictly ordered, the database is under no obligation to return the rows in the same order the second time as it did the first. That means that titles=A|B|C&prop=info could return A and B with incontinue=2, but titles=A|B|C&prop=info&incontinue=2 could then return A or B again instead of C if the database orders the rows differently for whatever reason.
Even if the results are sorted, something like titles=A|B|C|D|E&prop=info&incontinue=2 could skip C if A or B were deleted between the first and second queries, or repeat B if "A2" were inserted.
The existing UI for listing external links and search results also uses this offset-based approach, which is pretty much why the API does as well. You're right about these two issues, and I'll look into them deeper later (after I've caught up with what happened this weekend). The same thing about MySQL sorting by primary key (IIRC) applies here.
I don't know what else could be done for search results, so I wouldn't worry about that.
prop=categoryinfo should be easy if ORDER BY cat_title doesn't kill MySQL. Then you can do the standard thing there using cat_title as the continue parameter.
prop=extlinks must already be using the el_from index, so hopefully adding el_to to the ORDER BY won't kill MySQL. Then el_to can be the continue parameter.
list=exturlusage might be tough. The ideal thing would be to order by el_index and page_id (with page_id|el_index as the continue parameter), but I'm not at all confident that wouldn't filesort. If it does, I have no idea what we could do.
prop=imageinfo looks somewhat tricky. I might change "continue" to be the concatenation of the image title and the appropriate start value, and change the logic something like this: $titles = array_keys( $pageIds[NS_FILE] ); sort($titles); // Unless we can depend on the order to always be the same $images = RepoGroup::singleton()->findFiles( $titles ); if(is_null($params['continue'])){ $skip = false; } else { $skip = true; list($continue_title, $continue_start) = explode('|', $params['continue']); } foreach ( $titles as $title ) { $start = $params['start']; if($skip) { if($title != $continue_title) continue; $skip = false; $start = $continue_start; }
if(!isset($images[$title])) { // Add the "missing" record, as in lines 143-148 of r47043 // except add a break in the !$fit case continue; }
$img = $images[$title]; // Add the "not missing" record, as in lines 68-134 of r47043 // except use $start instead of $params['start'] }
prop=info could be done by merging $titles and $missing (if necessary), doing an asort, and using similar logic. The continue would be just the title.
For prop=revisions, I wonder if MySQL would filesort if you order by rev_id/page_id (depending on which is being used in the WHERE clause), since I would guess that's the index being used to fetch the rows anyway. If not, that could easily work for the continue parameter in non-enum mode (enum mode is already OK).
Thanks for testing and reporting all this stuff (in just 2 days).
No problem.