Background:

max aplimit = max pllimit = 500 (5000 for bots)

Server SQL (simplified):
  pageset = SELECT * FROM pages WHERE start='!' LIMIT 5000
  SELECT * FROM links WHERE id IN pageset LIMIT 5000

Since each wiki page has more than one link, you need about 50-100 API calls to get all the links in a block. This also means it is by far more efficient to set gaplimit=50-100, because otherwise the server populates and returns 5000 page headers on every call, hugely wasting both SQL work and network bandwidth.
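To make the gaplimit tuning concrete, here is a minimal sketch in Python (the helper function is hypothetical; the parameter names are the standard api.php ones) of how a client might size the two limits relative to each other:

```python
# Hypothetical helper: build api.php parameters for a generator query,
# keeping gaplimit small relative to pllimit so the server does not
# materialize 5000 page headers it cannot fill with links anyway.
def generator_query_params(avg_links_per_page=50, pllimit=5000):
    # With ~50 links per page on average, pllimit/50 pages is roughly
    # what a single links block can actually cover.
    gaplimit = max(1, pllimit // avg_links_per_page)
    return {
        "action": "query",
        "generator": "allpages",
        "gaplimit": gaplimit,   # e.g. 100 instead of "max" (5000)
        "prop": "links",
        "pllimit": pllimit,
        "format": "json",
    }

print(generator_query_params()["gaplimit"])  # 100
```

The point is only the ratio: request about pllimit/50 pages per block, so every page header the server builds has a chance of being filled with links in the same response.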

Links are sorted by pageid, pages by title. If you need the links for the first page in a block, chances are you will have to iterate through 50% of all the other pages' links first.

Now let's look at your example:

* If you set gaplimit=100 and pllimit=5000, you get all the links for 100 pages in one call, which is no different from simple-continue.

* If you set both to "max" and you want 80% of the pages per block, you will most likely have to download 99% of the links -- the same as downloading everything with simple-continue.

* If you want a small percentage of the pages, like 1 per block, then on average you still have to download 50+% of the links. Even in the best-case scenario, if you are lucky, you need one additional links block just to learn that the first page has no more links.

The proper way to handle the last case is to use allpages without links, go through the results, and build a list of the 500 page ids you want. Afterwards, download all their links with a different query -- pageids=list, not a generator. Assuming 1 needed page per block, you just saved the time and bandwidth of 250 blocks of links! A huge saving, even before counting how much less work the SQL servers had to do. That's 250 queries you didn't have to make.
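A sketch of that two-query approach in Python (the selection predicate and the helper names are illustrative assumptions, not a finished client; only the api.php parameter names come from the API itself):

```python
# Phase 1: list page titles only (list=allpages, no links) -- cheap per page.
def allpages_params(apfrom="!", aplimit=500):
    return {
        "action": "query",
        "list": "allpages",
        "apfrom": apfrom,
        "aplimit": aplimit,
        "format": "json",
    }

# Phase 2: fetch links only for the page ids actually selected,
# passing pageids= explicitly instead of using a generator.
def links_params(pageids, pllimit="max"):
    return {
        "action": "query",
        "pageids": "|".join(str(p) for p in pageids),
        "prop": "links",
        "pllimit": pllimit,
        "format": "json",
    }

# Example client-side selection from a (mocked) allpages listing:
listing = [{"pageid": 1, "title": "Apple"},
           {"pageid": 2, "title": "Banana"},
           {"pageid": 3, "title": "Apricot"}]
wanted = [p["pageid"] for p in listing if p["title"].startswith("A")]
print(links_params(wanted)["pageids"])  # 1|3
```

All filtering happens on the client against the lightweight listing; the expensive links query then touches only the pages that matter.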



So you see, no matter how you look at this problem, you either 1) simple-stream the whole result set, or 2) do two separate queries -- one to get the list of all titles and select the ones you need, and another to get the links for them. The second is a much faster, more efficient, greener solution.


Lastly, if we enable the simple query by default, the server can apply much smarter logic: if gaplimit=max and pllimit=max, reduce gaplimit to pllimit/50. In other words, the server will return only as many pages as it can fill with links, but not many more. This is open for discussion of course, and I haven't finalized how to do this properly.
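One way that server-side clamp might look (a sketch only; the function name and the assumed average of 50 links per page are mine, and as the paragraph above says, this isn't finalized):

```python
def clamp_gaplimit(gaplimit, pllimit, avg_links_per_page=50):
    """If the client asked for the maximum of both limits, shrink
    gaplimit so the server only builds page headers it can actually
    fill with links in the same response."""
    return min(gaplimit, max(1, pllimit // avg_links_per_page))

print(clamp_gaplimit(5000, 5000))  # 100 -- bot limits
print(clamp_gaplimit(500, 500))    # 10  -- normal limits
```

With the normal limits of 500 each, the server would hand out only 10 pages per block, matching the 50-100 calls-per-block estimate above.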

I hope all this explains it enough. If you have other thoughts, please contact me privately; there is no need to involve the whole list in this.