I do not think the API should support the case you described with gaplimit=1, because that fundamentally breaks the original API goal of "get data about many pages with lots of elements on them in one request".
Oh? I thought the goal of the API was to provide a machine-usable interface to MediaWiki so people don't have to screen-scrape the HTML pages, which alleviates the worry about whether changes to the user interface are going to break screen-scrapers. I never knew it was all about *bulk* data access *only*.
Brad, the API has a clear goal (at least that was my goal when I wrote it): to provide access to all the functionality a wiki UI offers. The point here is how *easy* and, at the same time, how *efficient* the API is. Over the past few years, continuation has become too complex for client libraries to combine multiple requests efficiently. The example you gave seems very uncommon, and it can easily be solved by making one extra API call to get a list of articles first, which turns O(N) requests into O(N+1), still O(N). That's why I don't think we should even go into the "|next" ability: it would rarely be used, and the same result is easy to get with another call that doesn't use a generator. See below on the iteration point.
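A minimal sketch of the two-step workaround described above, in Python: one list request for titles, then plain prop requests over those titles with no generator involved. The module and parameter choices (list=allpages, aplimit, prop=links, batches of 50 titles) are illustrative assumptions, and the functions only build request parameters; sending them is left to whatever HTTP client you use.

```python
from typing import Dict, Iterator, List


def list_request(limit: int = 50) -> Dict[str, str]:
    """Step 1, the one extra call: fetch page titles with a plain list module."""
    return {"action": "query", "list": "allpages", "aplimit": str(limit), "format": "json"}


def prop_requests(titles: List[str], prop: str = "links", batch: int = 50) -> Iterator[Dict[str, str]]:
    """Step 2: ask for prop data about those titles directly, batch by batch."""
    for i in range(0, len(titles), batch):
        yield {
            "action": "query",
            "prop": prop,
            # titles is pipe-separated, as in any other multi-value parameter
            "titles": "|".join(titles[i:i + batch]),
            "format": "json",
        }


reqs = list(prop_requests(["A", "B", "C"], batch=2))
print([r["titles"] for r in reqs])  # ['A|B', 'C']
```

The total is one list call plus one prop call per batch, which is the O(N+1) count mentioned above.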
But even if we do find compelling reasons to include that, for the advanced scenario "skip subquery and follow on with the generator" it might make sense to introduce an appendable "|next" value keyword, e.g. gapcontinue=A|next
How do things decide whether "foocontinue=A|next" is saying "the next foocontinue after A" or really means "A|next"? For example,
https://en.wiktionary.org/w/api.php?action=query&titles=secundus&pro... currently returns plcontinue "46486|0|next".
Or are you proposing every module be individually coded to recognize this "|next"?
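To make the ambiguity concrete, here is a small Python sketch. It assumes plcontinue has the shape pageid|namespace|title, so the trailing "next" in "46486|0|next" is a link target title, and shows that a naive suffix check for a "|next" keyword cannot tell that value apart from a "skip ahead" request. Both helper names are hypothetical.

```python
from typing import Tuple


def naive_has_next_keyword(value: str) -> bool:
    # Hypothetical server-side check for an appended "|next" keyword.
    return value.endswith("|next")


def split_plcontinue(value: str) -> Tuple[int, int, str]:
    # Assumed layout: pageid|namespace|title (titles cannot contain "|").
    page_id, namespace, title = value.split("|", 2)
    return int(page_id), int(namespace), title


value = "46486|0|next"
print(naive_has_next_keyword(value))  # True, although nobody asked to skip
print(split_plcontinue(value))        # (46486, 0, 'next')
```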
Again, unless there are good usage scenarios for it, I don't think we will ever need this "|next" feature; it was a "just in case" idea.
Ideally all "continue" values should be joined into a single "query-continue = magic-value" with no properties that the user is meant to interpret or pass individually.
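One way a server could produce such an opaque value is to serialize the per-module continuation state and encode it, so clients can only echo it back. The Python sketch below uses base64-encoded JSON purely as an assumption for illustration; it is not how MediaWiki encodes anything, and the module keys in the example are hypothetical.

```python
import base64
import json
from typing import Dict


def encode_continue(state: Dict[str, str]) -> str:
    """Pack all per-module continue values into one opaque, URL-safe token."""
    # sort_keys makes the token deterministic for the same state
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_continue(token: str) -> Dict[str, str]:
    """Server side only: recover the per-module state from the token."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


state = {"gapcontinue": "B", "plcontinue": "46486|0|next"}
token = encode_continue(state)
print(decode_continue(token) == state)  # True
```

Because the client never parses the token, the server stays free to change what goes into it.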
So clients can make absolutely no decisions about processing the data they get back? No thanks.
When you make an SQL query to the server, you don't get to control the "continue" process; you can stop and make another query with different initial parameters. The same goes for iterating through a collection: none of the programming languages that offer IEnumerable provide stream-control functionality, since it would be too complicated without clear benefits. The API can be seen as a server returning a stream, with some "continue" parameter. If you don't like the result, you make another query; that's how you control it. Documenting the "continue" properties is a sure way to overcomplicate API usage and to remove the server's ability to optimize the process in the future, without adding any significant benefit.
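A minimal Python sketch of the "stream" view: the client wraps the paged API in a generator, echoes back whatever opaque continue token the server returns, and exercises control only by stopping iteration or starting a new query. The fetch callable, the "continue" parameter name, and the in-memory fake server are assumptions standing in for real HTTP requests.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A fetch function takes request parameters and returns (items, next_token).
Fetch = Callable[[Dict[str, str]], Tuple[List[str], Optional[str]]]


def iterate(fetch: Fetch, params: Dict[str, str]) -> Iterator[str]:
    """Yield items across requests; the token is passed back untouched."""
    request = dict(params)
    while True:
        items, token = fetch(request)
        yield from items
        if token is None:
            return
        request = {**params, "continue": token}


def fake_fetch(request: Dict[str, str]) -> Tuple[List[str], Optional[str]]:
    """Hypothetical server: serves five titles, two per response."""
    data = ["A", "B", "C", "D", "E"]
    start = int(request.get("continue", "0"))
    end = start + 2
    return data[start:end], (str(end) if end < len(data) else None)


print(list(iterate(fake_fetch, {"action": "query"})))            # ['A', 'B', 'C', 'D', 'E']
print(list(islice(iterate(fake_fetch, {"action": "query"}), 3)))  # ['A', 'B', 'C']
```

Stopping early (the islice call) is the only "control" the client exercises; it never inspects the token.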
Why not propose adding something like that as an option, instead of trying to force everyone to do things your way? Say, have a parameter dumbcontinue=1 that replaces query-continue with
<query-dumb-continue>prop=links|categories&plcontinue=...&clcontinue=...&wlstart=...&allmessages=...</query-dumb-continue>
Entirely compatible.
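On the client side, a flat string like the one proposed would be cheap to consume: parse it as a query string and overlay it on the original request. A minimal Python sketch using the standard library's urllib.parse.parse_qsl; dumbcontinue is only the parameter proposed here, and the continuation values in the example are made up.

```python
from typing import Dict
from urllib.parse import parse_qsl


def apply_dumb_continue(original: Dict[str, str], dumb_continue: str) -> Dict[str, str]:
    """Overlay the server-supplied key=value pairs on the original request."""
    next_request = dict(original)
    # "|" has no special meaning in a query string, so multi-value params survive as-is.
    next_request.update(parse_qsl(dumb_continue, keep_blank_values=True))
    return next_request


original = {"action": "query", "generator": "allpages", "prop": "links|categories|info"}
dumb = "prop=links|categories&plcontinue=46486|0|next&clcontinue=46486|Example"
print(apply_dumb_continue(original, dumb)["prop"])  # links|categories
```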
This might be a good solution. Need community feedback on this.
IMO, if a client wants to ensure it has complete results for any page objects in the result, it should just process all of the prop continuation parameters to completion.
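Processing prop continuation to completion means the same page can show up in several responses with a different slice of its links or categories each time, and the client has to stitch them together. A minimal Python sketch, assuming each response has already been reduced to a dict mapping page id to page data in which list-valued entries are the continued props; the field names in the example are illustrative.

```python
from typing import Any, Dict, Iterable

Pages = Dict[str, Dict[str, Any]]


def merge_prop_batches(batches: Iterable[Pages]) -> Pages:
    """Combine per-page data from successive prop-continuation responses."""
    merged: Pages = {}
    for pages in batches:
        for page_id, page in pages.items():
            target = merged.setdefault(page_id, {})
            for key, value in page.items():
                if isinstance(value, list):
                    # Continued props arrive in slices: append them in order.
                    target.setdefault(key, []).extend(value)
                else:
                    # Scalars such as the title repeat unchanged: keep the latest.
                    target[key] = value
    return merged


first = {"1": {"title": "A", "links": ["x"]}, "2": {"title": "B"}}
second = {"1": {"title": "A", "links": ["y"]}, "2": {"title": "B", "links": ["z"]}}
print(merge_prop_batches([first, second])["1"]["links"])  # ['x', 'y']
```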
The result set might be huge. It wouldn't be nice to have a 12GB, x64-only client library requirement :)
Then use a smaller limit on your generator. And don't do this for prop=revisions&rvprop=content.
My bad, I didn't see the "prop" continuation; I thought you meant all of them. Lastly, let's try to keep sarcasm to a minimum in a technical discussion. We have Wikipedia talk pages for that.