I do not think API should support the case you described with gaplimit=1,
because that fundamentally breaks the original API goal of "get data about
many pages with lots of elements on them in one request".
Oh? I thought the goal of the API was to provide a machine-usable
interface to MediaWiki so people don't have to screen-scrape the HTML
pages, which alleviates the worry about whether changes to the user
interface are going to break screen-scrapers. I never knew it was all
about *bulk* data access *only*.
Brad, API has a clear goal (at least that was my goal when I wrote it), to
provide access to all the functionality a wiki UI offers. The point here is
how *easy* and at the same time *efficient* API is. Over the past few years
the continue mechanism has gotten too complex for client libraries to use
efficiently when combining multiple requests. The example you gave seems very
uncommon, and it can easily be solved by making one extra API call to get
a list of articles first - which in this case of O(N) would only make it
O(N+1) -- still O(N). That's why I don't think we should even go into the
"|next" ability - it will very rarely be used, and the same thing can be done
easily by another call without a generator. See below on iteration point.
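To make the "one extra API call" idea concrete, here is a minimal Python
sketch of the two-step approach: fetch a plain list of titles first, then
request properties for exactly those titles, with no generator involved.
Everything in it is an assumption for illustration: call_api is a
hypothetical transport function (request parameters in, decoded reply out),
the reply shapes are assumed, and continuation of the links themselves is
deliberately left out.

```python
from typing import Callable, Dict, List

Params = Dict[str, str]
# Hypothetical transport: takes request parameters, returns the decoded reply.
ApiCall = Callable[[Params], dict]


def list_titles(call_api: ApiCall, limit: int) -> List[str]:
    """Step 1: one plain list=allpages request, no generator."""
    reply = call_api({"action": "query", "list": "allpages", "aplimit": str(limit)})
    return [page["title"] for page in reply["query"]["allpages"]]


def links_for_titles(call_api: ApiCall, titles: List[str]) -> Dict[str, List[str]]:
    """Step 2: request prop=links for exactly the titles found in step 1."""
    reply = call_api({"action": "query", "prop": "links", "titles": "|".join(titles)})
    return {
        page["title"]: [link["title"] for link in page.get("links", [])]
        for page in reply["query"]["pages"].values()
    }
```

The client decides which titles go into step 2, so skipping a page never
requires touching the continue values at all.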
But even if we do find compelling reasons to include that, for the advanced
scenario "skip subquery and follow on with the generator" it might make
sense to introduce appendable "|next" value keyword gapcontinue=A|next
How do things decide whether "foocontinue=A|next" is saying "the next
foocontinue after A" or really means "A|next"? For example,
https://en.wiktionary.org/w/api.php?action=query&titles=secundus&pr…
currently returns plcontinue "46486|0|next".
Or are you proposing every module be individually coded to recognize
this "|next"?
Again, unless there are good usage scenarios to keep this, I don't think we
ever need this "|next" feature - it was a "just in case" idea, which I
doubt we will need.
Ideally all "continue" values should be joined into a single
"query-continue = magic-value" of no interesting user-passable properties.
So clients can make absolutely no decisions about processing the data
they get back? No thanks.
When you make a SQL query to the server, you don't get to control the
"continue" process. You can stop and make another query with different
initial parameters. Same goes for iterating through a collection - none of
the programming languages offering IEnumerable have stream control
functionality - too complicated without clear benefits. API can be seen as
a stream returning server - with some "continue" parameter. You don't like
result - you do another query. That's how you control it. Documenting the
"continue" properties is a sure way to over-complicate API usage and remove
server's ability to optimize the process in the future, without adding any
significant benefit.
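The "stream with an opaque continue value" view can be sketched as a small
Python generator. This is only an illustration under stated assumptions:
call_api is a hypothetical transport function, and the reply is assumed to
carry a single "continue" mapping that the client copies back into the next
request without looking inside it. The caller controls the stream only by
stopping the iteration and issuing a different query.

```python
from typing import Callable, Dict, Iterator

Params = Dict[str, str]
# Hypothetical transport: takes request parameters, returns the decoded reply.
ApiCall = Callable[[Params], dict]


def iterate_query(call_api: ApiCall, params: Params) -> Iterator[dict]:
    """Yield one reply per request until the server stops sending a token."""
    request = dict(params)
    while True:
        reply = call_api(request)
        yield reply
        token = reply.get("continue")  # assumed key; treated as opaque
        if not token:
            return
        # Merge the server's values over the ORIGINAL parameters, uninspected.
        request = {**params, **token}
```

A caller that dislikes what it sees simply breaks out of the loop and starts
a new query with different initial parameters.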
Why not propose adding something like that as an option, instead of
trying to force everyone to do things your way? Say, have a parameter
dumbcontinue=1 that replaces query-continue with
<query-dumb-continue>prop=links|categories&plcontinue=...&clcontinue=...&wlstart=...&allmessages=...</query-dumb-continue>
Entirely compatible.
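If such an option existed, a client could consume it with very little code.
The Python sketch below assumes the proposed value is an ordinary query
string, as in the example above; dumbcontinue and query-dumb-continue are
only the proposal from this thread, not an existing feature, and the
function name and sample values used here are made up for illustration.

```python
from typing import Dict
from urllib.parse import parse_qsl


def next_request(original: Dict[str, str], dumb_continue: str) -> Dict[str, str]:
    """Build the follow-up request from a query-string-style continue value."""
    merged = dict(original)  # leave the caller's parameters untouched
    # Values from the server win, e.g. a narrowed prop= list.
    merged.update(parse_qsl(dumb_continue, keep_blank_values=True))
    return merged


# Hypothetical usage with made-up continuation values:
request = {"action": "query", "titles": "Example", "prop": "links|categories|info"}
follow_up = next_request(request, "prop=links|categories&plcontinue=1|0|Foo&clcontinue=1|Bar")
```

The client never needs to know which module each parameter belongs to; it
only overlays the returned pairs on its original request.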
This might be a good solution. Need community feedback on this.
IMO, if a client wants to ensure it has complete results for any page
objects in the result, it should just process all of the prop
continuation parameters to completion.
The result set might be huge. It wouldn't be nice to have a 12GB x64 only
client lib requirement :)
Then use a smaller limit on your generator. And don't do this for
prop=revisions&rvprop=content.
My bad, didn't see the "prop" continuation, thought you meant all of them.
Lastly, let's try to keep sarcasm to a minimum in a technical
discussion. We have Wikipedia talk pages for that.