Hi everyone, there seem to have been many great changes in the API, so I decided to take a look at improving my old bots a bit, together with the rest of the pywiki framework. While looking, a few thoughts and questions have occurred to me that I hope someone could comment on.
I have been out of the loop for a long time, so do forgive me if I misunderstand some recent changes and how they are supposed to work, or if this is a non-issue. I also apologise if these issues have already been discussed and/or resolved.
My first idea for this email is "*dumb continue*":
Can we change the continue so that clients are not required to understand parameters once the first request has been made? This way a user of a client library can iterate over query results without knowing how to continue, and the library would not need to understand what to do with each parameter (iterator scenario):
    for datablock in mwapi.Query( { generator=allpages, prop=links|categories, otherParams=... } ):
        #
        # Process the returned data blocks one at a time
        #
The way it is done now, the Query() method must understand in depth how to continue: which parameters to look at first, which second, and what to do when there are no more links but there are still categories to enumerate. There is also a high potential for bugs: if there are no more links, the API returns just two continues, clcontinue and gapcontinue, which means that if the client makes the same request with the two additional "continue" parameters, the API will return the same result again, possibly producing duplicate errors and consuming extra server resources.
*Proposal:* The Query() method from above should be able to take ALL continue values and append ALL of them to the next query, without knowing anything about them, and without removing or changing any of the original request parameters. Query() will do this until the server returns a data block with no <query-continue> section.
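To make the proposal concrete, here is a rough Python sketch of what such a "dumb" Query() could look like on the client side. The names query and fake_fetch and the canned responses are made up for illustration; fetch stands in for whatever function actually sends the request and decodes the reply. The sketch assumes the server behaves as proposed, i.e. that echoing back every continue value is always safe.

```python
def query(fetch, params):
    """Yield data blocks, blindly echoing back every continue value.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded API response as a dict.
    """
    request = dict(params)
    while True:
        block = fetch(request)
        yield block
        cont = block.get("query-continue")
        if not cont:
            return  # no <query-continue> section: we are done
        # Start again from the untouched original parameters, then add
        # ALL continue values without interpreting any of them.
        request = dict(params)
        for values in cont.values():
            request.update(values)


# Canned responses standing in for a real server (illustration only).
responses = [
    {"query": {"pages": {"1": {"title": "A"}}},
     "query-continue": {"links": {"plcontinue": "1|0|X"},
                        "allpages": {"gapcontinue": "A"}}},
    {"query": {"pages": {"1": {"title": "A"}}}},
]
seen = []  # every request the client actually sent


def fake_fetch(request):
    seen.append(dict(request))
    return responses[len(seen) - 1]


original = {"generator": "allpages", "prop": "links|categories"}
blocks = list(query(fake_fetch, original))
```

Note that the loop never looks at the names of the continue parameters; it only copies them, which is the whole point.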
Also, because the "page" objects might be incomplete between different data blocks, the user might need to know when a complete "page" object has been returned. The API should probably introduce an "incomplete" attribute on the page to indicate that the client should merge it with the page with the same ID from the following data blocks, until the "incomplete" flag is no longer present. The page revision number could be used on the client to see if the page has been changed between calls:
    for page in mwapi.QueryCompletePages( { same parameters as example above } ):
        # process each page
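The merging step could be sketched on the client roughly like this. It assumes the proposed "incomplete" attribute exists (it does not in the current API), that each data block has the usual query/pages layout keyed by page ID, and that list-valued properties such as links are simply concatenated; query_complete_pages and merge_page are made-up names.

```python
def merge_page(target, part):
    """Fold one partial page dict into the accumulated one."""
    for key, value in part.items():
        if key == "incomplete":
            continue  # the proposed flag is bookkeeping, not page data
        if isinstance(value, list):
            # links, categories, etc. arrive in pieces: concatenate them
            target.setdefault(key, []).extend(value)
        else:
            target[key] = value


def query_complete_pages(blocks):
    """Yield a page only once a block delivers it without the flag."""
    pending = {}  # page ID -> page merged so far
    for block in blocks:
        pages = block.get("query", {}).get("pages", {})
        for page_id, part in pages.items():
            merge_page(pending.setdefault(page_id, {}), part)
            if "incomplete" not in part:
                yield pending.pop(page_id)


# Two made-up data blocks: page 1 is split across both of them.
example_blocks = [
    {"query": {"pages": {"1": {"title": "A", "incomplete": "",
                               "links": [{"title": "X"}]}}}},
    {"query": {"pages": {"1": {"title": "A",
                               "links": [{"title": "Y"}]},
                         "2": {"title": "B"}}}},
]
complete = list(query_complete_pages(example_blocks))
```

This sketch ignores the revision check mentioned above; a real client would probably compare revision numbers before merging.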
*API Implementation details:* In the example above, where we have a generator and two properties, the next generator continue would be set to the very first item that had any of its properties incomplete. The property continues would work as before, except that if there are no more categories, clcontinue is set to some magic value like '|' to indicate that it is done and that no more SQL requests to the categories tables are needed on subsequent calls. The server should not return the maximum number of pages from the generator if the property enumeration has not reached them yet (e.g. if generatorLimit=max & linksLimit=1, each request will return just the first page with one link).
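As a very rough illustration of the rule above, this is one way the server-side decision could be expressed, written in Python only for readability. It is one possible reading of the proposal, not existing API code: build_continue, its arguments, and the "page|token" format of the property continues are all assumptions.

```python
DONE = "|"  # the magic "nothing left" value suggested above


def build_continue(batch, next_page, prop_resume):
    """Compute the continue values for one response.

    batch       -- ordered page keys the generator produced this time
    next_page   -- where the generator would resume, or None at the end
    prop_resume -- continue parameter name -> (page, token) where that
                   property stopped, or None if it finished the batch
    """
    unfinished = [pos for pos in prop_resume.values() if pos is not None]
    if not unfinished:
        # Every property covered the whole batch: only the generator
        # moves on, and the properties start fresh on the next pages.
        return {} if next_page is None else {"gapcontinue": next_page}
    # Resume the generator at the first page with an incomplete property.
    first = min(batch.index(page) for page, _ in unfinished)
    cont = {"gapcontinue": batch[first]}
    for name, pos in prop_resume.items():
        cont[name] = DONE if pos is None else "|".join(pos)
    return cont


example = build_continue(
    ["A", "B", "C"], "D",
    {"plcontinue": ("B", "Link9"), "clcontinue": None},
)
```

Here links stopped partway through page B while categories finished, so the generator resumes at B and clcontinue carries the magic value.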
*Backwards compatibility:* This change might impact any client that uses the presence of the "plcontinue" or "clcontinue" fields as a signal not to use the next "gapcontinue". The simplest (and long overdue) solution is to add a "version=" parameter.
While at it, we might want to expand action=paraminfo to include meaningful version data. Better yet, make a new "moduleinfo" action that returns any requested specifics about each module, e.g.: action=moduleinfo & modules= parse | query | query+allpages & props= version | params
Thanks! Please let me know what you think.
--Yuri