Hi everyone, there seem to have been many great changes in the API, so I
decided to take a look at improving my old bots a bit, together with the
rest of the pywiki framework. While looking, a few thoughts and questions
have occurred to me that I hope someone could comment on.
I have been out of the loop for a long time, so do forgive me if I
misunderstand some recent changes and how they are supposed to work, or if
this is a non-issue. Also, I apologise if these issues have already been
discussed and/or resolved.
My first idea for this email is "*dumb continue*":
Can we change the continue so that clients are not required to understand
parameters once the first request has been made? This way a user of a
client library can iterate over query results without knowing how to
continue, and the library would not need to understand what to do with each
parameter (iterator scenario):
    for datablock in mwapi.Query({'generator': 'allpages',
                                  'prop': 'links|categories',
                                  'otherParams': ...}):
        # Process the returned data blocks one at a time
        pass
The way it is done now, the Query() method must understand in depth how to
continue: which parameters to look at first, which to look at second, and
how to handle the case where there are no more links while there are still
more categories to enumerate.
There is even a high potential for bugs: if there are no more links, the
API returns just two continues, clcontinue & gapcontinue, which means that
if the client makes the same request with the two additional "continue"
parameters, the API will return the same result again, possibly producing
duplicate errors and consuming extra server resources.
*Proposal:*
The Query() method from above should be able to take ALL continue values
and append ALL of them to the next query, without knowing anything about
them, and without removing or changing any of the original request
parameters. Query() will do this until the server returns a data block with
no <query-continue> section.
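
To make the proposal concrete, here is a minimal client-side sketch in
Python. It assumes the server already behaves as proposed, i.e. every
response carries all the continue values needed for the next request.
dumb_continue and the fetch callable (anything that takes a parameter dict
and returns the decoded response as a dict) are hypothetical names, not part
of pywiki or any existing library. The nested "query-continue" layout
(module name, then parameter name and value) is assumed to match the JSON
output.

```python
def dumb_continue(fetch, params):
    """Yield data blocks, blindly echoing back every continue value.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded API response as a dict.
    """
    request = dict(params)
    while True:
        block = fetch(request)
        yield block
        cont = block.get("query-continue")
        if not cont:
            return  # no <query-continue> section: we are done
        # Original parameters stay untouched; ALL continue values are
        # appended without inspecting what any of them mean.
        request = dict(params)
        for module_values in cont.values():
            request.update(module_values)
```

The loop never looks at the names or values of the continue parameters, so
it keeps working unchanged if a module adds or renames one.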
Also, because the "page" objects might be incomplete between different data
blocks, the user might need to know when a complete "page" object has been
returned. The API should probably introduce an "incomplete" attribute on
the page to indicate that the client should merge it with the page with the
same ID from the following data blocks, until the "incomplete" flag is no
longer present. The page revision number could be used on the client to see
if the page has been changed between calls:
    # params: same parameters as in the example above
    for page in mwapi.QueryCompletePages(params):
        # process each page
        pass
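
A rough sketch of how such merging could work on the client, assuming the
proposed "incomplete" attribute exists and shows up as a key on the page
dict in JSON output. complete_pages is a hypothetical helper, and the
merging rule (extend list-valued properties, overwrite scalar ones) is my
assumption. The revision check mentioned above is left out for brevity.

```python
def complete_pages(blocks):
    """Yield fully assembled pages from an iterable of data blocks.

    Assumes the proposed (not yet existing) "incomplete" key marks a page
    whose remaining properties arrive in later data blocks.
    """
    pending = {}  # page id -> partially assembled page
    for block in blocks:
        pages = block.get("query", {}).get("pages", {})
        for page_id, part in pages.items():
            page = pending.setdefault(page_id, {})
            for key, value in part.items():
                if key == "incomplete":
                    continue  # a marker, not page data
                if isinstance(value, list):
                    # links, categories, ...: accumulate across blocks
                    page.setdefault(key, []).extend(value)
                else:
                    page[key] = value
            if "incomplete" not in part:
                yield pending.pop(page_id)
```

Something like QueryCompletePages() could then simply be complete_pages()
applied to the data blocks that Query() returns.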
*API Implementation details:*
In the example above, where we have a generator & two properties, the next
generator continue would be set to the very first item that had any of its
properties incomplete. The property continues will be as before, except
that if there are no more categories, clcontinue is set to some magic value
like '|' to indicate that it is done and that no more SQL requests to the
categories tables are needed on subsequent calls.
The server should not return the maximum number of pages from the
generator if the properties enumeration has not reached them yet (e.g.
generatorLimit=max & linksLimit=1 -> will return just the first page with
one link on each return).
*Backwards compatibility:*
This change might impact any client that uses the presence of the
"plcontinue" or "clcontinue" fields as a signal not to use the next
"gapcontinue". The simplest (and long overdue) solution is to add a
"version=" parameter.
While at it, we might want to expand action=paraminfo to include
meaningful version data. Better yet, make a new "moduleinfo" action that
returns any requested specifics about each module, e.g.:
action=moduleinfo & modules= parse | query | query+allpages & props=
version | params
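
For illustration, a client could build that request roughly as follows.
Keep in mind that action=moduleinfo and its "modules" / "props" parameters
are only the proposal above and do not exist in the API today;
moduleinfo_query is a hypothetical helper name.

```python
from urllib.parse import urlencode


def moduleinfo_query(modules, props):
    """Build the query string for the proposed action=moduleinfo.

    The action and both parameter names are hypothetical (taken from
    the proposal); values are pipe-separated like other API lists.
    """
    return urlencode({
        "action": "moduleinfo",
        "modules": "|".join(modules),
        "props": "|".join(props),
    })


query = moduleinfo_query(["parse", "query", "query+allpages"],
                         ["version", "params"])
```

Note that urlencode percent-escapes both the "|" separators and the "+" in
"query+allpages", so the latter is not mistaken for an encoded space.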
Thanks! Please let me know what you think.
--Yuri