It's almost New Year, and time for presents (according to the Russian
traditions).
I hereby present you the API versioning framework. Navigate to
https://gerrit.wikimedia.org/r/#/c/41014/ and get that patch to your local
installation for a test run.
There shouldn't be any visible functionality changes. At least that was the
goal. But now we can easily add a new version module or a submodule
(prop/list/etc) without breaking existing code.
To see it in action, add this line to the $Modules array in ApiMain.php:
'query2' => 'ApiQuery',
This adds another version of query implemented by the same existing class.
Now, if you look at http://localhost/api.php you will see:
* action=query *
This module is obsolete. See old documentation at
api.php?action=help&modules=query
* action=query2 *
... -- the regular query parameters -- ...
Both query and query2 will work identically in this example unless you
change ApiQuery.php.
Let the rotten egg throwing commence!
Once we settle on this, I propose you join me at the mediawiki labs --
https://labsconsole.wikimedia.org/wiki/Nova_Resource:Mediawiki-api
to develop and test the actual changes to the APIv2 interface. To keep it
going full steam ahead, I will accept ideas, criticisms, cookies, code
patches, small unmarked bills and large paychecks, but most importantly -
your time analyzing and commenting on this effort.
--Yurik
Greetings from Chile
Just wanted to ask if anyone thinks it's possible to return just a
string (and not an array) from an API extension?
I've been looking at ApiBase.php and ApiResult.php and it seems it's
not, is it?
Thanks in advance.
Numerico
I'm querying the API to get the categories and subcategories for a page, and
I'd like to be able to exclude hidden categories and administration
categories from the result. The queries that I currently use look like
this:
Category Pages:
http://en.wikipedia.org/w/api.php?action=query&titles=Category:$catname&cmtitle=Category:$catname&list=categorymembers&cmlimit=500&prop=categories&format=php
Non-Category Pages:
http://en.wikipedia.org/w/api.php?action=parse&page=$pagename&prop=text|categories&redirects&format=php
where $catname and $pagename are replaced by the page titles. Is there a
way to either exclude categories that are hidden categories, or that are
subcategories of the "Category:Hidden categories" category?
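One server-side option worth noting: prop=categories supports a clshow filter, so categories marked hidden (via __HIDDENCAT__) can be excluded with clshow=!hidden. Administration categories that are not marked hidden would still need client-side filtering. A minimal sketch of the request parameters (build_category_query is a hypothetical helper name):

```python
# Sketch: build query parameters that exclude hidden categories using
# the clshow=!hidden filter supported by prop=categories.
def build_category_query(pagename):
    """Return API parameters for fetching a page's non-hidden categories."""
    return {
        'action': 'query',
        'titles': pagename,
        'prop': 'categories',
        'clshow': '!hidden',  # exclude categories marked with __HIDDENCAT__
        'cllimit': 500,
        'format': 'php',
    }
```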
Thanks,
Robert
Folks,
Can anyone please tell me the most reliable way to request an article page
using the MediaWiki API without fear of getting redirected?
I found that querying by title is not reliable, as I have seen titles
change when more articles with the same or similar titles are added. Some
of the older titles now lead to a generic page stating that the title
might mean one of the following, with a list of article page links shown
below the message. I want to avoid getting redirected to this generic page
and always stay on the article page, as I plan to show this Wikipedia
content on my numerous sub-domain home pages.
Is there a page ID that I can use which doesn't change at all and is
always associated with the same article? If so, how can I query the page
ID using current titles? Any examples and pointers are very much
appreciated.
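For context, a page ID generally stays with an article across renames, so it is more stable than the title (though a deleted and recreated page gets a new ID). A sketch of looking one up via prop=info, with hypothetical helper names and a hand-written sample response shape:

```python
# Sketch: resolve a title to its page ID once, then use &pageids= for
# later requests. parse_pageid is a hypothetical helper that reads the
# pageid out of a decoded action=query JSON response.
def lookup_params(title):
    return {
        'action': 'query',
        'titles': title,
        'prop': 'info',
        'redirects': '',  # follow redirects to the actual article
        'format': 'json',
    }

def parse_pageid(response):
    pages = response['query']['pages']
    # 'pages' is keyed by page ID; negative keys mean the page is missing
    for pageid, page in pages.items():
        if int(pageid) > 0:
            return int(pageid)
    return None
```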
Thanks,
Ravi
Hi,
I am currently working on a project which involves using the wikipedia
content. The expected traffic that our system needs to serve is around 200
qps during peak time.
My question is whether using MediaWiki directly is really an option here
(I mean sending GET requests to http://en.wikipedia.org/w/api.php directly,
without any local mirror). Would this traffic be supported or banned?
Also, what about the availability of the service and possible latencies?
If it were banned, what would be the best approach?
Thanks in advance for any answer.
Ewa Szwed
I would like to get some feedback on how best to proceed with the future
API versions. There will be breaking changes and desires to cleanup,
optimize, obsolete things, so we should start thinking about it. I see two
general approaches I would really like to hear your thoughts on -- a global
API version versus each module having its own version. Both seem to have
some pros and cons.
== Per API ==
A global version=42 parameter would be included in all calls to the API,
specifying what functionality the client is expecting. The number would
increase every so often, say once a month, to signify an "API changes
bucket".
Every module will have this type of code when processing input parameters
and generating reply:
// For this module:
breakingChangeAversion = 15
breakingChangeBversion = 38
...
if requestVersion < breakingChangeAversion:
    reply as originally implemented
else if requestVersion < breakingChangeBversion:
    reply as after breaking change A
else:
    reply as after breaking change B
PROS: simple, allows the whole API to introduce global breaking changes
like new error reporting.
CONS: every module writer has to check the current number on the API
website before hard-coding a specific number into their module. There
might also be synchronization issues between module authors: two authors
making changes within a short window, long enough for a client writer to
hard-code the number while knowing about only one author's changes and
assuming no changes to the other modules.
== Per module ==
Each module name is followed by a "_###" suffix:
api.php?action=query_2&titles=...
Modules stay independent, each client knows just the modules it needs with
their versions.
PROS: keeps things clean and separate. Each version is increased only by
individual module writer due to a breaking change.
CONS: it becomes impossible to make a breaking change in the "core" of the
API, such as the global parameters or a different system of error
reporting. I am not sure we have any of this, but...?
At the moment, I am still leaning towards the per-module approach as it
seems cleaner.
For the version number itself, I think a simple integer would suffice:
the client specifies which interface version it wants, and the server
responds in that format or returns an error. A complex 2.42.0.15 scheme
is not needed here; the client will know at the time of writing what
version it supports. If the server can't reply to that request, it will
fail. Knowing sub-numbers wouldn't help, and would only complicate things.
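As a rough illustration of the per-module option, the server could split the versioned action name before dispatching. This is only a sketch of the idea; the function name and defaulting behavior are assumptions, not actual MediaWiki code:

```python
# Sketch of per-module version dispatch: "query_2" is split into a
# module name and an interface version, then routed to the matching
# handler. A bare action name defaults to version 1.
def split_versioned_action(action):
    """'query_2' -> ('query', 2); 'query' -> ('query', 1)."""
    name, sep, version = action.rpartition('_')
    if sep and version.isdigit():
        return name, int(version)
    return action, 1
```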
Let's hope this will be a short and constructive thread of good new ideas :)
(warning: conflict of interest)
My spouse, Leonard Richardson, is working on a new book for O'Reilly:
"RESTful Web APIs", a follow-up to 2007's "RESTful Web Services". It'll
come out next year. Those of you who are interested in RESTful web API
design (paging Federico!) might want to sign up to be beta readers.
http://www.crummy.com/2012/12/21/1
Leonard also wants to hear about interesting or odd domain-specific data
standards and hypermedia formats.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
Hi everyone, there seem to have been many great changes in the API, so I
decided to take a look at improving my old bots a bit, together with the
rest of the pywiki framework. While looking, a few thoughts and questions
have occurred that I hope someone could comment on.
I have been out of the loop for a long time, so do forgive me if I
misunderstand some recent changes and how they are supposed to work, or if
this is a non-issue. I also apologise if these issues have already been
discussed and/or resolved.
My first idea for this email is "*dumb continue*":
Can we change the continue so that clients are not required to understand
parameters once the first request has been made? This way a user of a
client library can iterate over query results without knowing how to
continue, and the library would not need to understand what to do with each
parameter (iterator scenario):
for datablock in mwapi.Query( { generator='allpages',
                                prop='links|categories',
                                otherParams=... } ):
    #
    # Process the returned data blocks one at a time
    #
The way it is done now, the Query() method must understand how to do the
continue in depth: which parameters to look at first, which second, and
how to handle the case when there are no more links while there are more
categories to enumerate.
Now there is even a high bug potential: if there are no more links, the
API returns just two continues (clcontinue & gapcontinue), which means
that if the client makes the same request with the two additional
"continue" parameters, the API will return the same result again,
possibly producing duplicate errors and consuming extra server resources.
*Proposal:*
Query() method from above should be able to take ALL continue values and
append ALL of them to the next query, without knowing anything about them,
and without removing or changing any of the original request parameters.
Query() will do this until the server returns a data block with no more
<query-continue> section.
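The proposed loop might look roughly like this; mwapi.Query internals are sketched with a hypothetical api_request transport function, not actual pywiki code:

```python
# Sketch of the proposed "dumb continue" loop: blindly merge every
# continue parameter into the next request until the server stops
# returning a query-continue section. api_request is a stand-in for
# the client library's HTTP layer.
def query(api_request, base_params):
    params = dict(base_params)
    while True:
        result = api_request(params)
        yield result.get('query', {})
        cont = result.get('query-continue')
        if not cont:
            break
        # Merge all continue values for all modules, without needing
        # to understand what any of them mean.
        for module_params in cont.values():
            params.update(module_params)
```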
Also, because the "page" objects might be incomplete between different data
blocks, the user might need to know when a complete "page" object is
returned. The API should probably introduce an "incomplete" attribute on
the page to indicate that the client should merge it with the page from
the following data blocks with the same ID until there is no more
"incomplete" flag. The page revision number could be used on the client
to see if the page has changed between calls:
for page in mwapi.QueryCompletePages( { same parameters as example above } ):
    # process each page
# process each page
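A client-side sketch of the proposed merging, assuming the hypothetical "incomplete" attribute described above (none of this exists in the current API):

```python
# Sketch: accumulate page fragments across data blocks, yielding a page
# only once it arrives without the proposed "incomplete" flag.
def complete_pages(datablocks):
    partial = {}
    for block in datablocks:
        for pageid, page in block.get('pages', {}).items():
            merged = partial.setdefault(pageid, {})
            for key, value in page.items():
                if isinstance(value, list):
                    # Property lists (links, categories, ...) are appended
                    merged.setdefault(key, []).extend(value)
                elif key != 'incomplete':
                    merged[key] = value
            if 'incomplete' not in page:
                yield partial.pop(pageid)
```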
*API Implementation details:*
In the example above, where we have a generator and two properties, the
next continue would be set to the very first item that had any of its
properties incomplete. The property continues will be as before, except
that if there are no more categories, clcontinue is set to some magic
value like '|' to indicate that it is done and no more SQL requests to
the categories tables are needed on subsequent calls.
The server should not return the maximum number of pages from the
generator if the properties enumeration has not reached them yet (e.g. if
generatorLimit=max & linksLimit=1, it will return just the first page with
one link on each call).
*Backwards compatibility:*
This change might impact any client that uses the presence of the
"plcontinue" or "clcontinue" fields as a guide not to use the next
"gapcontinue". The simplest (and long overdue) solution is to add a
"version=" parameter.
While at it, we might want to expand action=paraminfo to include
meaningful version data. Better yet, make a new "moduleinfo" action that
returns any requested specifics about each module, e.g.:
action=moduleinfo & modules=parse|query|query+allpages & props=version|params
Thanks! Please let me know what you think.
--Yuri
Old log entry parameters were stored with integer keys. In the new
style, they are stored under keys such as "4::foo", which identifies
that the parameter is "foo" and should be "$4" when passed into
MediaWiki messages.
For core log entry types, we remap either of these to more normal
names. But for non-core types, we just output the given names
directly. This works fine for the old style, and still works for the
new style for most formats except for the part where clients now have
to check for both "4::foo" and "0". But bug 43221 points out that
format=xml winds up generating invalid XML, something like <item
4::foo="value" />.
At the moment, I'm leaning towards the following to fix this:
1. Numeric keys will continue to be output as-is, since there's really
nothing else we can do.
2. Keys like "4::foo" will be output as "foo".
3. Add a hook to allow extensions to override all of the above, like
we already do for core modules.
4. (probably) Patch WMF-deployed extensions that ever generated
old-style log entries to use the hook.
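Point 2 above amounts to stripping the numeric message-position prefix from the key; a minimal sketch of that remapping (not the actual MediaWiki code):

```python
import re

# Sketch of point 2: strip the "N::" message-position prefix from
# new-style log parameter keys, leaving plain numeric keys untouched.
def normalize_log_param_key(key):
    m = re.match(r'^\d+::(.+)$', key)
    return m.group(1) if m else key
```

This keeps old-style integer keys output as-is (point 1) while turning "4::foo" into an XML-safe attribute name.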
For new-style non-core log entries, and for types provided by
extensions that start taking advantage of the new hook, this will
break backwards compatibility but will result in the cleanest output.
Any comments or alternative suggestions?
--
Brad Jorsch
Software Engineer
Wikimedia Foundation
Hi, in the new rewrite branch, what is the best way to do this?
* Get all links from a set of pages. If some pages are redirects, get
their targets. Only the link ns+titles are needed; no need to check
existence/pageid/etc. All this is one API call:
http://en.wikipedia.org/w/api.php?action=query&prop=links&titles=Archiver|A…
* Get all links and categories from the result of a generator or a list
of titles (from a file). Similar to the above, except with an optional
generator and no redirects param. Once a bad page is found, I will load
its content and fix it. The link and category names are needed only as
text strings.
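For reference, the first call above corresponds to parameters along these lines (links_query_params is a hypothetical helper, not part of any framework):

```python
# Sketch: parameters matching the first call above: all links from a
# set of pages, following redirects, titles only.
def links_query_params(titles):
    return {
        'action': 'query',
        'prop': 'links',
        'redirects': '',  # resolve redirects to their targets
        'titles': '|'.join(titles),
        'pllimit': 'max',
    }
```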
Thanks!
P.S. I have started
http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Recipes, which should
list all encountered bot scenarios and how various versions of pywiki
should be used to solve them. Please help by adding your bot's core
workflow to that list, so core developers can either suggest better code
or alter the pywiki framework to better handle such cases.