It's almost New Year, and time for presents (according to the Russian
traditions).
I hereby present you the API versioning framework. Navigate to
https://gerrit.wikimedia.org/r/#/c/41014/ and get that patch to your local
installation for a test run.
There shouldn't be any visible functionality changes. At least that was the
goal. But now we can easily add a new version module or a submodule
(prop/list/etc) without breaking existing code.
To see it in action, add this line to the $Modules array in ApiMain.php:
'query2' => 'ApiQuery',
This adds another version of query implemented by the same existing class.
Now, if you look at http://localhost/api.php you will see:
* action=query *
This module is obsolete. See old documentation at
api.php?action=help&modules=query
* action=query2 *
... -- the regular query parameters -- ...
Both query and query2 will work identically in this example unless you
change ApiQuery.php.
Let the rotten egg throwing commence!
Once we settle on this, I propose you join me at the mediawiki labs --
https://labsconsole.wikimedia.org/wiki/Nova_Resource:Mediawiki-api
to develop and test the actual changes to the APIv2 interface. To keep it
going full steam ahead, I will accept ideas, criticisms, cookies, code
patches, small unmarked bills and large paychecks, but most importantly -
your time analyzing and commenting on this effort.
--Yurik
Greetings from Chile
Just wanted to ask if anyone thinks it's possible to return just a
string (and not an array) from an API extension?
I've been looking at ApiBase.php and ApiResult.php and it seems it's
not, is it?
Thanks in advance.
Numerico
I'm querying the API to get the categories and subcategories for a page, and
I'd like to be able to exclude hidden categories and administration
categories from the result. The queries that I currently use look like
this:
Category Pages:
http://en.wikipedia.org/w/api.php?action=query&titles=Category:$catname&cmtitle=Category:$catname&list=categorymembers&cmlimit=500&prop=categories&format=php
Non-Category Pages:
http://en.wikipedia.org/w/api.php?action=parse&page=$pagename&prop=text|categories&redirects&format=php
where $catname and $pagename are replaced by the page titles. Is there a
way to either exclude categories that are hidden categories, or that are
subcategories of the "Category:Hidden categories" category?
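One server-side option worth noting: prop=categories supports a clshow filter, so categories marked hidden (via __HIDDENCAT__) can be excluded with clshow=!hidden. Administration categories that are not marked hidden would still need client-side filtering. A minimal sketch of the request parameters (build_category_query is a hypothetical helper name):

```python
# Sketch: build query parameters that exclude hidden categories using
# the clshow=!hidden filter supported by prop=categories.
def build_category_query(pagename):
    """Return API parameters for fetching a page's non-hidden categories."""
    return {
        'action': 'query',
        'titles': pagename,
        'prop': 'categories',
        'clshow': '!hidden',  # exclude categories marked with __HIDDENCAT__
        'cllimit': 500,
        'format': 'php',
    }
```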
Thanks,
Robert
Folks,
Can anyone please tell me the most reliable way to request an article page
using the MediaWiki API without fear of getting redirected?
I found that querying by title is not reliable, as I have seen titles
change when more articles with the same or similar titles are added. Some
of the older titles now lead to a generic page stating that the title
might mean one of the following, with a list of article page links shown
below the message. I want to avoid getting redirected to this generic page
and always stay on the article page, as I plan to show this Wikipedia
content on my numerous sub-domain home pages.
Is there a page ID that I can use which doesn't change at all and is
always associated with the same article? If so, how can I query the page
ID using current titles? Any examples and pointers are very much
appreciated.
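For context, a page ID generally stays with an article across renames, so it is more stable than the title (though a deleted and recreated page gets a new ID). A sketch of looking one up via prop=info, with hypothetical helper names and a hand-written sample response shape:

```python
# Sketch: resolve a title to its page ID once, then use &pageids= for
# later requests. parse_pageid is a hypothetical helper that reads the
# pageid out of a decoded action=query JSON response.
def lookup_params(title):
    return {
        'action': 'query',
        'titles': title,
        'prop': 'info',
        'redirects': '',  # follow redirects to the actual article
        'format': 'json',
    }

def parse_pageid(response):
    pages = response['query']['pages']
    # 'pages' is keyed by page ID; negative keys mean the page is missing
    for pageid, page in pages.items():
        if int(pageid) > 0:
            return int(pageid)
    return None
```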
Thanks,
Ravi
Hi,
I am currently working on a project which involves using the wikipedia
content. The expected traffic that our system needs to serve is around 200
qps during peak time.
My question is whether using MediaWiki directly is really an option here
(I mean sending GET requests to http://en.wikipedia.org/w/api.php directly,
without any local mirror). Would this traffic be supported or banned?
Also, what about the availability of the service and possible latencies?
If it were banned, what would be the best approach?
Thanks in advance for any answer.
Ewa Szwed
I would like to get some feedback on how best to proceed with the future
API versions. There will be breaking changes and desires to cleanup,
optimize, obsolete things, so we should start thinking about it. I see two
general approaches I would really like to hear your thoughts on -- a global
API version versus each module having its own version. Both seem to have
some pros and cons.
== Per API ==
A global version=42 parameter would be included in all calls to the API,
specifying what functionality the client is expecting. The number would
increase every so often, say once a month, to signify an "API changes
bucket".
Every module will have this type of code when processing input parameters
and generating reply:
// For this module:
breakingChangeAversion = 15
breakingChangeBversion = 38
...
if requestVersion < breakingChangeAversion:
    reply as originally implemented
else if requestVersion < breakingChangeBversion:
    reply as after breaking change A
else:
    reply as after breaking change B
PROS: simple, allows the whole API to introduce global breaking changes
like new error reporting.
CONS: every module writer has to check the current number on the API
website before hard-coding a specific number into their module. There
might also be synchronization issues between module authors: two authors
making changes within a short window, long enough for a client writer to
hard-code the number while knowing about only one author's changes and
assuming no changes to the other modules.
== Per module ==
Each module name is followed by a "_###" suffix:
api.php?action=query_2&titles=...
Modules stay independent, each client knows just the modules it needs with
their versions.
PROS: keeps things clean and separate. Each version is increased only by
individual module writer due to a breaking change.
CONS: it becomes impossible to make a breaking change in the "core" of the
API, such as the global parameters or a different system of error
reporting. I am not sure we have any of this, but...?
At the moment, I am still leaning towards the per-module approach as it
seems cleaner.
For the version number itself, I think a simple integer would suffice:
the client specifies which interface version it wants, and the server
responds in that format or returns an error. A complex 2.42.0.15 scheme
is not needed here; the client will know at the time of writing what
version it supports. If the server can't reply to that request, it will
fail. Knowing sub-numbers wouldn't help, and would only complicate things.
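As a rough illustration of the per-module option, the server could split the versioned action name before dispatching. This is only a sketch of the idea; the function name and defaulting behavior are assumptions, not actual MediaWiki code:

```python
# Sketch of per-module version dispatch: "query_2" is split into a
# module name and an interface version, then routed to the matching
# handler. A bare action name defaults to version 1.
def split_versioned_action(action):
    """'query_2' -> ('query', 2); 'query' -> ('query', 1)."""
    name, sep, version = action.rpartition('_')
    if sep and version.isdigit():
        return name, int(version)
    return action, 1
```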
Let's hope this will be a short and constructive thread of good new ideas :)
(warning: conflict of interest)
My spouse, Leonard Richardson, is working on a new book for O'Reilly:
"RESTful Web APIs", a follow-up to 2007's "RESTful Web Services". It'll
come out next year. Those of you who are interested in RESTful web API
design (paging Federico!) might want to sign up to be beta readers.
http://www.crummy.com/2012/12/21/1
Leonard also wants to hear about interesting or odd domain-specific data
standards and hypermedia formats.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
Hi everyone, there seem to have been many great changes in the API, so I
decided to take a look at improving my old bots a bit, together with the
rest of the pywiki framework. While looking, a few thoughts and questions
have occurred that I hope someone could comment on.
I have been out of the loop for a long time, so do forgive me if I
misunderstand some recent changes and how they are supposed to work, or if
this is a non-issue. I also apologise if these issues have already been
discussed and/or resolved.
My first idea for this email is "*dumb continue*":
Can we change the continue so that clients are not required to understand
parameters once the first request has been made? This way a user of a
client library can iterate over query results without knowing how to
continue, and the library would not need to understand what to do with each
parameter (iterator scenario):
for datablock in mwapi.Query( { generator='allpages',
                                prop='links|categories',
                                otherParams=... } ):
    #
    # Process the returned data blocks one at a time
    #
The way it is done now, the Query() method must understand how to do the
continue in depth: which parameters to look at first, which second, and
how to handle the case when there are no more links while there are more
categories to enumerate.
Now there is even a high bug potential: if there are no more links, the
API returns just two continues (clcontinue & gapcontinue), which means
that if the client makes the same request with the two additional
"continue" parameters, the API will return the same result again,
possibly producing duplicate errors and consuming extra server resources.
*Proposal:*
Query() method from above should be able to take ALL continue values and
append ALL of them to the next query, without knowing anything about them,
and without removing or changing any of the original request parameters.
Query() will do this until the server returns a data block with no more
<query-continue> section.
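The proposed loop might look roughly like this; mwapi.Query internals are sketched with a hypothetical api_request transport function, not actual pywiki code:

```python
# Sketch of the proposed "dumb continue" loop: blindly merge every
# continue parameter into the next request until the server stops
# returning a query-continue section. api_request is a stand-in for
# the client library's HTTP layer.
def query(api_request, base_params):
    params = dict(base_params)
    while True:
        result = api_request(params)
        yield result.get('query', {})
        cont = result.get('query-continue')
        if not cont:
            break
        # Merge all continue values for all modules, without needing
        # to understand what any of them mean.
        for module_params in cont.values():
            params.update(module_params)
```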
Also, because the "page" objects might be incomplete between different data
blocks, the user might need to know when a complete "page" object is
returned. The API should probably introduce an "incomplete" attribute on
the page to indicate that the client should merge it with the page from
the following data blocks with the same ID until there is no more
"incomplete" flag. The page revision number could be used on the client
to see if the page has changed between calls:
for page in mwapi.QueryCompletePages( { same parameters as example above } ):
    # process each page
# process each page
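A client-side sketch of the proposed merging, assuming the hypothetical "incomplete" attribute described above (none of this exists in the current API):

```python
# Sketch: accumulate page fragments across data blocks, yielding a page
# only once it arrives without the proposed "incomplete" flag.
def complete_pages(datablocks):
    partial = {}
    for block in datablocks:
        for pageid, page in block.get('pages', {}).items():
            merged = partial.setdefault(pageid, {})
            for key, value in page.items():
                if isinstance(value, list):
                    # Property lists (links, categories, ...) are appended
                    merged.setdefault(key, []).extend(value)
                elif key != 'incomplete':
                    merged[key] = value
            if 'incomplete' not in page:
                yield partial.pop(pageid)
```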
*API Implementation details:*
In the example above, where we have a generator and two properties, the
next continue would be set to the very first item that had any of its
properties incomplete. The property continues will be as before, except
that if there are no more categories, clcontinue is set to some magic
value like '|' to indicate that it is done and no more SQL requests to
the categories tables are needed on subsequent calls.
The server should not return the maximum number of pages from the
generator if the properties enumeration has not reached them yet (e.g. if
generatorLimit=max & linksLimit=1, it will return just the first page with
one link on each call).
*Backwards compatibility:*
This change might impact any client that uses the presence of the
"plcontinue" or "clcontinue" fields as a guide not to use the next
"gapcontinue". The simplest (and long overdue) solution is to add a
"version=" parameter.
While at it, we might want to expand action=paraminfo to include
meaningful version data. Better yet, make a new "moduleinfo" action that
returns any requested specifics about each module, e.g.:
action=moduleinfo & modules=parse|query|query+allpages & props=version|params
Thanks! Please let me know what you think.
--Yuri
Old log entry parameters were stored with integer keys. In the new
style, they are stored under keys such as "4::foo", which identifies
that the parameter is "foo" and should be "$4" when passed into
MediaWiki messages.
For core log entry types, we remap either of these to more normal
names. But for non-core types, we just output the given names
directly. This works fine for the old style, and still works for the
new style for most formats except for the part where clients now have
to check for both "4::foo" and "0". But bug 43221 points out that
format=xml winds up generating invalid XML, something like <item
4::foo="value" />.
At the moment, I'm leaning towards the following to fix this:
1. Numeric keys will continue to be output as-is, since there's really
nothing else we can do.
2. Keys like "4::foo" will be output as "foo".
3. Add a hook to allow extensions to override all of the above, like
we already do for core modules.
4. (probably) Patch WMF-deployed extensions that ever generated
old-style log entries to use the hook.
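Point 2 above amounts to stripping the numeric message-position prefix from the key; a minimal sketch of that remapping (not the actual MediaWiki code):

```python
import re

# Sketch of point 2: strip the "N::" message-position prefix from
# new-style log parameter keys, leaving plain numeric keys untouched.
def normalize_log_param_key(key):
    m = re.match(r'^\d+::(.+)$', key)
    return m.group(1) if m else key
```

This keeps old-style integer keys output as-is (point 1) while turning "4::foo" into an XML-safe attribute name.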
For new-style non-core log entries, and for types provided by
extensions that start taking advantage of the new hook, this will
break backwards compatibility but will result in the cleanest output.
Any comments or alternative suggestions?
--
Brad Jorsch
Software Engineer
Wikimedia Foundation
Hi, in the new rewrite branch, what is the best way to do this?
* Get all links from a set of pages. If some pages are redirects, get
their targets. Only the link ns+titles are needed; no need to check
existence/pageid/etc. All this is one API call:
http://en.wikipedia.org/w/api.php?action=query&prop=links&titles=Archiver|A…
* Get all links and categories from the result of a generator or a list
of titles (from a file). Similar to the above, except with an optional
generator and no redirects param. Once a bad page is found, I will load
its content and fix it. The link and category names are needed only as
text strings.
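For reference, the first call above corresponds to parameters along these lines (links_query_params is a hypothetical helper, not part of any framework):

```python
# Sketch: parameters matching the first call above: all links from a
# set of pages, following redirects, titles only.
def links_query_params(titles):
    return {
        'action': 'query',
        'prop': 'links',
        'redirects': '',  # resolve redirects to their targets
        'titles': '|'.join(titles),
        'pllimit': 'max',
    }
```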
Thanks!
P.S. I have started
http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Recipes, which should
list all encountered bot scenarios and how various versions of pywiki
should be used to solve them. Please help by adding your bot's core
workflow to that list, so core developers can either suggest better code
or alter the pywiki framework to better handle such cases.