Mediawiki-api October 2012

mediawiki-api@lists.wikimedia.org

7 participants
5 discussions

Need to extract abstract of a wikipedia page

by aditya srinivas

Hello, I am writing a Java program to extract the abstract of a Wikipedia page given its title. I have done some research and found that the abstract will be in rvsection=0. So, for example, if I want the abstract of the 'Eiffel Tower' wiki page, I query the API in the following way: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel… I then parse the XML data we get back and take the wikitext in the tag <rev xml:space="preserve">, which represents the abstract of the Wikipedia page. But this wikitext also contains the infobox data, which I do not need. I would like to know if there is any way to remove the infobox data and get only the wikitext related to the page's abstract, or if there is an alternative method by which I can get the abstract of the page directly. Looking forward to your help. Thanks in advance, Aditya Uppu

5 months
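For the infobox question above, one possible approach is to drop every top-level {{...}} template from the section-0 wikitext before using it. The following is only a minimal sketch in Python (the poster works in Java, but the logic carries over); the function name strip_templates and the sample text are hypothetical, and the sketch assumes templates are delimited by balanced double braces, which will not hold for every page.

```python
def strip_templates(wikitext: str) -> str:
    """Remove {{...}} templates, including nested ones, by tracking brace depth.

    Assumption: braces are balanced and only used for templates.
    Tables, <ref> tags and file links are left untouched.
    """
    kept = []
    depth = 0  # how many {{ are currently open
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                kept.append(wikitext[i])  # keep only text outside templates
            i += 1
    return "".join(kept).strip()


# Hypothetical section-0 wikitext with an infobox and a nested template.
sample = (
    "{{Infobox building\n"
    "| name = Eiffel Tower\n"
    "| height = {{convert|324|m}}\n"
    "}}\n"
    "The '''Eiffel Tower''' is a tower in [[Paris]]."
)
abstract = strip_templates(sample)
```

This removes all templates, not just the infobox, so inline templates in the lead paragraph would be lost as well; whether that is acceptable depends on the pages being processed.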

Curious about logins...

by Prunka, Sean

Our wiki (running 1.15.3) requires users to be logged in to view any pages other than login and the Main_Page. By default this extends to the API, but I'm having trouble understanding how to do the logins from an external PHP app with fopen (I can't get the cookies set up). We cannot use Snoopy, and management would prefer not to use cURL. I am allowed to permit API calls without a login, given that our environment is closed and not accessible to (or from) the Internet. Is there a way to leave the required login in place for direct visits to our wiki, but turn it off for API calls? Alternatively, is there an example login script for PHP that uses fopen instead of cURL or Snoopy that I could modify for our needs? Thanks in advance, Sean Prunka

11 years, 5 months
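The cookie handling that a bare HTTP client leaves to the caller can be illustrated independently of PHP. The sketch below, in Python, turns the Set-Cookie values from a login response into the single Cookie header that would accompany the next request. The function name cookie_header and the cookie names are hypothetical, and the sketch deliberately ignores expiry, path and domain attributes.

```python
def cookie_header(set_cookie_values):
    """Build a Cookie request header value from Set-Cookie response values.

    Only the leading name=value pair of each Set-Cookie line is kept;
    attributes such as path, expires and HttpOnly are discarded.
    """
    jar = {}
    for value in set_cookie_values:
        pair = value.split(";", 1)[0].strip()  # text before the first ';'
        if "=" not in pair:
            continue  # skip malformed entries
        name, _, val = pair.partition("=")
        jar[name.strip()] = val.strip()  # a later cookie overrides an earlier one
    return "; ".join(f"{name}={val}" for name, val in jar.items())


# Hypothetical Set-Cookie values as a wiki might return them after login.
received = [
    "wiki_session=abc123; path=/; HttpOnly",
    "wikiUserID=7; expires=Thu, 01-Jan-2015 00:00:00 GMT; path=/",
]
header_value = cookie_header(received)
```

With a stream-based client, the resulting string would be sent as the value of a "Cookie:" request header on each subsequent API call.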

Request for pages that don't exist no longer return an error code in xml

by Robert Chin

Is it intentional that pages that don't exist no longer return an error message? Previously, a request for something like http://en.wikipedia.org/w/api.php?action=parse&page=testest&format=xml&redi… would return a friendly XML error that I could parse: <errors> <error code="missingtitle" info="The page you specified doesn't exist" /> </errors> Now it just returns an empty page. Is there some new way to detect that an invalid page has been requested, or is this simply a bug? Thanks, Robert

11 years, 6 months
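A client that has to cope with both behaviours described above (an XML error element, or an empty body) could classify the response defensively. This is a minimal sketch in Python using the standard xml.etree.ElementTree module; the function name api_error_code and the labels "empty-response" and "unparseable-response" are hypothetical, and the XML shape is taken from the message rather than from the API documentation.

```python
import xml.etree.ElementTree as ET


def api_error_code(body: str):
    """Return an error code for an API response body, or None if no error is found."""
    if not body.strip():
        return "empty-response"  # the behaviour reported in the message
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unparseable-response"
    # Look for the first <error> element anywhere in the document.
    for element in root.iter("error"):
        return element.get("code", "unknown")
    return None


# The error XML quoted in the message.
old_style = (
    '<errors> <error code="missingtitle" '
    'info="The page you specified doesn\'t exist" /> </errors>'
)
code = api_error_code(old_style)
```

Treating an empty body as its own case keeps the caller from mistaking a missing page for a successful parse.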

Fwd: Please notice and report glitches - changes coming

by Sumana Harihareswara

Sorry for the spam, but the ContentHandler changes especially may affect you -- if you have any time this weekend or next week to do some testing, we'd appreciate it. Thanks. -Sumana

-------- Original Message -------- Subject: Please notice and report big glitches - changes coming Date: Fri, 12 Oct 2012 17:14:05 -0400 From: Sumana Harihareswara <sumanah(a)wikimedia.org> Organization: Wikimedia Foundation To: Coordination of technology deployments across languages/projects <wikitech-ambassadors(a)lists.wikimedia.org>

On Monday we start deploying a new version of MediaWiki, 1.21wmf2, to the sites, starting with mediawiki.org and 2 test wikis (https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap). 1.21wmf2 will have 3 big new things in it and we need your help to test on the "beta" test site http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page now to see if there are any really critical bugs.

1) The new ContentHandler ( https://www.mediawiki.org/wiki/ContentHandler ) might affect handling of CSS and JavaScript pages, import/export (including PDF export), and API stuff, especially when rendering and editing. I'd suggest we also look out for issues in template rendering, images and media handling, localisation, and mobile device access. (merged on Oct 9)

2) High-resolution image support. This work-in-progress will try to give higher-res images to high-density screens that can support it, like new Retina displays. More info at https://gerrit.wikimedia.org/r/#/c/24115/ . One of the bigger risks of the high res stuff is load-based, since we may see substantial new load on our image scalers. So *all* image scaling might be impacted. (merged on Oct 11)

3) "Sites" is a new backend to represent and store information about sites and site-specific configuration. This code is meant to replace the current interwiki code, but does not do so just yet. Still, keep an eye out for site-specific configuration or interwiki issues.

Right now the version of MediaWiki on the beta cluster dates from 9 Oct and thus has ContentHandler but not the high-res image support or Sites. So please test on the beta sites now and look out for these issues on your sites in the weeks ahead. https://www.mediawiki.org/wiki/Category:MediaWiki_test_plans has some ideas on how to find errors.

Thanks! With your help we can find bugs early and get them fixed before they affect lots of readers and editors. -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation

11 years, 6 months

Ignore template links?

by Brian Keegan

Hi all, I'm trying to scrape some data from en.wiki about the outlinks from the body of articles. However, the API returns article outlinks contained within templates. While I can write a routine to get a list of all the templates and identify the article links inside these templates to remove from the outlinks, this is problematic if a link appears in both the body and a template. Thus, if article X has a link to Y in the body as well as links to Y and Z in templates, I want to capture Y but not Y & Z. Ideally, I'd like to either (1) be able to count the number of times an article links out to another article (if X links to Y twice) and then decrement this count for each appearance in a template, or (2) count only the links occurring in the body and not parse the links in templates. Thank you in advance for your suggestions! Best, Brian

11 years, 6 months
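Option (2) in the message above, counting only links that occur in the body, could be approximated by working on the raw wikitext instead of the API's link list: remove every {{...}} template, then count the remaining [[...]] targets. The sketch below is in Python; the names strip_templates and body_link_counts and the sample text are hypothetical. It assumes balanced double braces, and it also counts File: and Category: links unless they are filtered out afterwards.

```python
import re
from collections import Counter

# Capture the link target: everything after [[ up to |, #, ] or a nested [.
LINK = re.compile(r"\[\[([^\[\]|#]+)")


def strip_templates(wikitext: str) -> str:
    """Drop {{...}} templates (nested ones included) by tracking brace depth."""
    kept, depth, i = [], 0, 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                kept.append(wikitext[i])
            i += 1
    return "".join(kept)


def body_link_counts(wikitext: str) -> Counter:
    """Count link targets that appear outside any template."""
    body = strip_templates(wikitext)
    return Counter(match.group(1).strip() for match in LINK.finditer(body))


# Hypothetical article X: Y and Z are linked in a template, Y twice in the body.
article_x = (
    "{{Infobox|capital=[[Y]]|other=[[Z]]}}\n"
    "X borders [[Y]] and again [[Y|the same place]]."
)
counts = body_link_counts(article_x)
```

Because the count comes from the wikitext itself, links that a template generates on expansion never appear in it, and links passed as template parameters are removed together with the template.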
