I am writing a Java program to extract the abstract of a Wikipedia page
given the title of the page. I have done some research and found out that
the abstract will be in rvsection=0.
So, for example, if I want the abstract of the 'Eiffel Tower' wiki page,
I query the API in the following way,
and parse the XML data we get back, taking the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way to remove the infobox data and get
only the wikitext of the page's abstract, or if there is an alternative
method by which I can get the abstract of the page directly.
Looking forward to your help.
Thanks in Advance
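One way to handle the stripping yourself, sketched below (class and method
names are hypothetical, and this is deliberately not a full wikitext
parser): since the infobox is a {{...}} template at the start of the
section-0 wikitext, you can drop leading template blocks by counting
{{ / }} pairs.

```java
// Hypothetical helper: strips leading {{...}} template blocks, such as an
// infobox, from section-0 wikitext by counting {{ / }} pairs.
// A rough sketch, not a full wikitext parser.
public class WikitextCleaner {
    public static String stripLeadingTemplates(String wikitext) {
        String s = wikitext.trim();
        while (s.startsWith("{{")) {
            int depth = 0;
            int i = 0;
            int end = -1;
            while (i < s.length() - 1) {
                if (s.charAt(i) == '{' && s.charAt(i + 1) == '{') {
                    depth++;
                    i += 2;
                } else if (s.charAt(i) == '}' && s.charAt(i + 1) == '}') {
                    depth--;
                    i += 2;
                    if (depth == 0) { end = i; break; }
                } else {
                    i++;
                }
            }
            if (end < 0) {
                break; // unbalanced braces: give up rather than mangle the text
            }
            s = s.substring(end).trim();
        }
        return s;
    }
}
```

As for a more direct route: if the wiki has the TextExtracts extension
installed (Wikipedia does), prop=extracts with the exintro parameter can
return the page's intro directly, without the infobox wikitext.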
Our wiki (running 1.15.3) requires users to be logged in to view any pages other than the login page and the Main_Page.
By default this extends to the API, but I'm having trouble understanding how to do the login from an external PHP app with fopen (I can't get the cookies set up).
(We cannot use Snoopy, and management would prefer not to use cURL.)
I am allowed to permit API calls without a login, given that our environment is closed and not accessible to (or from) the Internet.
Is there a way to leave the required login in place for direct visits to our wiki, but turn it off for API calls?
Alternatively, is there an example login script for PHP that uses fopen instead of cURL or Snoopy that I could modify for our needs?
Thanks in advance,
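For the alternate request, here is a minimal sketch of the login handshake
using only the http stream wrapper (no cURL, no Snoopy). The function names
are hypothetical; it assumes the standard two-step action=login flow, where
the first POST returns NeedToken plus a session cookie and the second POST
resends the credentials with lgtoken and that cookie.

```php
<?php
// Build a POST stream context for file_get_contents(), optionally
// attaching a Cookie header for the second login request.
function build_post_context(array $postFields, $cookieHeader = '') {
    $headers = "Content-Type: application/x-www-form-urlencoded\r\n";
    if ($cookieHeader !== '') {
        $headers .= "Cookie: $cookieHeader\r\n";
    }
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => $headers,
            'content' => http_build_query($postFields),
        ),
    ));
}

// Collect name=value pairs from the Set-Cookie lines that PHP exposes
// in $http_response_header after an http-wrapper request.
function extract_cookies(array $responseHeaders) {
    $cookies = array();
    foreach ($responseHeaders as $header) {
        if (stripos($header, 'Set-Cookie:') === 0) {
            $parts = explode(';', substr($header, 11), 2);
            $cookies[] = trim($parts[0]);
        }
    }
    return implode('; ', $cookies);
}

// Usage sketch (network calls omitted here):
// $ctx = build_post_context(array('action' => 'login', 'format' => 'xml',
//                                 'lgname' => $user, 'lgpassword' => $pass));
// $xml = file_get_contents($apiUrl, false, $ctx);
// $cookies = extract_cookies($http_response_header);
// // parse the lgtoken out of the NeedToken response, then repeat the
// // POST with 'lgtoken' => $token and the $cookies header attached.
```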
Is it intentional that pages that don't exist no longer return an error message?
Previously a request for something like:
Would return some friendly xml error that I could parse:
<error code="missingtitle" info="The page you specified doesn't exist" />
Now it just returns an empty page. Is there some new way to detect whether
an invalid page has been requested, or is this simply a bug?
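One workaround while this is unresolved, assuming action=query still
behaves as documented: query the title through action=query, which reports
nonexistent pages with a missing="" attribute (and bad titles with
invalid="") on the <page> element, rather than through an error. A trivial,
hypothetical check:

```java
// Minimal sketch: detect a nonexistent or invalid page in an action=query
// XML response, assuming the API marks such pages with missing="" or
// invalid="" on <page>. A real client should use an XML parser; this
// string test just illustrates the idea.
public class MissingPageCheck {
    public static boolean isMissing(String queryXml) {
        return queryXml.contains("missing=\"\"") || queryXml.contains("invalid=\"\"");
    }
}
```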
Sorry for the spam, but the ContentHandler changes especially may affect
you -- if you have any time this weekend or next week to do some
testing, we'd appreciate it. Thanks.
-------- Original Message --------
Subject: Please notice and report big glitches - changes coming
Date: Fri, 12 Oct 2012 17:14:05 -0400
From: Sumana Harihareswara <sumanah(a)wikimedia.org>
Organization: Wikimedia Foundation
To: Coordination of technology deployments across languages/projects
On Monday we start deploying a new version of MediaWiki, 1.21wmf2, to
the sites, starting with mediawiki.org and 2 test wikis
(https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap). 1.21wmf2 will
have 3 big new things in it and we need your help to test on the "beta"
test site http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page
now to see if there are any really critical bugs.
1) The new ContentHandler (
https://www.mediawiki.org/wiki/ContentHandler ) might affect handling of
stuff, especially when rendering and editing. I'd suggest we also look
out for issues in template rendering, images and media handling,
localisation, and mobile device access. (merged on Oct 9)
2) High-resolution image support. This work-in-progress will try to
give higher-res images to high-density screens that can support it, like
new Retina displays. More info at
https://gerrit.wikimedia.org/r/#/c/24115/ . One of the bigger risks of
the high res stuff is load-based, since we may see substantial new load
on our image scalers. So *all* image scaling might be impacted. (merged
on Oct 11)
3) "Sites" is a new backend to represent and store information about
sites and site-specific configuration. This code is meant to replace
the current interwiki code, but does not do so just yet. Still, keep an
eye out for site-specific configuration or interwiki issues.
Right now the version of MediaWiki on the beta cluster dates from 9 Oct
and thus has ContentHandler but not the high-res image support or Sites.
So please test on the beta sites now and look out for these issues on
your sites in the weeks ahead.
https://www.mediawiki.org/wiki/Category:MediaWiki_test_plans has some
ideas on how to find errors.
Thanks! With your help we can find bugs early and get them fixed before
they affect lots of readers and editors.
Engineering Community Manager
I'm trying to scrape some data from en.wiki about the outlinks from the
body of articles. However, the API returns article outlinks contained
within templates. While I can write a routine to get a list of all the
templates and identify the article links inside these templates to remove
from the outlinks, this is problematic if a link appears in both the body
and a template. Thus, if article X has a link to Y in the body as well as
links to Y and Z in templates, I want to capture the body link to Y but
not the template links to Y and Z.
Ideally, I'd like to either (1) be able to count the number of times an
article links out to another article (e.g., if X links to Y twice) and then
decrement this count for each appearance in a template, or (2) count only
the links occurring in the body, without parsing the links in templates.
Thank you in advance for your suggestions!
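One client-side approach for option (2), sketched below on the assumption
that you fetch the raw wikitext (e.g. via prop=revisions&rvprop=content):
strip every {{...}} template span by brace counting, then extract the
remaining [[...]] targets. The names are hypothetical and the brace
counting is deliberately simple, not a full parser; since the returned
list keeps duplicates, it also gives the per-target counts needed for
option (1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyLinks {
    // Remove all {{...}} template spans from wikitext, handling nesting
    // by counting {{ / }} pairs. Text inside templates is dropped.
    public static String stripTemplates(String wikitext) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < wikitext.length(); i++) {
            boolean hasPair = i + 1 < wikitext.length();
            if (hasPair && wikitext.charAt(i) == '{' && wikitext.charAt(i + 1) == '{') {
                depth++;
                i++;
            } else if (hasPair && depth > 0
                    && wikitext.charAt(i) == '}' && wikitext.charAt(i + 1) == '}') {
                depth--;
                i++;
            } else if (depth == 0) {
                out.append(wikitext.charAt(i));
            }
        }
        return out.toString();
    }

    // Extract [[Target]] / [[Target|label]] link targets, keeping
    // duplicates so occurrences can be counted per target.
    public static List<String> extractLinks(String wikitext) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("\\[\\[([^\\]|#]+)").matcher(wikitext);
        while (m.find()) {
            links.add(m.group(1).trim());
        }
        return links;
    }
}
```

Running extractLinks(stripTemplates(wikitext)) then yields only the
body links, counted as often as they occur.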