I am writing a Java program to extract the abstract of a Wikipedia page,
given the title of the page. I have done some research and found out that
the abstract will be in rvsection=0.
So, for example, if I want the abstract of the 'Eiffel Tower' wiki page,
I query the API in the following way.
I then parse the XML data we get back and take the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the Wikipedia page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way I can remove the infobox data
and get only the wikitext related to the page's abstract, or if there is an
alternative method by which I can get the abstract of the page directly.
Looking forward to your help.
Thanks in advance.
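One approach, sketched here in Python for brevity (the same brace-matching logic ports directly to Java): the infobox is just a {{...}} template at the top of the section-0 wikitext, so you can strip any leading templates by matching balanced double braces. The function name is my own invention, not part of any MediaWiki API:

```python
def strip_leading_templates(wikitext):
    """Remove {{...}} templates (such as an infobox) that appear at the
    start of section-0 wikitext, by matching balanced double braces."""
    text = wikitext.lstrip()
    while text.startswith("{{"):
        depth = 0
        i = 0
        end = None
        while i < len(text) - 1:
            if text[i:i + 2] == "{{":
                depth += 1
                i += 2
            elif text[i:i + 2] == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    end = i   # position just past the closing braces
                    break
            else:
                i += 1
        if end is None:       # unbalanced braces: give up rather than mangle
            break
        text = text[end:].lstrip()
    return text
```

Alternatively, on wikis where the TextExtracts extension is available, prop=extracts with the exintro parameter returns the page's intro directly, with the template markup already stripped.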
A few weeks ago, a few Wikia and Wikimedia people started talking about
rewriting MediaWiki's web API for greater flexibility, usability, and
standardization. Notes from that meeting:
Federico Lucignano is one of the main developers on this effort. He
will be doing API stuff for Wikia for the next 5 months. Wikia wants to
attract motivated app developers and companies using Wikia's products to
use the API. They also want to make the APIs more standards-compliant
(a RESTful interface, using HTTP verbs), but that's a high-level goal.
Mobile-related work is first, driving the direction of some of
Federico's work, but this redesign would improve the whole platform,
including the enterprise.
Wikimedia and Wikia want to work together on this. Wikimedia Foundation
also wants to avoid boxing ourselves into special-purpose, specific
apps. Right now we're in the very early stages and I believe Wikia's
going to put out an RFC -- the initial research we discussed during the
kickoff meeting is starting this week.
Some people have also begun talking about this issue on the bug "Make
MediaWiki more RESTful":
https://bugzilla.wikimedia.org/show_bug.cgi?id=41837 in case you want to
check that out.
Engineering Community Manager
As you may already know, we've had an API for retrieving mobile-friendly
page HTML for several months now. Because it is much more
feature-rich and fast, we are finally deprecating the old, ad-hoc API
that provided JSON-encoded page text for URLs like
The old API was removed from documentation some time ago, and our logging
indicates that the few hits it receives are from our old iPhone app.
Therefore, to avoid maintaining old kludges forever, we decided to
completely deprecate the old API one month from now. The date the
switch will be flipped on WMF is tentatively set to December 11. The
old app users will soon start receiving a warning urging them to
upgrade. If you are using or planning to use the old API in your code,
please upgrade it to use action=mobileview with the mobileformat
parameter set.
Max Semenik ([[User:MaxSem]])
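For anyone migrating, building the new request can be sketched as follows (a minimal Python sketch; the parameter values, in particular mobileformat=html, are assumptions on my part, so check the action=mobileview documentation for the exact options):

```python
from urllib.parse import urlencode

def mobileview_url(title, api="https://en.wikipedia.org/w/api.php"):
    """Build an action=mobileview request URL for the given page title."""
    params = {
        "action": "mobileview",
        "page": title,
        "sections": "0",          # fetch the lead section only
        "mobileformat": "html",   # assumed value; see the API docs
        "format": "json",
    }
    return api + "?" + urlencode(params)
```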
On Thu, Nov 8, 2012 at 10:59 AM, Platonides <platonides(a)gmail.com> wrote:
> Hola Javier,
> Did you look at
> No, there's no variable holding the diff.
> Available variables seem to be $NEWPAGE, $OLDID, $CHANGEDORCREATED,
> $PAGETITLE, $PAGETITLE_URL, $PAGEMINOREDIT, $UNWATCHURL, $PAGEEDITOR,
> $PAGEEDITOR_EMAIL, $PAGEEDITOR_WIKI, $PAGESUMMARY, $WATCHINGUSERNAME,
> $PAGEEDITDATE and $PAGEEDITTIME.
> As to how to do that, I would hook on AbortEmailNotification and
> perform the same $enotif = new EmailNotification();
> $enotif->notifyOnPageChange( ... ); calls, but with my own class instead.
> Make that class a child of EmailNotification. Override
> composeCommonMailtext() with your own method, which calls
> parent::composeCommonMailtext() and then appends the diff to $this->body
> (not straightforward; you will need to recover it from $this->oldid).
> Yes, composeCommonMailtext() would need to be changed to protected;
> that seems a fair change. That class is not too well organised.
> Seems like a feature we could want to merge upstream, too.
Thank you Platonides. You were very clear. So, I understand that the
real problem is how to get the diff (the specific diffs, not the link to
the diffs) from $this within the EmailNotification class.
It could be a good feature for users who want to see what changed directly in the
notification email.
Nemo is referring to dumpgenerator.py being broken on MediaWiki
versions above 1.20; it should not actually affect older MediaWiki
versions.
You can safely continue with your grab. :)
On Sat, Nov 10, 2012 at 12:45 PM, Scott Boyd <scottdb56(a)gmail.com> wrote:
> At this link: https://code.google.com/p/wikiteam/issues/detail?id=56 , at
> the bottom, there is an entry by project member nemowiki that states:
> Comment 7 <https://code.google.com/p/wikiteam/issues/detail?id=56#c7> by
> project member nemowiki <https://code.google.com/u/101255742639286016490/>,
> today (9 hours ago):
> Fixed by emijrp in r806 <https://code.google.com/p/wikiteam/source/detail?r=806>. :-)
> *Status:* Fixed
> So does that mean this problem that "It's completely broken" is now fixed?
> I'm running a huge download of 64K+ page titles, and am now using the
> "r806" version of dumpgenerator.py. (The first 35K+ page titles were
> downloaded with an older version.) Both versions sure seem to be
> downloading MORE than 500 pages per namespace, but I'm not sure, since I
> don't know how you can tell if you are getting them all...
> So is it fixed or not?
> On Fri, Nov 9, 2012 at 4:27 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com>wrote:
>> It's completely broken: https://code.google.com/p/**
>> It will download only a fraction of the wiki, 500 pages at most per namespace.
We've created the greatest collection of shared knowledge in history. Help
protect Wikipedia. Donate now: http://donate.wikimedia.org
The specifications for the rewrite should be complete now:
Let me know how/where you prefer to proceed and discuss; I'll follow
As for IRC, I'm Nemo_bis everywhere; we have a #wikiteam channel on
EFNet (not super-useful), I'm on #wikimedia and related on FreeNode etc.
Are you sure that PWB has all possible fallbacks? I thought some of them
were removed at some point. Moreover, having downloaded a few thousand
different wikis in the wild, I can tell you there are some very tricky ones...
It's completely broken:
It will download only a fraction of the wiki, 500 pages at most per namespace.
Let me reiterate that
https://code.google.com/p/wikiteam/issues/detail?id=44 is a very urgent
bug and we've seen no work on it in many months. We need an actual
programmer with some knowledge of python to fix it and make the script
work properly; I know there are several on this list (and elsewhere),
please please help. The last time I, as a non-coder, tried to fix a bug,
I made things worse.
Only after the API support is implemented/fixed will I be able to
re-archive the 4-5 thousand wikis we've recently archived on archive.org
(https://archive.org/details/wikiteam) and possibly many more. Many of
those dumps contain errors and/or are just partial because of the
script's unreliability, and wikis die on a daily basis. (So, quoting
emijrp, there IS a deadline.)
P.s.: Cc'ing some lists out of desperation; sorry for cross-posting.
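For context, the 500-pages-per-namespace truncation described above is the classic symptom of ignoring API continuation: list=allpages returns at most one batch per request plus a continuation token. A minimal sketch of the loop, using the older query-continue response shape (the fetch function is injected here so the logic is shown without HTTP plumbing; real code would issue requests against api.php):

```python
def all_pages(fetch):
    """Collect every page title from a MediaWiki list=allpages query,
    following apfrom continuation tokens until none is returned.
    fetch(apfrom) must return a dict shaped like the API's JSON reply."""
    titles = []
    apfrom = None
    while True:
        reply = fetch(apfrom)
        titles.extend(p["title"] for p in reply["query"]["allpages"])
        cont = reply.get("query-continue", {}).get("allpages", {})
        if "apfrom" not in cont:
            break                 # no continuation token: we have everything
        apfrom = cont["apfrom"]
    return titles
```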
On 11/08/2012 01:00 PM, Brad Jorsch wrote:
> On Wed, Nov 7, 2012 at 7:20 AM, Javier del Pozo<jdelpozo(a)cnb.csic.es> wrote:
>> I need to customize my mediawiki to send added lines in the mail
>> notification of changes (of a watchlist).
>> Can you help me?
> This really has nothing to do with the API, but you would customize
> the page MediaWiki:enotif_body on your MediaWiki installation.
I am sorry if this message shouldn't be here, but what I need to know is
how to get the changes that were made to show up in the page
MediaWiki:enotif_body. Is there any parameter containing the last
changes that is usable in the enotif_body page?