I am writing a Java program to extract the abstract of a Wikipedia page,
given the title of the page. From my research so far, the abstract will be
in rvsection=0.
So, for example, if I want the abstract of the 'Eiffel Tower' wiki page, I
query the API in the following way,
and parse the XML response, taking the wikitext inside the <rev
xml:space="preserve"> tag, which represents the abstract of the page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way I can remove the infobox data and
get only the wikitext of the page's abstract, or if there is any
alternative method by which I can get the abstract of the page directly.
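Since the infobox is itself a template delimited by {{ }}, one heuristic is to strip any leading templates from the section-0 wikitext before reading the first paragraph. A minimal sketch (the class and method names are mine, and this simple brace matching ignores edge cases such as templates inside comments or nowiki blocks):

```java
// Sketch: strip leading templates (e.g. the infobox) from wikitext by
// matching {{ ... }} pairs. Assumes the abstract is the first plain
// paragraph after any leading templates; this is a heuristic, not a
// full wikitext parse.
public class InfoboxStripper {
    public static String stripLeadingTemplates(String wikitext) {
        String s = wikitext;
        while (true) {
            s = s.stripLeading();
            if (!s.startsWith("{{")) break;
            int depth = 0, i = 0;
            while (i < s.length() - 1) {
                if (s.charAt(i) == '{' && s.charAt(i + 1) == '{') {
                    depth++; i += 2;
                } else if (s.charAt(i) == '}' && s.charAt(i + 1) == '}') {
                    depth--; i += 2;
                    if (depth == 0) break;   // matching close for the opening {{
                } else {
                    i++;
                }
            }
            if (depth != 0) break;           // unbalanced braces: give up
            s = s.substring(i);
        }
        return s.stripLeading();
    }

    public static void main(String[] args) {
        String wikitext = "{{Infobox building|name=Eiffel Tower|height=324 m}}\n"
                        + "The '''Eiffel Tower''' is a lattice tower in Paris.";
        // prints: The '''Eiffel Tower''' is a lattice tower in Paris.
        System.out.println(stripLeadingTemplates(wikitext));
    }
}
```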
Looking forward to your help.
Thanks in Advance
As of r84905 (going live to Wikimedia within the hour), the cltype
parameter to list=categorymembers is ignored when clsort=timestamp is
set. This change is needed for performance reasons: long-running
queries were causing bug 28291 on Wikimedia wikis.
This does not break backwards compatibility with any released version
of MediaWiki, because cltype is new in 1.17. Functional
cltype=foo&clsort=timestamp behavior was only live on Wikimedia wikis
for about 2-3 weeks.
Roan Kattouw (Catrope)
Mediawiki-api-announce mailing list
At Wikisource we do a lot of proofreading work in our Page: ns, with the pages being "held
together" by an overarching page in the Index: namespace. Principally, the text gets
proofread and then validated in the Page: pages, and is later transcluded into the main
namespace (our presentation layer). A transclusion can be part of a page, a whole page,
or a series of pages, depending on the work being reproduced and how we choose to
present it.
What I would like to do, per work, is to determine which pages are and which are not
transcluded (a yes/no answer), rather than to know where each page is transcluded. At the
moment, my only means to determine whether a page is transcluded is to check each
page individually via WhatLinksHere, so I am looking for an easier way to get that Y/N
answer dynamically for a series of pages.
Ultimately, my ideal outcome is to have a link on the Index: page that could easily show
which pages are and which pages are not transcluded.
I know what I want; however, I have no idea how to get the API to work for me, or how
to do it for a series of pages. Hence I am here to ask for some guidance.
Examples, based on vol. 32 of the Dictionary of National Biography:
* Index:Dictionary of National Biography volume 32.djvu
* Page:Dictionary of National Biography volume 32.djvu/163
<page pageid="554940" ns="0" title="Laski, John (DNB00)" touched="2011-03-14T21:59:25Z" lastrevid="2341119" counter="" length="341" />
* Page:Dictionary of National Biography volume 32.djvu/263
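One way to get a per-page yes/no could be list=embeddedin with eilimit=1, which lists pages that transclude a given title (einamespace=0 would restrict the check to the main namespace; parameter names are from the 1.17-era API, so worth double-checking against your wiki's api.php help). If the response contains any <ei> element, the page is transcluded. A sketch of deciding yes/no from such a response (class and method names are mine):

```java
// Sketch: decide transcluded yes/no from a list=embeddedin XML response,
// e.g. api.php?action=query&list=embeddedin&eititle=Page:...&eilimit=1&format=xml
// Any <ei> element in the response means "yes, transcluded somewhere".
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class TransclusionCheck {
    public static boolean isTranscluded(String apiXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(apiXml.getBytes("UTF-8")));
        return doc.getElementsByTagName("ei").getLength() > 0;
    }

    public static void main(String[] args) throws Exception {
        String transcluded = "<api><query><embeddedin>"
            + "<ei pageid=\"554940\" ns=\"0\" title=\"Laski, John (DNB00)\"/>"
            + "</embeddedin></query></api>";
        String notTranscluded = "<api><query><embeddedin/></query></api>";
        System.out.println(isTranscluded(transcluded));     // true
        System.out.println(isTranscluded(notTranscluded));  // false
    }
}
```

For a whole Index:, you would still need one such request per Page: title, but each request is tiny thanks to eilimit=1.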
Thanks in advance for any guidance.
My Wikipedia bot used to access Wikipedia by putting parameters into
URLs, like this:
I already put some parameters into POST data, though: firstly, anything that I
think ought to be kept relatively secret (such as the login password), and
secondly, anything that might be long (such as edited text or even edit
summaries). Today I converted my bot to use bare URLs and put all the
parameters in POST. Yet, I wonder if I should have an option in my bot
for the user to use URL parameters instead of POST parameters. Bear in
mind that although I'm coding with Wikipedia in mind, my bot is open
source and intended to be used with any Mediawiki project.
I can see that there are advantages to putting parameters into POST
data. Are there any advantages to a bot putting parameters into the URL?
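For what it's worth, the two transports carry the same bytes: URL parameters are easy to log, bookmark, and cache, while POST keeps secrets out of proxy and server logs and sidesteps URL length limits. A sketch of encoding one parameter map for either transport (the parameter values here are illustrative):

```java
// Sketch: the same parameter map can be sent either way; only the
// transport differs. For GET, append "?" + formEncode(params) to the URL;
// for POST, write the same string as the request body with
// Content-Type: application/x-www-form-urlencoded.
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamEncoder {
    public static String formEncode(Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "edit");
        params.put("summary", "fix typo & spacing");
        // prints: action=edit&summary=fix+typo+%26+spacing
        System.out.println(formEncode(params));
    }
}
```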
I am trying to make a script to move files from the local Wikipedia to Commons,
but I am getting an invalid token error. I could not search the archives to
see whether this has been addressed before.
Here is what I am trying to do.
1) Get token using the following api
<page pageid="10307065" ns="14" title="Category:Commons-ml" touched="2011-03-14T05:01:24Z" lastrevid="49830005" counter="" length="728"
2) Post an image using the following api.
<error code="badtoken" info="Invalid token"/>
I am not sure what's wrong with my token here. Please advise. I did the
posting using a .NET program after including "multipart/form-data" in the
Content-Type header.
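Two things may be worth checking. Tokens are per-wiki, so a token fetched from the local wiki will not validate on Commons. And tokens from this era of the API end in the literal characters "+\"; if that field is not properly form-encoded, the "+" arrives as a space and the API answers badtoken. A quick encoding check (the token value below is illustrative, not real):

```java
// Sketch: verify the token survives form-encoding. A "+\" suffix must
// arrive at the server as %2B%5C, not as a space.
import java.net.URLEncoder;

public class TokenEncoding {
    public static void main(String[] args) throws Exception {
        String token = "cecc0a3531dff0c13b0778ddb0d3a376+\\";  // illustrative value
        String encoded = URLEncoder.encode(token, "UTF-8");
        System.out.println(encoded);  // ends in %2B%5C
    }
}
```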
I have written a Wikipedia bot. It obviously does a fair amount of
work with strings. I'd like to know what are MediaWiki's or
Wikipedia's limits for the size of these strings. For example, I have
found documentation that states that edit summaries are limited to 200
characters. (I'm not sure if that includes a terminating zero or not).
Where can I find the maximum size for:
* page content (wikitext and HTML versions)
* HTTP response from server
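One detail that matters for any such limit: as I understand it, MediaWiki measures its limits in bytes of the UTF-8 encoding, not in characters (page size, for instance, is capped by the wiki's $wgMaxArticleSize setting, expressed in kilobytes). So a bot should count bytes, not chars:

```java
// Sketch: MediaWiki limits are byte counts of the UTF-8 encoding, so
// multi-byte characters reach a limit sooner than a char count suggests.
public class ByteLength {
    public static int utf8Bytes(String s) throws Exception {
        return s.getBytes("UTF-8").length;
    }

    public static void main(String[] args) throws Exception {
        String ascii = "resume";
        String accented = "r\u00e9sum\u00e9";  // "résumé"
        System.out.println(ascii.length() + " chars, " + utf8Bytes(ascii) + " bytes");       // 6 chars, 6 bytes
        System.out.println(accented.length() + " chars, " + utf8Bytes(accented) + " bytes"); // 6 chars, 8 bytes
    }
}
```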
I'm able to list the categories that a page belongs to, and the categories
that a category contains. I'm also able to list the portals that a category
contains, but I can't seem to find a way to list the portals that a page or
category belongs to. Is there a way to do that?
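As far as I know there is no direct "portals of a page" module, but on wikis where portals live in namespace 100 (true on en.wikipedia), the links a page makes into that namespace are a usable approximation: something like api.php?action=query&prop=links&plnamespace=100&titles=...&format=xml. A sketch of pulling portal titles out of the <pl> elements of such a response (class and method names are mine):

```java
// Sketch: extract Portal-namespace links (ns=100 on en.wikipedia) from a
// prop=links XML response as an approximation of "portals of this page".
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PortalLinks {
    public static List<String> portalTitles(String apiXml) throws Exception {
        NodeList pls = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(apiXml.getBytes("UTF-8")))
                .getElementsByTagName("pl");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < pls.getLength(); i++) {
            Element pl = (Element) pls.item(i);
            if ("100".equals(pl.getAttribute("ns"))) {
                titles.add(pl.getAttribute("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<api><query><pages><page><links>"
            + "<pl ns=\"100\" title=\"Portal:Architecture\"/>"
            + "<pl ns=\"0\" title=\"Paris\"/>"
            + "</links></page></pages></query></api>";
        System.out.println(portalTitles(xml));  // [Portal:Architecture]
    }
}
```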
Thanks in advance,
Hi everyone. I'm building a Wikipedia bot.
My bot finds out the timestamp and author of the current version of a
page in one query, like this :
If there is vandalism on the page, it attempts to find out the revid
and name of the previous author of the page, like this :
If there is no author other than the current author, then it will
not give a result. Note that the previous author may be 1 or 10
revisions back; it simply means the person who edited the page most
recently other than the current author.
Now, is there a way for me to combine these queries into one? That is,
can I get the present author of the page and the previous author in one
call, knowing that the present author may be responsible for 1 or
several revisions, and that there may be no previous author ?
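One approach could be a single request with rvprop=user|ids and a generous rvlimit (say 50), then scanning the returned revisions, newest first, for the first user who differs from the top one. The scan might look like this (types and names are mine; each revision is shown as a {user, revid} pair in the API's newest-first order):

```java
// Sketch: from one rvlimit=50 revisions query, find the most recent
// revision by someone other than the current author, or null if every
// fetched revision is by the current author.
import java.util.List;

public class PreviousAuthor {
    // revs: {user, revid} pairs, newest first, as the API returns them
    public static String[] previousDistinctAuthor(List<String[]> revs) {
        if (revs.isEmpty()) return null;
        String current = revs.get(0)[0];
        for (String[] rev : revs) {
            if (!rev[0].equals(current)) return rev;  // first earlier, different user
        }
        return null;  // no other author within the fetched window
    }

    public static void main(String[] args) {
        List<String[]> revs = List.of(
            new String[]{"Alice", "1003"},
            new String[]{"Alice", "1002"},
            new String[]{"Bob",   "1001"});
        String[] prev = previousDistinctAuthor(revs);
        System.out.println(prev[0] + " " + prev[1]);  // Bob 1001
    }
}
```

The one caveat: if the current author has more consecutive revisions than rvlimit allows, a follow-up request with rvstartid is still needed, so "no previous author" is only certain within the fetched window.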
Using the API, what is the best way to determine whether the current
user can edit a specified page? I could use:
to detect an "Action 'edit' is not allowed for the current user"
warning, but this doesn't account for users with edit rights who
attempt to edit a protected page.
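Newer MediaWiki versions (1.19+, if I recall correctly) add intestactions=edit to prop=info, which answers this directly. For 1.17, one workaround is combining prop=info&inprop=protection with meta=userinfo&uiprop=rights and deciding locally. A rough sketch of that decision (it deliberately ignores cascading protection, blocks, and per-namespace restrictions, and the right names for protection levels should be checked against the wiki's configuration):

```java
// Sketch: approximate "can the current user edit this page?" from the
// user's rights list and the page's protection entries ({type, level}).
// Semi-protection maps to the "autoconfirmed" right; full protection to
// "editprotected". This is a heuristic, not the server's full check.
import java.util.List;
import java.util.Set;

public class CanEdit {
    public static boolean canEdit(Set<String> userRights, List<String[]> protections) {
        if (!userRights.contains("edit")) return false;
        for (String[] p : protections) {             // p = {type, level}
            if (!"edit".equals(p[0])) continue;      // only edit protection matters here
            String level = p[1];
            if ("autoconfirmed".equals(level) && !userRights.contains("autoconfirmed"))
                return false;
            if ("sysop".equals(level) && !userRights.contains("editprotected"))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> rights = Set.of("read", "edit", "autoconfirmed");
        List<String[]> semi = List.of(new String[]{"edit", "autoconfirmed"});
        List<String[]> full = List.of(new String[]{"edit", "sysop"});
        System.out.println(canEdit(rights, semi));  // true
        System.out.println(canEdit(rights, full));  // false
    }
}
```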
In r83410/r83411 I have disabled filtering by size
(faminsize/famaxsize) and by hash (fasha1/fasha1base36) for
list=filearchive because of performance problems. It is possible that
filtering by hash will return at a later point, but filtering by size
has gone for good.
Affected wikis are those that run on version 1.17. This includes
Wikimedia, to which this change will be deployed this week.
Bryan Tong Minh