I am writing a Java program to extract the abstract of a Wikipedia page
given the title of the page. I have done some research and found
out that the abstract will be in rvsection=0.
So, for example, if I want the abstract of the 'Eiffel Tower' wiki page, I
query the API in the following way,
then parse the XML data we get back and take the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the Wikipedia page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way I can remove the infobox data
and get only the wikitext related to the page's abstract, or if there is an
alternative method by which I can get the abstract of the page directly.
Looking forward to your help.
Thanks in advance.
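One way to drop the infobox is to strip leading templates from the rvsection=0 wikitext by counting nested {{ }} pairs. Real pages vary, so treat this as a rough sketch rather than a full parser; the class and method names are mine. (As an alternative, on wikis with the TextExtracts extension, action=query&prop=extracts&exintro returns the intro as plain text directly, which may spare you the parsing entirely.)

```java
public class InfoboxStripper {
    // Repeatedly remove any template ({{...}}) that starts the wikitext,
    // which on article pages is typically the infobox or a hatnote.
    public static String stripLeadingTemplates(String wikitext) {
        String text = wikitext.trim();
        while (text.startsWith("{{")) {
            int depth = 0;
            int i = 0;
            int end = -1;
            while (i < text.length() - 1) {
                if (text.charAt(i) == '{' && text.charAt(i + 1) == '{') {
                    depth++;
                    i += 2;
                } else if (text.charAt(i) == '}' && text.charAt(i + 1) == '}') {
                    depth--;
                    i += 2;
                    if (depth == 0) { end = i; break; }
                } else {
                    i++;
                }
            }
            if (end < 0) break; // unbalanced braces; give up rather than guess
            text = text.substring(end).trim();
        }
        return text;
    }
}
```

Note this only handles templates at the very start of the section; inline templates and references in the abstract itself are left alone.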
I'm trying to fetch the list of all images from an article. I'm using this
This query works; however, as you can see in the result, the first image item is
an .ogg file. Is that a bug in MediaWiki?
What's the preferred query to fetch the list of images from an article?
Also, the list of images contains logos of wiki sites like Wikimedia Commons. That
may be expected, but is there a way to remove them?
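The .ogg file is not necessarily a bug: prop=images lists every embedded page in the File: namespace, which also holds sounds and video. I am not aware of a query flag that restricts the media type, so one approach is to filter the returned titles client-side by extension; the allow-list below is an assumption to adjust as needed. (The logos are a separate issue: they are usually embedded by templates, so extension filtering alone will not remove them.)

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ImageFilter {
    // Assumed allow-list of raster/vector image extensions; extend as needed.
    private static final Set<String> IMAGE_EXTENSIONS =
            Set.of("jpg", "jpeg", "png", "gif", "svg", "tiff", "webp");

    // Keep only titles whose file extension is in the allow-list,
    // dropping sounds (.ogg), video, etc.
    public static List<String> keepOnlyImages(List<String> fileTitles) {
        return fileTitles.stream()
                .filter(title -> {
                    int dot = title.lastIndexOf('.');
                    if (dot < 0) return false;
                    String ext = title.substring(dot + 1).toLowerCase();
                    return IMAGE_EXTENSIONS.contains(ext);
                })
                .collect(Collectors.toList());
    }
}
```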
As of r86257, which will be deployed to Wikimedia wikis soon and
will be included in the 1.17 release, sortkeys output by
list=categorymembers and prop=categories are now encoded as
hexadecimal strings, so "FOO" becomes "464f4f".
As previously announced, sortkeys are no longer guaranteed to be
human-readable, and may in fact contain binary data (this will happen
when Wikimedia switches to the UCA/ICU collation). However, outputting
binary data, notably in XML, was problematic, so I decided to use
hexadecimal encoding. This means the sortkey as returned by the API is
now guaranteed to not be human-readable, even if the underlying
collation uses a human-readable format (such as the uppercase
collation currently in use on Wikimedia wikis). However, it will still
sort correctly: if A sorts before B in the binary format, that will
also be the case in the hexadecimal format.
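The fixed-width encoding is what preserves the ordering: each byte becomes exactly two lowercase hex digits, and hex digits sort in the same relative order as the bytes they encode. A small sketch of the encoding as described above (class name is mine):

```java
public class SortkeyHex {
    // Encode raw sortkey bytes as lowercase hexadecimal, two digits per
    // byte, matching the format described in the announcement
    // (e.g. "FOO" -> "464f4f").
    public static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Because every byte maps to a fixed-width chunk, comparing two hex strings lexicographically gives the same result as comparing the underlying byte arrays.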
The following things changed:
* The 'sortkey' property in list=categorymembers and prop=categories
is now a hexadecimal string
* In prop=categories, clprop=sortkey will now also output the
'sortkeyprefix' property (the human-readable part of the sortkey);
list=categorymembers already provided this through
* The format of cmcontinue has changed from type|pageid|rawsortkey to
type|hexsortkey|pageid. If you did not make any assumptions about the
format of cmcontinue and just passed back whatever you got in
query-continue, this won't affect you.
Roan Kattouw (Catrope)
Mediawiki-api-announce mailing list
The documentation for API:Login <http://www.mediawiki.org/wiki/API:Login> is
not elaborate enough, and I find it confusing. I don't know whether I am the
only one. This is the statement in particular that I am finding difficult to
understand:
CentralAuth SUL Login
Now you have to parse the cookie by looking for the centralauth_ cookies and
adding additional entries for all other wikis that centralauth covers in
The cookies I get when I log in (after I get a success message) are like
this. First, I do not understand why the domain for the centralauth cookies
does not have the language code (in this case 'en'). I tried manually changing
it to en.wikipedia.org, but that did not help. Second, I am not sure whether
I should create new cookies myself, name them enwiki_User, enwiki_Token
and enwiki_Session, and then copy the domain and values over.
"Domain:en.wikipedia.org, Name:enwikiUserID, Value:1668905"
"Domain:en.wikipedia.org, Name:enwikiUserName, Value:Sreejithk2000"
"Domain:.wikipedia.org, Name:centralauth_User, Value:Sreejithk2000"
Can someone elaborate on the next steps on the API portal or reply back to
this thread?
Quite surprisingly, when I pointed my API client to Commons, all the domains
gave the complete domain (commons.wikipedia.org), and I can get an edit
token with those cookies.
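A hedged note on the cookie domains: a cookie whose domain starts with a dot, like ".wikipedia.org", is sent to every subdomain, including en.wikipedia.org, so there should be no need to rewrite the centralauth cookies or to clone them into enwiki_* ones; just send along every cookie whose domain matches the host. A simplified sketch of the domain-matching rule (after RFC 6265; class name is mine):

```java
public class CookieDomainMatch {
    // Simplified RFC 6265-style domain matching: a cookie set for
    // ".wikipedia.org" applies to "en.wikipedia.org" and every other
    // subdomain, while a host-only cookie applies to that host alone.
    public static boolean domainMatches(String cookieDomain, String host) {
        String d = cookieDomain.startsWith(".")
                ? cookieDomain.substring(1)
                : cookieDomain;
        return host.equals(d) || host.endsWith("." + d);
    }
}
```

In Java, using java.net.CookieManager with an HTTP client handles this matching automatically, which is usually easier than copying cookie values by hand.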
When using cmstartsortkey for a category, I get the impression the
result is incorrect.
The way I understand it (and the way pywikipediabot uses it), this
should give the disambiguation pages starting at Da. In reality,
however, it gives them starting at D + <some Unicode character beyond
z>. Am I misunderstanding the working of this part of the API, or is this a
bug?
André Engels, andreengels(a)gmail.com
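One possible explanation, assuming the wiki uses the uppercase collation currently in use on Wikimedia wikis (per the sortkey announcement above): stored sortkeys then contain no lowercase letters, while cmstartsortkey is compared as raw bytes. "Da" sorts after every "DA"…"DZ" sortkey (lowercase 'a' is 0x61, above 'Z' at 0x5A), so the listing resumes at the first entry whose second byte is higher still, typically a non-ASCII character, which matches the symptom. If that is what is happening, uppercasing the start key before sending it is a plausible workaround (method name is mine):

```java
import java.util.Locale;

public class StartSortkey {
    // Under the uppercase collation, stored sortkeys contain no lowercase
    // letters, so uppercase the client-supplied start key so the raw byte
    // comparison lands where the caller intended.
    public static String normalizeStartKey(String key) {
        return key.toUpperCase(Locale.ROOT);
    }
}
```

For example, "DA" compares below "Da" byte-wise, while an accented "D\u00e9" compares above it, so sending "Da" raw would skip all the plain "DA…" entries.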