Hello,
I am writing a Java program to extract the abstract of a Wikipedia page
given the page's title. I have done some research and found
out that the abstract will be in rvsection=0.
So, for example, if I want the abstract of the "Eiffel Tower" wiki page, I
query the API in the following way:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel…
and parse the XML data that is returned, taking the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the Wikipedia page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way in which I can remove the infobox data
and get only the wikitext related to the page's abstract, or if there is any
alternative method by which I can get the abstract of the page directly.
Looking forward to your help.
Thanks in advance,
Aditya Uppu
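One possible approach to the question above, as a rough sketch rather than a definitive answer: an infobox is a template, and templates in wikitext are wrapped in {{ and }}, so the leading section can be cleaned by dropping every balanced {{...}} span. The function name strip_templates and the sample text are made up for illustration. The sketch is in Python for brevity; the same loop ports directly to Java.

```python
def strip_templates(wikitext: str) -> str:
    """Remove every balanced {{...}} span, including nested ones.

    Assumption: the input is the raw wikitext of section 0. Triple-brace
    parameters ({{{x}}}) and braces inside <nowiki> are not handled.
    """
    kept = []
    depth = 0  # how many {{ are currently open
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:  # keep text only when outside all templates
                kept.append(wikitext[i])
            i += 1
    return "".join(kept).strip()


# Hypothetical sample input, not copied from the real article.
sample = (
    "{{Infobox building\n"
    "| name = Eiffel Tower\n"
    "| height = {{convert|330|m}}\n"
    "}}\n"
    "The '''Eiffel Tower''' is a lattice tower in Paris."
)
print(strip_templates(sample))
```

Note that this also removes templates used inside the prose itself (unit conversions, citations), so some text can go missing. If plain intro text is all that is needed, the "extracts" property with "exintro" may be a simpler route, provided it is available on the target wiki.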
Hi Mediawiki-api mailing listers!
I'm trying to get the intro to a list of Wikipedia pages using the
"extracts" property with "exintro=True". This works fine for most pages,
but for a few of them the API returns an empty extract field. See, for
example:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Anthem…
When looking at the page "https://en.wikipedia.org/wiki/Anthem" there
definitely seems to be text before the first section, so I think I should
be getting something. Indeed without the "exintro" parameter, I get the
expected return.
Any idea why this occurs?
Best,
Bertel
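To reproduce the comparison described above, the two requests can be built side by side. This is only a sketch: the helper name extracts_url is invented, format=json is an assumption about the desired output format, and the title "Anthem" is taken from the page URL in the message. Boolean API parameters such as exintro count as set whenever they are present, so the value "1" here behaves like "True".

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def extracts_url(title: str, intro_only: bool) -> str:
    """Build a prop=extracts query URL, optionally limited to the intro."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",  # assumption: JSON output is wanted
    }
    if intro_only:
        params["exintro"] = "1"  # presence alone turns the flag on
    return API_ENDPOINT + "?" + urlencode(params)


with_intro = extracts_url("Anthem", intro_only=True)
without_intro = extracts_url("Anthem", intro_only=False)
print(with_intro)
print(without_intro)
```

Fetching both URLs and comparing the "extract" fields should show whether the empty result is tied to exintro alone.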
According to RFC 7231 § 3.1.1.5,[1] a POST request that does not include a
Content-Type header may be interpreted by the server in one of two ways:
1. It may assume application/octet-stream. In this case, PHP and the
Action API will not see the request as having any parameters, and so
will probably serve the auto-generated help page.[2]
2. It may "sniff" the content type. It's likely enough to correctly
guess application/x-www-form-urlencoded in this case, and therefore PHP and
the Action API will see the request as having the intended parameters.
It turns out that HHVM and PHP 7 (at least as used at Wikimedia) differ in
their behaviors: PHP 7 seems to choose option 1, while HHVM chooses option
2.
Thus, clients that have been generating POST requests to Wikimedia wikis'
Action APIs without a Content-Type header will have been receiving expected
results from HHVM but will now start receiving unexpected results as
Wikimedia's migration to PHP 7 proceeds.[3] Affected clients should be
updated to include the Content-Type header in their requests.
See https://phabricator.wikimedia.org/T230526 for some details on this
issue.
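As an illustration of the recommended fix, here is a minimal sketch of a POST request that states its Content-Type explicitly, using only the Python standard library. The helper name build_api_post and the query parameters (meta=siteinfo, format=json) are illustrative choices; the endpoint is the one referenced in note [2].

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_api_post(endpoint: str, params: dict) -> Request:
    """Build a form-encoded POST with an explicit Content-Type header."""
    body = urlencode(params).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        # Declared explicitly so the server never has to guess the type.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_api_post(
    "https://www.mediawiki.org/w/api.php",
    {"action": "query", "meta": "siteinfo", "format": "json"},  # example parameters
)
print(request.get_method(), request.header_items())
```

Passing the object to urllib.request.urlopen would send it. Some HTTP libraries add this header automatically for form bodies; setting it by hand removes the dependence on that default.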
[1]: https://tools.ietf.org/html/rfc7231#section-3.1.1.5
[2]: As seen for example at https://www.mediawiki.org/w/api.php.
[3]: See https://phabricator.wikimedia.org/T176370 for progress on the
migration.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api-announce mailing list
Mediawiki-api-announce(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce