Hi Gabriel,
The REST API looks promising - thank you!
Having played around with it a bit, I seem to only be able to get one
revision per request. Is that correct, or am I doing something wrong? My
project requires every revision and its references from a large number of
articles, so that would make a lot of requests. The regular API allows for
multiple revisions per request (only with action=query, though).
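
For comparison, here is a minimal sketch of how batching with the regular API (action=query with prop=revisions) might look. The helper names, the use of formatversion=2, and the batch size of 50 are assumptions made for illustration; `fetch` is a hypothetical callable that sends the parameters to api.php and returns the decoded JSON.

```python
def revision_query_params(title, limit=50):
    """Parameters for one batch of revisions from the regular (action) API."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvlimit": str(limit),      # assumed batch size
        "format": "json",
        "formatversion": "2",       # assumption: pages come back as a list
    }


def iter_revisions(title, fetch, limit=50):
    """Yield revision dicts for `title`, following API continuation.

    `fetch` is a hypothetical callable: it takes the parameter dict,
    performs the HTTP request, and returns the decoded JSON response.
    """
    params = revision_query_params(title, limit)
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", []):
            yield from page.get("revisions", [])
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

Passing `fetch` in keeps the paging logic separate from the HTTP client, so it can be tried against canned responses first.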
Thanks!
Bertel
2016-12-21 17:01 GMT+01:00 Gabriel Wicke <gwicke(a)wikimedia.org>:
Bertel, another option is to use the REST API:
- HTML for a specific revision:
https://en.wikipedia.org/api/rest_v1/#!/Page_content/getFormatRevision
- Within this HTML, references are marked up like this:
https://www.mediawiki.org/wiki/Specs/HTML/1.3.0/Extensions/Cite
Any HTML or XML DOM parser can be used to extract this information.
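
A rough sketch of the two steps above, using only the Python standard library. The URL pattern (/page/html/{title}/{revision}), the User-Agent string, and the markup details (citation markers carrying typeof="mw:Extension/ref", reference bodies inside an element with class "mw-reference-text") are assumptions based on the linked docs and should be checked against the actual output.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_BASE = "https://en.wikipedia.org/api/rest_v1"  # assumed base URL
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def revision_html_url(title, revision):
    """Build the (assumed) REST URL for the HTML of one revision."""
    name = quote(title.replace(" ", "_"), safe="")
    return f"{REST_BASE}/page/html/{name}/{revision}"


def fetch_revision_html(title, revision):
    """Download the HTML of one revision (one request per revision)."""
    req = Request(revision_html_url(title, revision),
                  headers={"User-Agent": "ref-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


class RefCollector(HTMLParser):
    """Collect citation marker ids and the text of reference bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.ref_ids = []    # ids of inline citation markers
        self.ref_texts = []  # plain text of each reference body
        self._depth = 0      # nesting depth inside a reference body
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self._depth:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        if "mw:Extension/ref" in (a.get("typeof") or "").split():
            self.ref_ids.append(a.get("id"))
        if "mw-reference-text" in (a.get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:
                self.ref_texts.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
```

Usage would be along the lines of `c = RefCollector(); c.feed(fetch_revision_html(title, revid))`, then reading `c.ref_ids` and `c.ref_texts`.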
Hope this helps,
Gabriel
On Wed, Dec 21, 2016 at 3:20 AM, Bertel Teilfeldt Hansen <geilfeldt(a)gmail.com> wrote:
Hi Brad and Gergo,
Thanks for your responses!
@Brad: Yeah, that was also my impression, but I wasn't sure. Seemed
strange that the example in the official docs would point to a place where
the feature was disabled. Thank you for clearing that up!
@Gergo: I've been looking at action=parse, but as far as I understand it,
it is limited to one revision per API request, which makes it quite slow to
get a bunch of older revisions from a large number of articles.
action=query&prop=revisions&rvprop=content omits the references from the
output (just gives the string "{{reflist}}" after "References").
"mwrefs" sounds very promising, though! I will definitely check that out - thank you!
Best,
Bertel
2016-12-20 19:51 GMT+01:00 Gergo Tisza <gtisza(a)wikimedia.org>:
On Tue, Dec 20, 2016 at 10:18 AM, Bertel Teilfeldt Hansen <geilfeldt(a)gmail.com> wrote:
And is there no way of getting references through the API?
There is no nice way, but you can always get the HTML (or the parse
tree, depending on whether you want parsed or raw refs) and process it;
references are not hard to extract. For the wikitext version, there is a
Python tool:
https://github.com/mediawiki-utilities/python-mwrefs
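
To give a feel for the wikitext route, here is a simplified regex sketch. It is not the mwrefs API (the function name and pattern below are made up for illustration), and it ignores nested tags, comments, <nowiki> sections, and template-based citations.

```python
import re

# Matches <ref ...>body</ref> and self-closing <ref ... />.
# The \b keeps <references /> from being picked up.
REF_RE = re.compile(
    r"<ref\b(?P<attrs>[^>]*?)(?:/>|>(?P<body>.*?)</ref\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def extract_refs(wikitext):
    """Return (attributes, body) pairs; body is None for self-closing refs."""
    return [(m.group("attrs").strip(), m.group("body"))
            for m in REF_RE.finditer(wikitext)]
```

Self-closing refs reuse a named reference defined elsewhere in the page, so they would still have to be resolved by name.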
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation