I would appreciate any advice on the best way to use the API to do the following:
For a particular Wikipedia page:
- Determine if the page has changed in the last 24 hours.
- Extract from the page only the sections that have changed in the last 24 hours.
I know the index.php call can return a history (e.g. http://en.wikipedia.org/w/index.php?title=Ray_Bradbury&action=history ), which has links that let you get particular diffs. However, this is intended for human consumption, and I would need to screen-scrape it to extract the information I need.
Is there a way to use the API to get the same information as the index.php?*&action=history command, but as an XML or JSON result that is easier to digest than HTML?
Thanks,
Eamonn
On Sun, Jun 10, 2012 at 06:57:27PM -0700, Eamonn wrote:
Is there a way to use the API to get the same information as the index.php?*&action=history command, but as an XML or JSON result that is easier to digest than HTML?
You'll want to look into prop=revisions for that.[1]
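As a rough illustration of that suggestion, here is a minimal Python sketch that builds a prop=revisions query limited to the last 24 hours and reads the revisions out of a decoded JSON response. The endpoint URL, the helper names `build_revisions_url` and `revisions_from_response`, and the chosen `rvprop` fields are assumptions made for this example, not something given in the thread.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumed endpoint for English Wikipedia; other wikis use their own api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, now=None, hours=24):
    """Build a prop=revisions query for revisions made in the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": "max",
        # Revisions are listed newest first by default, so rvend is the older bound.
        "rvend": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def revisions_from_response(data):
    """Return the revisions found in a decoded JSON response (empty list if none)."""
    revisions = []
    for page in data.get("query", {}).get("pages", {}).values():
        # A page with no revisions in the requested window has no "revisions" key.
        revisions.extend(page.get("revisions", []))
    return revisions
```

If the list returned by `revisions_from_response` is non-empty, the page changed within the window. Fetching the URL (for example with `urllib.request.urlopen` and `json.load`) is left out so the sketch stays free of network access.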
But there's not really any way to get just the sections that changed; you can request a diff, or request the full wikitext for the appropriate revisions and process it yourself.
[1]: https://www.mediawiki.org/wiki/API:Properties#revisions_.2F_rv
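For the "process it yourself" route, one possible approach is to fetch the wikitext of the newest revision and of the last revision older than 24 hours, split both on section headings, and compare section by section. The sketch below covers only the comparison step. The function names and the heading regular expression are assumptions for illustration; it ignores headings inside templates or nowiki blocks, treats repeated heading names as one section, and does not report sections that were removed.

```python
import re

# Matches wikitext headings such as "== Life ==" or "=== Early years ===".
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Map each heading to its body text; text before the first heading is keyed ''."""
    sections = {}
    name, start = "", 0
    for match in HEADING_RE.finditer(wikitext):
        # Close the section that was open, then start the next one.
        sections[name] = wikitext[start:match.start()].strip()
        name, start = match.group(2), match.end()
    sections[name] = wikitext[start:].strip()
    return sections


def changed_sections(old_text, new_text):
    """Return the sorted names of sections in new_text that are new or differ from old_text."""
    old, new = split_sections(old_text), split_sections(new_text)
    return sorted(name for name in new if old.get(name) != new[name])
```

For example, calling `changed_sections(old_wikitext, new_wikitext)` on two revisions that differ only inside the "Life" section would return a list containing just that heading.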
On 11/06/12 03:57, Eamonn wrote:
I would appreciate any advice on the best way to use the API to do the following:
For a particular Wikipedia page:
- Determine if the page has changed in the last 24 hours.
- Extract from the page only the sections that have changed in the last 24 hours.
I know the index.php call can return a history (e.g. http://en.wikipedia.org/w/index.php?title=Ray_Bradbury&action=history ), which has links that let you get particular diffs. However, this is intended for human consumption, and I would need to screen-scrape it to extract the information I need.
Is there a way to use the API to get the same information as the index.php?*&action=history command, but as an XML or JSON result that is easier to digest than HTML?
Thanks,
Eamonn
Take a look at list=recentchanges, although http://www.mediawiki.org/w/index.php?title=Special:RecentChanges&feed=rs... may already contain everything you want.
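A hedged Python sketch of the list=recentchanges approach follows: it builds a query covering the last 24 hours and then filters the decoded JSON response down to one page. The helper names, the `api_url` argument, and the chosen `rcprop` fields are assumptions for this example. Because recentchanges covers the whole wiki, a busy site would likely need several continued requests to cover a full day, which this sketch does not handle.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_recentchanges_url(api_url, now=None, hours=24):
    """Build a list=recentchanges query for changes made in the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp|user|comment",
        "rclimit": "max",
        # Changes are listed newest first by default, so rcend is the older bound.
        "rcend": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def changes_for_title(data, title):
    """Keep only the recent changes in a decoded JSON response that touch `title`."""
    changes = data.get("query", {}).get("recentchanges", [])
    return [change for change in changes if change.get("title") == title]
```

Note that titles in API responses use spaces rather than underscores, so the comparison would be against "Ray Bradbury" rather than "Ray_Bradbury".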
mediawiki-api@lists.wikimedia.org