Hi everyone. I'm trying to code a Wikipedia bot. The bot will look at past revisions of a page to check whether they contained vandalism. I can get the unmodified wikitext of a page using:
http://en.wikipedia.org/w/index.php?action=raw&title=User:Richardcavell
But that only works for the most recent revision. Using prop=revisions I can get the revid of a past revision. Then if I try:
http://en.wikipedia.org/w/index.php?action=raw&revids=342742660
It gives me the main page, not the past revision of User:Richardcavell. Doing:
http://en.wikipedia.org/w/index.php?action=raw&revids=342742660&titl...
doesn't work. It still gives me the main page. If I do:
http://en.wikipedia.org/w/api.php?action=query&format=xml&prop=revis...
then I get the past revision. It is surrounded by XML tags, and some of the text is changed. For example, " is replaced by &quot;. Removing the XML tags isn't so difficult, but is prone to error. The second issue is more annoying.
1. Is it possible for me to get the raw wikitext of a past revision, without XML tags?
2. Is it possible for me to get the raw wikitext without the " -> &quot; style modifications?
Richard
It looks like you're confusing the parameters you use with index.php and the ones you use with api.php.
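To make the distinction concrete, here is a minimal sketch. As far as I know, revids is an api.php parameter, while index.php selects a specific revision with oldid. The helper name raw_revision_url is made up for illustration, and the revision id is the one from the question.

```python
from urllib.parse import urlencode

INDEX_PHP = "http://en.wikipedia.org/w/index.php"


def raw_revision_url(revid):
    """Build an index.php URL asking for the raw wikitext of one revision.

    Hypothetical helper: it only builds the URL, it does not fetch anything.
    """
    # 'oldid' is the index.php parameter for a specific revision;
    # 'revids' belongs to api.php and is not used here.
    return INDEX_PHP + "?" + urlencode({"action": "raw", "oldid": revid})


print(raw_revision_url(342742660))
# prints http://en.wikipedia.org/w/index.php?action=raw&oldid=342742660
```

Fetching that URL should return plain wikitext with no XML wrapper, which would cover both questions if you don't need anything else from the API.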
The code you're using to get and parse that XML content should take care of un-XML-encoding it. You shouldn't be doing it manually.
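A small sketch of what that means in practice, using Python's standard xml.etree.ElementTree. The SAMPLE string is a hand-written, abbreviated stand-in for an api.php XML response, not real output, and wikitext_from_xml is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an api.php XML response (assumed shape, abbreviated).
SAMPLE = '''<api><query><pages><page title="User:Richardcavell"><revisions>
<rev xml:space="preserve">He said &quot;hello&quot; &amp; left &lt;!-- comment --&gt;</rev>
</revisions></page></pages></query></api>'''


def wikitext_from_xml(xml_text):
    """Return the text of the first <rev> element, or None if there is none."""
    root = ET.fromstring(xml_text)
    rev = root.find(".//rev")
    # The parser has already turned &quot;, &amp;, &lt; and &gt; back into
    # the literal characters, so no manual unescaping is needed.
    return rev.text if rev is not None else None


print(wikitext_from_xml(SAMPLE))
# prints He said "hello" & left <!-- comment -->
```

The point is that the entities only exist in the serialized XML. Once a real XML parser has read the response, the element text is the original wikitext.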
mediawiki-api@lists.wikimedia.org