For research purposes, I would like to retrieve information such as article text and all revisions (revision content, timestamps, usernames) for English Wikipedia articles under certain categories (including their sub-categories), or possibly for a set of randomly selected articles, but not necessarily for the whole English Wikipedia.
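To be concrete, this is the kind of request I have been experimenting with against the MediaWiki web API (a minimal Java 11 sketch; the article title is just a placeholder, and I capped rvlimit at 50 because that appears to be the maximum allowed when revision content is requested):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RevisionFetcher {
        public static void main(String[] args) throws Exception {
            // Ask the MediaWiki API for up to 50 revisions of one article,
            // with timestamp, username, and full wikitext for each.
            // ("Apollo_11" is just a placeholder title.)
            String url = "https://en.wikipedia.org/w/api.php"
                    + "?action=query&prop=revisions"
                    + "&titles=Apollo_11"
                    + "&rvprop=timestamp%7Cuser%7Ccontent" // %7C = "|"
                    + "&rvlimit=50"
                    + "&format=json";
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // JSON containing the revisions
        }
    }

That works for one article at a time, but crawling whole category trees this way means a very large number of requests, which is why I am looking at the dumps.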
I tried the Export page (Special:Export), but it limits an export to 1000 revisions, and it generates an XML document as output.
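From what I have read on the Help:Export page, the 1000-revision cap can apparently be worked around by paging with an offset, roughly like this (untested, so I may have the parameter combination wrong):

    // Untested sketch, based on the parameters described on Help:Export:
    // request up to 1000 revisions per batch and advance "offset" to the
    // timestamp of the last revision received; offset=1 should mean
    // "start from the first revision". The docs suggest sending these
    // parameters as a POST.
    String article = "Apollo_11"; // placeholder title
    String offset  = "1";
    String url = "https://en.wikipedia.org/w/index.php?title=Special:Export"
            + "&pages=" + article
            + "&offset=" + offset
            + "&limit=1000"
            + "&action=submit";
    // fetch url, parse the XML, read the last <timestamp>, use it as the
    // next offset, and repeat until a batch has fewer than 1000 revisions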
I have been reading up on this online but still don't have a very clear picture. I know there are downloadable dumps in compressed XML format, and it appears the same content can also be downloaded in the form of a MySQL database.
I am familiar with Java and have some experience with MySQL, XML, PHP, and HTML.
My questions are:
What is the best way for me to get the information I need? Please be specific.

For example, if I download the data in XML format, do I use MediaWiki (PHP) to retrieve the information from those XML documents, or is there a good Java XML parser for Wikipedia dumps that can extract my desired results?
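For the XML route, this is roughly what I was imagining: a streaming parser, since the full-history dumps are far too large to load into memory at once (a sketch using StAX; the file name is a placeholder, and the element names are from the dump format as I understand it):

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class DumpReader {
        public static void main(String[] args) throws Exception {
            // Stream through a pages-meta-history dump one element at a time.
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader xml = factory.createXMLStreamReader(
                    new FileInputStream("enwiki-pages-meta-history.xml"));
            String currentTitle = null;
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                    switch (xml.getLocalName()) {
                        case "title":     // one per <page>
                            currentTitle = xml.getElementText();
                            break;
                        case "timestamp": // one per <revision>
                            System.out.println(currentTitle + " @ " + xml.getElementText());
                            break;
                        case "username":  // inside <contributor>
                            System.out.println("  by " + xml.getElementText());
                            break;
                        // <text> holds the revision wikitext; handle similarly
                    }
                }
            }
            xml.close();
        }
    }

Is something like this reasonable, or is there an existing parser I should use instead of rolling my own?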
If the whole content can be downloaded into a MySQL database on my local computer, can I write a Java program with SQL queries to get my desired results from the database, or is MediaWiki better for retrieving the results from the database?
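For the database route, this is the kind of thing I had in mind, assuming the dump is imported into MediaWiki's page and revision tables (a sketch via JDBC; the database name, credentials, and article title are placeholders, and I understand the column names have changed across MediaWiki versions, e.g. rev_user_text later moved into an actor table):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class RevisionQuery {
        public static void main(String[] args) throws Exception {
            // Needs the MySQL Connector/J driver on the classpath.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/wikidb", "user", "password")) {
                // List revision timestamps and editor names for one article,
                // using the older MediaWiki schema's column names.
                String sql = "SELECT r.rev_timestamp, r.rev_user_text "
                           + "FROM revision r JOIN page p ON r.rev_page = p.page_id "
                           + "WHERE p.page_namespace = 0 AND p.page_title = ? "
                           + "ORDER BY r.rev_timestamp";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setString(1, "Apollo_11"); // placeholder title
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString(1) + " " + rs.getString(2));
                        }
                    }
                }
            }
        }
    }

If plain SQL like this is workable, I would prefer it over going through MediaWiki's PHP code, but I don't know whether the dump import actually produces these tables in a queryable state.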