2012/3/3 Ashish Mukherjee <ashish.mukherjee(a)gmail.com>:
> Do the dumps give very granular-level data for a Wiki entry?
The XML dumps give the complete text of every page, in the same
wikitext format that you see when you edit it. They also include
metadata, such as title, authors, timestamp, namespace etc.
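
To give a rough idea of what that looks like in practice, here is a
minimal sketch in Python that uses only the standard library (so it is
not the MediaWiki::DumpFile module). The sample XML is hand-written to
imitate the dump layout, and the names SAMPLE, local and iter_pages are
made up for this illustration. Real dumps carry more fields and may use
a different schema version, so treat it as a starting point only.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-written sample imitating the dump layout (not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <revision>
      <timestamp>2012-03-03T10:00:00Z</timestamp>
      <contributor><username>SomeUser</username></contributor>
      <text>'''Example''' is a [[sample]] page.</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield one dict per <page>, streaming so big files stay out of memory."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        record = {"title": None, "ns": None, "timestamp": None,
                  "username": None, "text": None}
        # If a page holds several revisions, the last one seen wins here.
        for node in elem.iter():
            name = local(node.tag)
            if name in record:
                record[name] = node.text
        yield record
        elem.clear()  # release the finished page element


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page in pages:
    print(page["title"], page["ns"], page["timestamp"], page["username"])
```

For a real dump you would pass an open file (or a bz2.open() handle)
instead of the in-memory sample.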
The MediaWiki::DumpFile module also provides some functions that allow
you to analyze page info even if it doesn't necessarily come from a
dump, but these functions are relatively basic. Just see the module's
docs and check whether it has the particular thing that you need.
I used this module quite a lot; you can find the biggest thing I did
with it here:
I haven't maintained it in a long while, but it should still be
functional and you are welcome to recycle the functions and the
regular expressions there.
If there's any particular kind of data that you need, let me know -
maybe I already have code that can extract it.