I'm trying to extract information from a Wikipedia dump based on
categorization or template usage. I first query the MediaWiki API with an
embeddedin or categorymembers query to get a list of the articles I'm
interested in. Then I retrieve those articles from the dump and extract the
information I need. The problem is that the current titles returned by the
API sometimes don't match what's in the dump, for example because the
article has been moved since the dump was made.
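For reference, the API step described above can be sketched as follows. This is a minimal sketch, not a definitive client: it assumes the English Wikipedia endpoint and the default JSON continuation format (the "continue" object whose keys are passed back unchanged), and the helper names build_params, http_fetch and list_pages are hypothetical. It keeps the pageid that the API returns alongside each title, and the fetch function can be swapped out for testing.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; change for other wikis.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(list_type, target, cont=None):
    """Build query parameters for list=categorymembers or list=embeddedin."""
    params = {"action": "query", "format": "json", "list": list_type}
    if list_type == "categorymembers":
        params.update({"cmtitle": target, "cmlimit": "max"})
    elif list_type == "embeddedin":
        params.update({"eititle": target, "eilimit": "max"})
    else:
        raise ValueError(f"unsupported list type: {list_type}")
    if cont:
        # Pass every key of the previous "continue" object back unchanged.
        params.update(cont)
    return params


def http_fetch(params):
    """Perform one GET request against the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent string; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "dump-matcher-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_pages(list_type, target, fetch=http_fetch):
    """Yield (pageid, title) for every member, following API continuation."""
    cont = None
    while True:
        data = fetch(build_params(list_type, target, cont))
        for page in data["query"][list_type]:
            yield page["pageid"], page["title"]
        if "continue" not in data:
            break
        cont = data["continue"]
```

For example, list(list_pages("embeddedin", "Template:Infobox person")) would collect every page transcluding that template, one request per batch.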
I can think of two options to solve the problem:
- Parse the categorization and template usage information from all
articles in the dump and build the lists of articles in a given category or
using a given template myself. This might be error-prone because it
requires custom parsing.
- Import the dump into a local MediaWiki installation and query the
API locally. But from what I've read in the documentation, importing the
dump into a database can take a very long time.
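To make the first option concrete, a naive version of the custom parsing might look like the sketch below. It streams the XML dump with xml.etree.ElementTree.iterparse and pulls explicit [[Category:...]] links and {{...}} transclusions out of each page's wikitext with regular expressions. The helper names and patterns are hypothetical, and the sketch illustrates exactly why this approach is error-prone: it misses categories added by templates, does not resolve template redirects, and counts magic words such as {{DEFAULTSORT:...}} as templates.

```python
import re
import xml.etree.ElementTree as ET

# Naive patterns: they only see what is written literally in the page's own wikitext.
# "[[:Category:X]]" (a link to a category) is not matched because of the leading colon.
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)
# Skips "{{{param}}}" and "{{#if:...}}", but still matches magic words like DEFAULTSORT.
TEMPLATE_RE = re.compile(r"(?<!\{)\{\{\s*(?:Template\s*:\s*)?([^{}|#<\n]+?)\s*(?:\||\}\})")


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def normalize(name):
    """Roughly mimic title normalization: underscores to spaces, first letter uppercase."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


def scan_dump(stream):
    """Yield (page_id, title, categories, templates) for each <page> in a dump stream."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = page_id = None
        text = ""
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":  # the page's own id, not the revision id
                page_id = int(child.text)
            elif name == "revision":
                for sub in child:
                    if local(sub.tag) == "text":
                        text = sub.text or ""
        categories = {normalize(m) for m in CATEGORY_RE.findall(text)}
        templates = {normalize(m) for m in TEMPLATE_RE.findall(text)}
        yield page_id, title, categories, templates
        elem.clear()  # release the finished page's children to limit memory use
```

For a real dump, the stream could come from bz2.open(path, "rb") on a pages-articles file, so nothing has to be decompressed to disk first.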
Is there an easier option? Is a dump of categorization and template
usage data available somewhere? Or have I missed something, and this
information can be retrieved from the dump without parsing it?