Hello,
I'm trying to extract information from a Wikipedia dump based on categorization or template usage. I first query the MediaWiki API with an embeddedin or categorymembers query to get a list of the articles I'm interested in. Then I retrieve those articles from the dump and extract the information I need. The problem is that the current titles returned by the API sometimes don't match what's in the dump, for example because an article has been moved since the dump was made.
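For reference, the API step looks roughly like the sketch below. The endpoint URL, the User-Agent string, and the names fetch_json and list_titles are only illustrative; the parameters (list, cmtitle/eititle, cmlimit/eilimit, and the "continue" block) are the standard ones for these two list queries.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; any MediaWiki installation exposes the same api.php interface.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "dump-title-lister/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_titles(list_name, prefix, title, fetch=fetch_json):
    """Collect all titles from a list query, following continuation.

    list_name is "categorymembers" or "embeddedin"; prefix is the matching
    parameter prefix ("cm" or "ei"). fetch is injectable so the paging logic
    can be exercised without network access.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": list_name,
        prefix + "title": title,
        prefix + "limit": "max",
    }
    titles = []
    while True:
        data = fetch(dict(params))
        titles.extend(page["title"] for page in data["query"][list_name])
        if "continue" not in data:
            return titles
        # The API returns the parameters needed to request the next batch.
        params.update(data["continue"])
```

A call such as list_titles("categorymembers", "cm", "Category:Physics") or list_titles("embeddedin", "ei", "Template:Infobox person") returns the titles as they are on the live wiki at query time, which is exactly where the mismatch with the dump comes from.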
I can think of two ways to solve the problem:
- Parse the categorization and template usage information from all articles in the dump and build the list of articles in a given category or using a given template myself. This might be error-prone because it requires custom parsing.
- Import the dump into a local MediaWiki installation and query the API locally. But from what I read in the documentation, importing the dump into a database can take an excessive amount of time.
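To make the first option concrete, here is a minimal sketch of what I mean by custom parsing. The names scan_dump, CATEGORY_RE, and TEMPLATE_RE are made up for illustration, and the regular expressions are deliberately naive: they only see categories and templates written directly in the wikitext, so categories assigned through a template, nested templates, and redirects to templates would all be missed. That is the kind of error I am worried about.

```python
import io
import re
import xml.etree.ElementTree as ET

# Naive patterns (assumptions, not a full wikitext grammar).
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|#<>\[\]]+?)\s*(?:\||\}\})")


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(stream):
    """Yield (title, categories, templates) for each <page> in a dump stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        categories = {m.strip() for m in CATEGORY_RE.findall(text)}
        templates = {m.strip() for m in TEMPLATE_RE.findall(text)}
        yield title, categories, templates
        elem.clear()  # drop the page's contents to keep memory use down
```

The stream can be any binary file object, for example bz2.open(path, "rb") on a compressed dump (path being wherever the dump file lives).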
Is there any easier option? Is there a dump of categorization and template usage kept somewhere? Or perhaps I missed something and this information can be retrieved from the dump without parsing it?
Thanks, Piotr