Hello everyone,

I am trying to use the MediaWiki API to create a dictionary based on categories or lists on Wikipedia. I would like to be able to select a category, or perhaps a list page, and get all members of that list.

I've done some reading of the API documentation and implemented a prototype. It works, but only when the data is structured exactly right for my purposes. For example, I can easily get a list of all of the English-language films. I'm using action=query with list=categorymembers for this. I get 500 films at a time, and I can continue the query as needed to get all 60k or so. This works because each English-language film's individual page is tagged with that category.
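
A minimal sketch of the paging loop described above, assuming Python and the English Wikipedia endpoint. The names fetch_json and iter_category_members and the User-Agent string are illustrative placeholders, not part of the API; the query parameters (action, list, cmtitle, cmlimit, cmcontinue) are the API's own.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (English Wikipedia)


def fetch_json(params):
    """Send one GET request to the API and return the decoded JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with something that identifies your tool.
    req = urllib.request.Request(url, headers={"User-Agent": "category-dict-prototype/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_category_members(category, fetch=fetch_json):
    """Yield the title of every member of a category, following continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",  # 500 per request, as in the prototype described above
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break  # no more batches
        # Merge the continuation values (e.g. cmcontinue) into the next request.
        params = {**params, **data["continue"]}
```

The fetch argument is only there so the loop can be exercised without network access; by default it performs real HTTP requests.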

On the other hand, if I want to get a list of all National Hockey League (NHL) players, this is a lot more difficult. The category "Category:Lists of National Hockey League players" exists, but it's a category of lists of players. Much of the categorization of Wikipedia turns out to be in lists, not categories. I could write a web scraper for this, but that would probably be very unreliable.

Is there a standardized way to deal with lists and sublists that I might have missed? I don't mind writing a bunch of code to recursively crawl sublists and expand them. But I would like to avoid something as nonstandard as scraping the page content, because it will be very fragile.
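
To make the recursive crawl concrete, here is a rough sketch of the expansion step, independent of how the members of each page are actually obtained. Everything in it is an assumption for illustration: list_members is a hypothetical callable that returns the titles found under a page, and looks_like_container is a guess based on title prefixes, not something the API provides.

```python
from collections import deque


def looks_like_container(title):
    # Assumed heuristic based on title prefixes; not an API feature.
    return title.startswith(("Category:", "List of ", "Lists of "))


def expand(root, list_members, is_container=looks_like_container):
    """Breadth-first expansion of a category or list into its leaf titles.

    list_members(title) is a hypothetical callable returning the titles
    contained in that page. Each title is visited at most once, so cycles
    between lists do not cause infinite recursion.
    """
    seen = {root}
    queue = deque([root])
    leaves = []
    while queue:
        current = queue.popleft()
        for title in list_members(current):
            if title in seen:
                continue
            seen.add(title)
            if is_container(title):
                queue.append(title)  # expand this sublist later
            else:
                leaves.append(title)  # treat as an individual member
    return leaves
```

The open question is what a reliable list_members would look like for list pages, as opposed to categories.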

Thank you for the help,
-mike