On Wed, Jan 10, 2018 at 2:19 PM, Mike MacHenry <mike.machenry(a)gmail.com>
wrote:
On the other hand, if I want to get a list of all National Hockey League
(NHL) players, this is a lot more difficult. The category "Category:Lists
of National Hockey League players" exists, but it's a category of lists of
players. Much of the categorization of Wikipedia turns out to be in lists,
not categories. I could write a web scraper for this but that would
probably be very unreliable.
There is a Category:National Hockey League players. You'll have to handle
subcategories on your own but that's still a lot less messy than parsing
HTML.
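
A rough sketch of what handling the subcategories yourself could look like,
using list=categorymembers from the MediaWiki Action API. The helper names
(fetch_json, category_members, walk_category), the User-Agent string and the
max_depth cutoff are made up for illustration; treat this as a starting
point, not a tested crawler.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Perform one GET request against the MediaWiki Action API."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Illustrative User-Agent; replace it with one that identifies your tool.
    req = urllib.request.Request(
        url, headers={"User-Agent": "category-walk-example/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_members(category, fetch=fetch_json):
    """Yield every member of one category, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = fetch(params)
        yield from data["query"]["categorymembers"]
        if "continue" not in data:
            break
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}


def walk_category(category, fetch=fetch_json, max_depth=3):
    """Collect article titles from a category and its subcategories."""
    seen = set()   # categories already visited (guards against cycles)
    pages = set()  # article titles found so far
    stack = [(category, 0)]
    while stack:
        cat, depth = stack.pop()
        if cat in seen:
            continue
        seen.add(cat)
        for member in category_members(cat, fetch):
            if member["ns"] == 14:  # namespace 14 = Category
                if depth < max_depth:
                    stack.append((member["title"], depth + 1))
            elif member["ns"] == 0:  # namespace 0 = articles
                pages.add(member["title"])
    return pages
```

Calling walk_category("Category:National Hockey League players") would then
return a set of article titles. The depth limit is there because category
trees can wander into unrelated topics or loop back on themselves.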
Is there a standardized way to deal with lists and sublists that I might
have missed? I don't mind writing a bunch of code to recursively crawl
sublists and expand them. But I would like to avoid something as
non-standard as web scraping the content because it will be very fragile.
There is not. You can check if Wikidata has something appropriate (e.g. all
humans with the P3522 (NHL.com player ID) property), but otherwise you are
on your own. Also, there is no guarantee Wikipedia and Wikidata have the
same data (every Wikipedia article has an item in Wikidata but often the
properties are not fleshed out yet).
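
For the Wikidata route, something along these lines might work against the
public SPARQL endpoint at query.wikidata.org. The query asks for items that
are an instance of (P31) human (Q5) and have a P3522 value. The function
names and the User-Agent string are placeholders I made up, and the result
is only as complete as the P3522 data happens to be.

```python
import json
import urllib.parse
import urllib.request

SPARQL_URL = "https://query.wikidata.org/sparql"

# All humans (Q5) that carry an NHL.com player ID (P3522).
QUERY = """
SELECT ?player ?playerLabel ?nhlId WHERE {
  ?player wdt:P31 wd:Q5 ;
          wdt:P3522 ?nhlId .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def run_query(query=QUERY):
    """Send the SPARQL query and return the decoded JSON response."""
    url = SPARQL_URL + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # Illustrative User-Agent; replace it with one that identifies your tool.
    req = urllib.request.Request(
        url, headers={"User-Agent": "nhl-wikidata-example/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_bindings(data):
    """Turn SPARQL JSON results into (nhl_id, label) tuples."""
    return [
        (row["nhlId"]["value"], row.get("playerLabel", {}).get("value", ""))
        for row in data["results"]["bindings"]
    ]
```

parse_bindings(run_query()) would give you a list of (NHL.com ID, name)
pairs. Comparing that list against the category crawl is one way to see how
far the two data sets diverge.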