On Tue, Jul 9, 2013 at 10:46 AM, Daniel Mietchen
<daniel.mietchen(a)googlemail.com> wrote:
> Hello together,
> in the framework of a GLAM project, we are looking for ways to
> (1) identify the number of pages in a given category - including via
> subcategories - on a given wiki
You can get the list of subcategories of a category with
list=categorymembers&cmtype=subcat. You'd have to make a separate call
for each individual (sub)category you're interested in, and be sure to
detect cycles properly, since a category can end up (indirectly)
inside itself.
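
A minimal sketch of that traversal, not a definitive implementation. The
endpoint URL, the User-Agent string, and the helper names (api_get,
direct_subcategories, category_tree) are assumptions made for illustration.
It also assumes the API returns a "continue" object for paging, as current
MediaWiki versions do; older installations used "query-continue" instead.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

# Assumption: English Wikipedia; substitute the api.php URL of your wiki.
API_URL = "https://en.wikipedia.org/w/api.php"


def api_get(params):
    """Send one GET request to the MediaWiki API and return the decoded JSON."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        API_URL + "?" + query,
        # Hypothetical User-Agent; use one that identifies your own tool.
        headers={"User-Agent": "category-walk-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def direct_subcategories(category, fetch=api_get):
    """Yield the titles of the direct subcategories of `category`."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "subcat",
        "cmlimit": "max",
    }
    while True:
        data = fetch(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}


def category_tree(root, fetch=api_get):
    """Return `root` plus every category reachable below it.

    The `seen` set is the cycle guard: each category is queued at most once,
    so a loop in the category graph cannot cause endless requests.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for sub in direct_subcategories(current, fetch):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return seen
```

Calling category_tree("Category:Physics") would issue at least one request
per category in the tree, so for large trees it is worth throttling the
requests or switching to the database approach.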
You can get the number of pages directly in a category (not counting
its subcategories' members) with prop=categoryinfo. You can batch this
by specifying up to 50 titles per query (500 if your account has the
"apihighlimits" user right).
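
The batching could look roughly like the sketch below. The endpoint URL and
the helper names (api_get, chunks, category_page_counts) are assumptions for
illustration, and the code assumes the default JSON layout in which
query.pages is an object keyed by page ID.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; substitute the api.php URL of your wiki.
API_URL = "https://en.wikipedia.org/w/api.php"


def api_get(params):
    """Send one GET request to the MediaWiki API and return the decoded JSON."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        API_URL + "?" + query,
        # Hypothetical User-Agent; use one that identifies your own tool.
        headers={"User-Agent": "category-count-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def chunks(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def category_page_counts(categories, fetch=api_get, batch_size=50):
    """Map each category title to its number of direct member pages.

    batch_size defaults to 50; raise it to 500 only if the account has the
    "apihighlimits" right.
    """
    counts = {}
    for batch in chunks(list(categories), batch_size):
        data = fetch({
            "action": "query",
            "prop": "categoryinfo",
            "titles": "|".join(batch),
        })
        for page in data.get("query", {}).get("pages", {}).values():
            # Empty or nonexistent categories come back without "categoryinfo".
            info = page.get("categoryinfo", {})
            counts[page["title"]] = info.get("pages", 0)
    return counts
```

The keys of the result are the titles as the API returns them, which may be
normalized forms of what was sent. Summing the values over a whole category
tree counts a page once for every category it belongs to, so the sum is an
upper bound on the number of distinct pages rather than an exact figure.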
If you're going to be doing a lot of this, it might be better to
perform queries directly against the database, either by downloading
the database dumps or using Tool Labs.
> (2) get the pageview stats for all these pages,
> including on aggregate
The raw pageview stat data may also be available on Tool Labs. I see
some data in /shared/viewstats/, but it doesn't seem to be up to date.
--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation