On Wed, Feb 8, 2023 at 8:52 PM Marty Blayney <martyblayney.machi(a)gmail.com>
wrote:
Kia ora Fiona,
[...]
That doesn't address a lot of the issues above around the tool itself
being inaccurate, but since using a single category we don't seem to get
wild swings anymore. I'm still quite suspicious of the data - there have
been a few times when we were expecting different numbers (e.g. when a
photo was featured on the main page, but no massive spike was recorded in
GLAMorgan).

This type of historical data isn't really possible on GLAMorgan, and the
tool really leads people astray by making it sound like the data is more
reliable than it is. By my understanding, the tool does two things, both
live in your browser on the client side. First, it queries the PetScan API
to find all the pages across Wikimedia sites that use files from the given
category. Then, it sends a series of queries to the Wikimedia pageview API,
one per page, for the time range given. What this means is you are actually just
calculating the historical page view data for the pages *currently* using
images from the category, not the actual page views that the images saw
during the year and month you are querying. If the photo was featured on
the main page but no longer is, the tool will not know that and will not
show any of those page views. Conversely, if you are checking a category
with an image currently on the main page, even if only for a few hours, the
tool will credit that image with all the main page's views for every month
in history.
To check this, try inputting the category that an image on the main page is
in right now, but check its page views back in 2016. This basic flaw in the
logic would be avoided by checking the actual media requests rather than
the pageview API. It would be even better if there were a way to query a
category's membership as it stood at a given timestamp in history, but
that doesn't seem possible.
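
To make the difference concrete, here is a minimal Python sketch with
made-up numbers. The page names, view counts and usage history are all
hypothetical, and it does not call PetScan or the pageview API; it only
models the logic as I understand it. It compares the total you get by
crediting the pages that use the image *now* against the total for the
pages that actually used it in the month being queried.

```python
# Hypothetical monthly page views for two pages (invented numbers).
PAGE_VIEWS = {
    "Main_Page": {"2016-01": 500_000, "2023-02": 400_000},
    "Kiwi": {"2016-01": 1_000, "2023-02": 1_200},
}

# Hypothetical record of the months in which each page displayed an image
# from the category. The real tool has no such history; it only sees "now".
USAGE_HISTORY = {
    "Main_Page": {"2023-02"},           # featured on the main page only now
    "Kiwi": {"2016-01", "2023-02"},     # used in the article all along
}

CURRENT_MONTH = "2023-02"


def views_current_usage(month):
    """Sum views for `month` over pages using the image today.

    This mirrors the approach described above: find current usage first,
    then fetch historical page views for those pages.
    """
    pages = [p for p, months in USAGE_HISTORY.items() if CURRENT_MONTH in months]
    return sum(PAGE_VIEWS[p].get(month, 0) for p in pages)


def views_actual_usage(month):
    """Sum views for `month` over pages that used the image in that month."""
    pages = [p for p, months in USAGE_HISTORY.items() if month in months]
    return sum(PAGE_VIEWS[p].get(month, 0) for p in pages)


print(views_current_usage("2016-01"))  # 501000: main page views wrongly credited
print(views_actual_usage("2016-01"))   # 1000: only the article that used it then
```

Even the second figure is still a page-view proxy. Counting media requests
for the file itself would sidestep the question of which pages were using
it at the time.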