On Wed, Feb 8, 2023 at 8:52 PM Marty Blayney <martyblayney.machi@gmail.com> wrote:
Kia ora Fiona,

[...]

That doesn't address a lot of the issues above around the tool itself being inaccurate, but since we switched to using a single category we don't seem to get wild swings anymore. I'm still quite suspicious of the data: there have been a few times when we expected different numbers (e.g. when a photo was featured on the main page, but no massive spike was recorded in GLAMorgan).

This type of historical data isn't really possible with GLAMorgan, and the tool leads people astray by making the data sound more reliable than it is. As I understand it, the tool does two things, both live in your browser on the client side. First, it queries the PetScan API to find all the pages across Wikimedia sites that use files from the given category. Then it makes a series of queries to the Wikimedia pageview API, one per page, for the time range given. This means you are actually calculating the historical page views of the pages that currently use images from the category, not the views the images actually received during the year and month you are querying. If a photo was featured on the main page in the past but is not there now, the tool will not know that and will not show any of those page views. Conversely, if you check a category containing an image that is currently on the main page, even if only for a few hours, the tool will credit that image with all of the main page's views for every month in history.
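To make the problem concrete, here is a rough sketch in Python of the logic as I understand it. It is a toy model, not GLAMorgan's actual code: the page names, view counts, and function names are all made up for illustration, and nothing here calls PetScan or the pageview API.

```python
from typing import Dict, Set

# Hypothetical monthly page views, keyed by page title and then "YYYY-MM".
PAGE_VIEWS: Dict[str, Dict[str, int]] = {
    "Main Page": {"2016-01": 1_000_000, "2023-02": 1_200_000},
    "Kiwi (bird)": {"2016-01": 5_000, "2023-02": 6_000},
}

# Hypothetical record of which pages embedded a category image in each month.
# The tool has no access to this history; it only sees the latest entry.
USAGE_BY_MONTH: Dict[str, Set[str]] = {
    "2016-01": {"Kiwi (bird)"},
    "2023-02": {"Kiwi (bird)", "Main Page"},
}


def views_from_current_usage(month: str, current_pages: Set[str]) -> int:
    """Sum a past month's views over the pages using the images *today*."""
    return sum(PAGE_VIEWS[page].get(month, 0) for page in current_pages)


def views_from_usage_at_the_time(month: str) -> int:
    """Sum a month's views over the pages that used the images *that month*."""
    pages = USAGE_BY_MONTH.get(month, set())
    return sum(PAGE_VIEWS[page].get(month, 0) for page in pages)


current_pages = USAGE_BY_MONTH["2023-02"]  # what a usage lookup returns today
inflated = views_from_current_usage("2016-01", current_pages)
realistic = views_from_usage_at_the_time("2016-01")
print(inflated, realistic)
```

With this toy data, January 2016 comes out at 1,005,000 views using today's usage list, against 5,000 if usage at the time is respected, because the main page's entire traffic is credited to an image that was not on it in 2016.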

To check this, try entering the category of an image that is on the main page right now, but check its page views back in 2016. This basic logic flaw would be solved by checking the actual media requests rather than the pageview API. It would be even better if there were a way to query a category for a historical view of its members at a given timestamp, but that doesn't seem possible.
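For what it's worth, here is a minimal sketch of what a per-file media request lookup might look like. It only builds the request URL and does not fetch anything. The URL layout reflects my understanding of the Wikimedia Analytics mediarequests "per-file" endpoint and should be checked against the API documentation; the function name and the example file path are my own assumptions.

```python
from urllib.parse import quote

# Assumed base URL of the Wikimedia Analytics REST API (verify before use).
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def build_mediarequests_url(file_path: str, start: str, end: str,
                            referer: str = "all-referers",
                            agent: str = "all-agents",
                            granularity: str = "monthly") -> str:
    """Build a per-file media request URL.

    file_path is the file's upload path, e.g.
    "/wikipedia/commons/4/4a/Commons-logo.svg"; start and end are
    YYYYMMDD strings. The path is percent-encoded, slashes included,
    so that it fits in a single URL segment.
    """
    encoded = quote(file_path, safe="")
    return f"{BASE}/{referer}/{agent}/{encoded}/{granularity}/{start}/{end}"


url = build_mediarequests_url(
    "/wikipedia/commons/4/4a/Commons-logo.svg", "20160101", "20160131"
)
print(url)
```

Because this counts requests for the file itself, it would not matter which pages embed the image today, which is the point of the suggestion above.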