Hi:
This is my first time working with Wikimedia analytics, and I found a case that looks weird to me. In some cases I can't get data from the API.
- en.wikivoyage.org - Culturally significant landscapes in Jaén - 2022121700 - API call: https://w.wiki/6DjC
I got the «The date(s) you used are valid, but we either do not have data for those date(s)» message, which looks strange to me because the resource exists, as can be checked with the previous day:
- 2022121600 - API call: https://w.wiki/6DjE
If there were no visits for 2022121700, I would have expected a successful response with value=0.
Is this the expected behavior, or have I found a glitch? I found a few other cases, so I prefer to ask here.
Thanks.
You're right that this is strange. The behavior *should* be consistent, but what you're seeing is expected. If you look at a 3-day range https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikivoyage.org/all-access/user/Culturally%20significant%20landscapes%20in%20Ja%C3%A9n/daily/2022121600/2022121800, you see value=0 for the 16th and 18th but no response for the 17th. But that's just part of the data, because you're filtering by agent=user. If you look at agent=spider https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikivoyage.org/all-access/spider/Culturally%20significant%20landscapes%20in%20Ja%C3%A9n/daily/2022121600/2022121800 as well, you'll see that there's nonzero data for the 16th and the 18th, but nothing for the 17th.
On days where we have no contact whatsoever with a particular page, we don't insert any data https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/refinery/+/refs/heads/master/oozie/cassandra/coord_pageview_per_article_daily.properties#167. Other clients, like the pageviews tool, just fill in gaps like this with 0 https://pageviews.wmcloud.org/?project=en.wikivoyage.org&platform=all-access&agent=user&redirects=0&start=2022-12-15&end=2022-12-23&pages=Culturally_significant_landscapes_in_Ja%C3%A9n. The code that responds to this API https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/aqs/+/refs/heads/master/sys/pageviews.js#219 says we store null in the data store (Cassandra) for efficiency, and we map null to 0 when we answer requests. So to summarize:
* if you see a zero, it means there was some activity from some agent type on that day, just not the particular agent type you're looking at
* if you see no data, or that error, then there was no activity at all on that day
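For anyone who wants the gap-filling that the pageviews tool does on the client side, here is a minimal sketch in Python. It assumes the response's "items" entries carry a "timestamp" in YYYYMMDDHH form and a "views" count; the function name fill_missing_days and the sample numbers are made up for illustration and are not taken from the real data.

```python
from datetime import date, timedelta


def fill_missing_days(items, start, end):
    """Return (timestamp, views) for every day from start to end inclusive.

    Days with no item in the API response are reported as 0 views.
    """
    # Index the items that did come back by their YYYYMMDDHH timestamp.
    views_by_day = {item["timestamp"]: item["views"] for item in items}
    filled = []
    day = start
    while day <= end:
        stamp = day.strftime("%Y%m%d") + "00"  # daily timestamps end in hour 00
        filled.append((stamp, views_by_day.get(stamp, 0)))
        day += timedelta(days=1)
    return filled


# Hypothetical items for a 3-day request: nothing came back for the 17th.
items = [
    {"timestamp": "2022121600", "views": 3},
    {"timestamp": "2022121800", "views": 1},
]
print(fill_missing_days(items, date(2022, 12, 16), date(2022, 12, 18)))
```

If a request for a single day comes back with the "we either do not have data for those date(s)" error instead of items, passing an empty list to the same function gives a 0 for that day.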
To create more consistent behavior here, we would have to run additional queries to check whether there's a real problem or just this scenario. And that's inefficient (we don't have tons of resources to work with). Hope this helps :)
On Tue, Jan 17, 2023 at 4:51 PM Dan Andreescu dandreescu@wikimedia.org wrote:
- if you see a zero, it means there was some activity from some agent type
on that day, just not the particular agent type you're looking at
- if you see no data, or that error, then there was no activity at all on
that day
Understood.
To create more consistent behavior here, we would have to run additional queries to check whether there's a real problem or just this scenario. And that's inefficient (we don't have tons of resources to work with). Hope this helps :)
Excellent answer. Thanks!