The difference is very small, but you're right to point it out. I've opened
a task to look into it:
https://phabricator.wikimedia.org/T205457
On Wed, Sep 19, 2018 at 5:10 PM Felix J. Scholz <felixjacobscholz(a)gmail.com>
wrote:
Hey,
I've been looking through the documentation on the Pageview API in recent
days, and have a question that I have not been able to answer so far.
As I understand it, the data accessible through the "aggregated by
project" Pageview API [1], when filtered to the "user" agent type only,
should return the same results as the hourly pageview dumps [2 / 3].
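
For reference, the per-project totals from the dumps can be tallied roughly
as in the sketch below. It assumes each dump line has four space-separated
fields (domain code, page title, view count, response size); the function
name sum_views and the sample lines are hypothetical and only illustrate
the idea of adding up the desktop, mobile and zero domain codes.

```python
from typing import Iterable


def sum_views(lines: Iterable[str], domain_codes: Iterable[str]) -> int:
    """Sum the view counts of all lines whose domain code is in domain_codes.

    Assumed line format: "<domain_code> <page_title> <count_views> <size>".
    Lines that do not have exactly four fields are skipped.
    """
    wanted = set(domain_codes)
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed or empty lines
        code, _title, views, _size = parts
        if code in wanted:
            total += int(views)
    return total


# Made-up sample lines, not real dump data.
sample = [
    "en Main_Page 10 0",
    "en.m Main_Page 5 0",
    "en.zero Main_Page 1 0",
    "pt Página_principal 7 0",
    "malformed line",
]

en_total = sum_views(sample, ["en", "en.m", "en.zero"])
pt_total = sum_views(sample, ["pt", "pt.m", "pt.zero"])
print(en_total, pt_total)
```

In practice the same function would be fed the lines of all 24 hourly
files for the day in question.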
However, while the data is close, in two of my brief tests (for the data
of October 1, 2015) the values did not match up.
Data from "aggregate" API:
en.wikipedia & excluding spiders [4]: 238.845.634
pt.wikipedia & excluding spiders [5]: 11.390.043
Data from pageview dumps [3]:
en & en.zero & en.m: 238.840.836
pt & pt.zero & pt.m: 11.389.979
As you can see, while the values are close, they do not match.
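
To put a number on the gap, here is a small sketch that computes the
absolute and relative difference between the two sources, using only the
figures quoted above. The names api_totals, dump_totals and compare are
made up for this illustration.

```python
# Totals for October 1, 2015, as quoted in this message.
api_totals = {"en": 238_845_634, "pt": 11_390_043}
dump_totals = {"en": 238_840_836, "pt": 11_389_979}


def compare(api: dict, dumps: dict) -> dict:
    """Return {project: (absolute difference, difference as % of API total)}."""
    result = {}
    for project, api_value in api.items():
        diff = api_value - dumps[project]
        result[project] = (diff, 100.0 * diff / api_value)
    return result


differences = compare(api_totals, dump_totals)
for project, (diff, pct) in differences.items():
    print(f"{project}: API is higher by {diff} views ({pct:.4f}%)")
```

The API totals are higher by 4798 views for en and by 64 views for pt,
which is roughly 0.002% and 0.0006% of the respective totals.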
What am I missing here? Am I mistaken in assuming that the two datasets
are derived from the same underlying data and should therefore agree?
Felix
[1]
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
[2]
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews
[3]
https://dumps.wikimedia.org/other/pageviews/
[4]
https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/en.wikipedia/…
[5]
https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/pt.wikipedia/…
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics