The difference is very small, but you're right to point it out. I've opened a task to look into it: https://phabricator.wikimedia.org/T205457


On Wed, Sep 19, 2018 at 5:10 PM Felix J. Scholz <felixjacobscholz@gmail.com> wrote:
Hey,

I've been reading the documentation on the Pageview API over the past few days, and I have a question that I haven't been able to answer so far.

Per my understanding, the data accessible through the "aggregated by project" Pageview API [1], when filtered to "user" agents only, should return the same results as the hourly pageview dumps [2 / 3].
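
For readers who want to reproduce the API side of the comparison, here is a minimal sketch that only builds the request URL. The path layout (project / access / agent / granularity / start / end) follows the public Wikimedia REST API as I understand it; the function name and its defaults are hypothetical, and this may not be the exact query behind [4] and [5].

```python
# Assumed endpoint layout for the "aggregated by project" pageview API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def aggregate_url(project, day, access="all-access", agent="user",
                  granularity="daily"):
    """Build the URL for one day of aggregate pageviews.

    `day` is a YYYYMMDD string; the API takes YYYYMMDDHH timestamps,
    so hour 00 is appended to both the start and the end of the range.
    """
    stamp = day + "00"
    return "/".join([BASE, project, access, agent, granularity, stamp, stamp])


# The two queries described in this message (agent "user" excludes spiders).
print(aggregate_url("en.wikipedia", "20151001"))
print(aggregate_url("pt.wikipedia", "20151001"))
```

Fetching these URLs should return JSON with one entry per day. The sketch stops at building the URL so that it runs without network access.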

However, in two brief tests (for October 1, 2015), the values were close but did not match.

Data from "aggregate" API:
en.wikipedia & excluding spiders [4]: 238.845.634
pt.wikipedia & excluding spiders [5]: 11.390.043

Data from pageview dumps [3]:
en & en.zero & en.m: 238.840.836
pt & pt.zero & pt.m: 11.389.979

As you can see, while the values are close, they do not match.
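
To put a number on the gap, the figures above can be compared directly. This is a small illustrative sketch; the helper names are made up for the example, and the only inputs are the four values quoted in this message (with dots as thousands separators).

```python
# Figures copied from the message above; dots are thousands separators.
api_counts = {"en": "238.845.634", "pt": "11.390.043"}
dump_counts = {"en": "238.840.836", "pt": "11.389.979"}


def parse_count(text):
    """Convert a dot-grouped figure such as '11.390.043' to an int."""
    return int(text.replace(".", ""))


def compare(api_text, dump_text):
    """Return (absolute difference, difference as a percentage of the API value)."""
    api = parse_count(api_text)
    dump = parse_count(dump_text)
    diff = api - dump
    return diff, 100.0 * diff / api


for lang in api_counts:
    diff, pct = compare(api_counts[lang], dump_counts[lang])
    print(f"{lang}: API exceeds dumps by {diff} views ({pct:.5f}%)")
```

The API totals are higher in both cases, by 4,798 views for English and 64 views for Portuguese, which is roughly 0.002% and 0.0006% of the respective API values.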

What am I missing here? Am I mistaken in assuming that the two datasets are derived from the same source and should therefore agree?

Felix

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics