Hi:
I can't find any (official?) definition of the nature and practical use of the «Pageview» and «Pageview complete» dumps, or of the difference between them, other than a brief description at Dumps[1].
Searching for "pageview complete" returns no results at:
- Wikitech: https://w.wiki/6EDz
- Meta: https://w.wiki/6EDx, except the link to Dumps[2]
Maybe this is because the complete ones are by design related to Wikipedia Administrative Pages Analytics[3]? Or because they include data from before the latest pageview definition (2015)? Maybe both?
[1] https://dumps.wikimedia.org/other/analytics/
[2] https://dumps.wikimedia.org/other/pageview_complete/readme.html
[3] https://meta.wikimedia.org/wiki/Wikipedia_Administrative_Pages_Analytics
Hi Ismael,
You're right to be confused. We left the work and documentation in a messy state following the departure of the team members who worked on this dataset, and we have not yet been able to prioritize cleaning it up.
The basic idea was that pageviews_complete was going to be a combined dataset, in a uniform format, as compressed as possible, with everything we retained about pageviews to Wikimedia projects since 2007. Currently we only have data available for download since 2011, and we still have links to the old data and deprecated datasets that pageviews_complete is supposed to replace. As a result, it's very confusing. If you tell me what exactly you're looking for, I can try to direct you. And if you're looking to help improve the documentation, then we very much welcome that, and I can point you in the right direction as well.
On Mon, Jan 16, 2023 at 8:08 AM Ismael Olea ismael@olea.org wrote:
> Hi:
> I can't find any (official?) definition of the nature and practical use of the «Pageview» and «Pageview complete» dumps, or of the difference between them, other than a brief description at Dumps[1].
> Searching for "pageview complete" returns no results at:
> - Wikitech: https://w.wiki/6EDz
> - Meta: https://w.wiki/6EDx, except the link to Dumps[2]
> Maybe this is because the complete ones are by design related to Wikipedia Administrative Pages Analytics[3]? Or because they include data from before the latest pageview definition (2015)? Maybe both?
> [1] https://dumps.wikimedia.org/other/analytics/
> [2] https://dumps.wikimedia.org/other/pageview_complete/readme.html
> [3] https://meta.wikimedia.org/wiki/Wikipedia_Administrative_Pages_Analytics
> --
> Ismael Olea
> http://olea.org/diario/
On Tue, Jan 17, 2023 at 4:28 PM Dan Andreescu dandreescu@wikimedia.org wrote:
> The basic idea was that pageviews_complete was going to be a combined dataset, in a uniform format, as compressed as possible, with everything we retained about pageviews to Wikimedia projects since 2007.
So the goal is to keep a historical archive of pageview activity, right?
And I imagine «pageview» was designed to be suitable for data processing (APIs, etc.), wasn't it?
> If you tell me what exactly you're looking for, I can try to direct you.
At this point I'm just studying the landscape of available metrics for an internal report.
> And if you're looking to help improve the documentation, then we very much welcome that, and I can point you in the right direction as well.
I would really like to be able to do it.
Anyhow, thanks a lot for the explanation.
> So the goal is to keep a historical archive of pageview activity, right?
Correct.
> And I imagine «pageview» was designed to be suitable for data
> processing (APIs, etc.), wasn't it?
Yes, more for automated parsing/compute than for human readability.
>> And if you're looking to help improve the documentation, then we very much
>> welcome that, and I can point you in the right direction as well.
> I would really like to be able to do it.
The code is here: https://gerrit.wikimedia.org/g/operations/puppet/+/171b5ac85aead1c2da0fc6b05...
The Gerrit guide is here: https://www.mediawiki.org/wiki/Gerrit