I am cc-ing analytics@, our public e-mail list, where questions like this one
get asked (and archived).
> As far as I know, page views are counted every time someone views a page,
> and if someone reloads a page several times, each time would be counted as 1
> page view. Is that correct?
> I wonder if you could explain the reasoning behind that a bit
Yes, we do not count "distinct" pageviews by a user, but pageviews overall.
So if you look at the Barack Obama page 10 times in one day, those are 10 pageviews.
> If we would want to be able to know if a page got just reloaded by the
> same person, we would need to have a unique ID for each user, and this
> would have to be browser connected. I'm guessing that this would be a
> strong privacy issue.
Correct. Even with unique IDs we could never identify users, just devices,
as unique IDs are really cookies in your browser.
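
To make the distinction concrete, here is a minimal Python sketch of the two ways of counting. The request log and the device IDs are hypothetical and only stand in for a browser cookie; this is not how the actual pipeline is implemented.

```python
from collections import Counter

# Hypothetical request log: (device_id, page_title). The device_id stands in
# for a browser cookie; it identifies a device, not a person.
requests = [
    ("device-a", "Barack_Obama"),
    ("device-a", "Barack_Obama"),
    ("device-a", "Barack_Obama"),
    ("device-b", "Barack_Obama"),
    ("device-b", "Berlin"),
]

def total_pageviews(log):
    """Count every request, reloads included (the counting described above)."""
    return Counter(page for _, page in log)

def distinct_device_views(log):
    """Count each (device, page) pair once; this needs a per-device ID."""
    return Counter(page for _, page in set(log))

print(total_pageviews(requests)["Barack_Obama"])        # 4
print(distinct_device_views(requests)["Barack_Obama"])  # 2
```

The second function only works because every request carries an identifier, which is exactly the privacy trade-off discussed here.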
On Thu, Oct 13, 2016 at 6:48 AM, Birgit Müller <birgit.mueller(a)wikimedia.de>
> Hi Nuria,
> we shortly met when you had your team offsite in Berlin :-)
> We're currently running a feedback loop on the PageView Analysis tool by
> MusicAnimal/Community Tech, and in that context a question appeared:
> As far as I know, page views are counted every time someone views a page,
> and if someone reloads a page several times, each time would be counted as 1
> page view. Is that correct?
> I wonder if you could explain the reasoning behind that a bit -
> my first guess was:
> If we would want to be able to know if a page got just reloaded by the
> same person, we would need to have a unique ID for each user, and this
> would have to be browser connected. I'm guessing that this would be a
> strong privacy issue.
> From the technical perspective, I could imagine that this would also be
> very difficult.
> Am I right, or what are the exact reasons behind that?
> Looking forward to your answer, and I hope you're doing well!
> Birgit Müller
> Community Communications Manager
> Software Development and Engineering
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. (030) 219 158 26-0
> Imagine a world in which every single human being can freely share in the
> sum of all knowledge. Help us make it happen!
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the district court (Amtsgericht)
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
> tax office for corporations (Finanzamt für Körperschaften) I Berlin, tax number 27/681/51985.
Dear Mr. Haar,
Thanks for bringing this to our attention.
The report you refer to has been discontinued since August 2015. (see page
The successor, based on new definitions and methodology, is at
I am forwarding your message to the WMF Analytics Team, who maintain these stats.
From: Haar, Dirk [mailto:Dirk.Haar@partner.commerzbank.com]
Sent: Wednesday, October 12, 2016 11:22
Would you mind making a change on the OS report page?
I always wondered about the values for Linux Mint in comparison to Ubuntu
there, and now found the reason in
Clem's blog entry here
<http://segfault.linuxmint.com/2016/09/addressing-fud/> (see "Wikimedia
What you show as Linux Mint are only those versions up to Mint 10.
The current version is 18, and since version 11, the user agent you (or better,
let's say "Wikimedia stats") evaluate
is shown as "Ubuntu". There should at least be a note on this line; other
distros may be affected, too.
(Btw., that reminds me of older browser usage statistics, when websites couldn't
deal with Netscape Communicator,
so that you had to switch the user agent to "Internet Explorer", shifting
the usage values completely.)
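
A small Python sketch of how this misattribution can arise with a naive user-agent classifier. Both user-agent strings are illustrative shapes assumed for this example (not taken from real logs), and `classify_os` is a hypothetical stand-in, not the code behind the Wikimedia reports.

```python
import re

# Illustrative user-agent strings (assumed shapes, not taken from real logs).
OLD_MINT_UA = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2) "
               "Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6 Linux Mint/10")
NEW_MINT_UA = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) "
               "Gecko/20100101 Firefox/47.0")

def classify_os(user_agent):
    """Naive classifier: look for a Mint token first, then an Ubuntu token."""
    if re.search(r"Linux Mint", user_agent):
        return "Linux Mint"
    if re.search(r"Ubuntu", user_agent):
        return "Ubuntu"
    return "Other Linux"

print(classify_os(OLD_MINT_UA))  # Linux Mint
print(classify_os(NEW_MINT_UA))  # Ubuntu
```

If the newer string carries no Mint token at all, no classifier working from the user agent alone can tell the two distributions apart, which is why a note on the report line would help.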
I would highly recommend using the X-Analytics header for this, and
establishing "well known" key name(s). X-Analytics gets parsed into
key-value pairs (an object field) by our varnish/hadoop infrastructure,
whereas the user agent is basically a semi-free-form text string. Also,
we constantly have to perform two types of analysis - those that came from the
"backend" and those that were made by the browser.
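
As a rough illustration of why a key-value header is easier to analyze, here is a minimal Python sketch. It assumes the header value is a semicolon-separated list of `key=value` items; the key name `tool` and the example value are hypothetical, and the function is not the parser used by the infrastructure.

```python
def parse_x_analytics(header_value):
    """Split a 'key=value;key=value' string into a dict."""
    pairs = {}
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing ';'
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs

# "tool" is a hypothetical well-known key name, used only for illustration.
example = "https=1;tool=my-sparql-client"
parsed = parse_x_analytics(example)
print(parsed.get("tool"))  # my-sparql-client
```

Once the value is a map, filtering by tool is a plain key lookup instead of pattern matching on free text.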
On Sun, Oct 2, 2016 at 4:28 PM Stas Malyshev <smalyshev(a)wikimedia.org>
> > I'll try to throw in a #TOOL: comment where I can remember using SPARQL,
> > but I'll be bound to forget a few...
> Thanks, though using a distinct User-Agent may be easier for analysis,
> since those are stored as separate fields, and doing operations on a
> separate field would be much easier than extracting comments from the query
> field, e.g. when doing Hive data processing.
> Stas Malyshev
> Wikidata mailing list
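
The two options compared in the quoted message can be sketched in Python as follows. The log record, the tool name, and both helper functions are hypothetical; the point is only that the comment has to be pulled out of the query text, while the user agent is already its own field.

```python
import re

# Hypothetical log record: query text and user agent are separate fields.
record = {
    "user_agent": "my-sparql-tool/1.0",
    "query": "#TOOL: my-sparql-tool\nSELECT ?item WHERE { ?item ?p ?o } LIMIT 1",
}

def tool_from_comment(query):
    """Recover the tool name from a '#TOOL:' comment line, if present."""
    match = re.search(r"^#TOOL:\s*(.+)$", query, flags=re.MULTILINE)
    return match.group(1).strip() if match else None

def tool_from_user_agent(user_agent):
    """Read the tool name from the dedicated field (text before the '/')."""
    return user_agent.split("/", 1)[0]

print(tool_from_comment(record["query"]))          # my-sparql-tool
print(tool_from_user_agent(record["user_agent"]))  # my-sparql-tool
```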
Dear Analytics mailing list,
I am working, along with Issa Rice (cc'ed) on an analysis of changes to
Wikipedia pageviews since December 2007, when pageview statistics first
started being maintained. To help with our analysis, we collected key
events related to changes to user experience on the site as well as to
statistics availability and measurement. We've recorded our findings on this
page in Issa's userspace:
I'd appreciate it if you can highlight:
(a) Factual errors in the material currently in the timeline
(b) Missing events that you think should belong in the timeline, with
regard to the availability of statistics as well as any other events that
affected user experience significantly.
In addition, I had the following question: In the Wikimedia per-article
pageviews API https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_
ranularity_start_end, where do Wikipedia Zero pageviews get recorded? Do
they go under mobile-web, or mobile-app, or neither?
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview gives some
information on how the underlying pageviews are recorded in the pageviews
dataset, but I wasn't clear on how the pageview REST API processes that
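
For reference, a minimal Python sketch that builds per-article request URLs for each access value, so the series can be compared side by side. The path layout and the list of access values reflect my understanding of the public pageviews REST API and should be checked against its documentation; the article, dates, and helper name are illustrative. The sketch only builds URLs and makes no request, and it does not by itself answer where Wikipedia Zero traffic is counted.

```python
from urllib.parse import quote

# Assumed base path of the per-article endpoint (verify against the API docs).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
ACCESS_METHODS = ["all-access", "desktop", "mobile-web", "mobile-app"]

def per_article_url(project, access, agent, article, granularity, start, end):
    """Build a per-article pageviews URL; spaces in titles become underscores."""
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title, granularity, start, end])

for access in ACCESS_METHODS:
    print(per_article_url("en.wikipedia", access, "user",
                          "Barack Obama", "daily", "20161001", "20161013"))
```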
NOTE: We don't intend to move the page to Wikipedia's main space, as we
know it won't meet the notability criterion. The user space was just a
convenient place to store it while taking advantage of MediaWiki's syntax
and Wikipedia's templates.