Yes, that is what I would be curious about: how big that part is.

On Fri, Apr 8, 2016 at 11:32 AM Nuria Ruiz <nuria@wikimedia.org> wrote:
>Basically, to capture only people who already have a Wikimedia-cookie, and count those.
Ah, yes, now I get it.

Yes. We have done these calculations, and they underreport by quite a bit because it takes two visits to Wikipedia before a cookie shows up in the data (the cookie is set on your first visit and sent back on the second). So, as you said, you will miss all 1-hit visits in a monthly period, for example. Whether this matters depends on users' browsing patterns, and it turns out that 1-hit visits make up quite a significant part of our traffic.
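To make the two-visit point concrete, here is a minimal Python sketch (not the actual Wikimedia pipeline). It assumes hypothetical ground-truth device ids, which we never see in practice, and assumes every device starts the period without a cookie. The server sets a cookie on a device's first request and only later requests carry it back, so a device with a single request in the period never shows up.

```python
def devices_returning_cookie(requests: list[str]) -> set[str]:
    """Given time-ordered requests, each tagged with a hypothetical device id,
    return the devices that sent a cookie back at least once."""
    has_cookie: set[str] = set()  # devices the server has already set a cookie on
    counted: set[str] = set()     # devices seen sending that cookie back
    for device in requests:
        if device in has_cookie:
            counted.add(device)       # 2nd or later visit: the cookie comes back
        else:
            has_cookie.add(device)    # 1st visit: no cookie yet, server sets one
    return counted


if __name__ == "__main__":
    # Toy trace: "c" visits only once in the period.
    trace = ["a", "b", "a", "c", "d", "d", "b"]
    seen = devices_returning_cookie(trace)
    missed = set(trace) - seen
    print(sorted(seen), sorted(missed))  # ['a', 'b', 'd'] ['c']
```

Every 1-hit device ends up in `missed`, which is the under-reporting described above.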

On Fri, Apr 8, 2016 at 11:22 AM, Denny Vrandečić <vrandecic@gmail.com> wrote:
+Wikimedia Analytics 

Thanks for pointing me to the list; I should have written there in the first place.

Sorry, by the term "user agent" I didn't mean the actual user-agent string, but rather what you are trying to express with "unique device", i.e. the different browsers on a single mobile device. I should have just stayed with your terminology to make it less confusing.

Basically, to capture only people who already have a Wikimedia-cookie, and count those. This would still underreport, as it would miss everyone who came only once, but not by too much, I'd think. Right now I am more worried about overreporting.

I hope this is a bit clearer.

On Fri, Apr 8, 2016 at 11:16 AM Nuria Ruiz <nuria@wikimedia.org> wrote:
Denny:

The best list for these kinds of questions is analytics@ (cc-ed).

>A minor question - could you also count the number of unique recurring user agents per month? I.e. the number of visits that return and have a still valid cookie (e.g. by marking the cookie after the count).
Mmm... I'm not sure what you mean by "recurring", as you can have thousands of people with the same user agent, right? Think "everyone in Seattle with an iPhone and the latest OS using Safari". You can add other pieces of information like IP, but on mobile, due to NAT-ing [1], that can also mean a group of thousands of people. So using "recurring user agents" as the base for your main calculation will always heavily under-report the number of unique devices.
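A minimal Python sketch of why IP plus user agent undercounts on mobile: a thousand phones sending the same (placeholder) user-agent string from behind one carrier NAT address collapse into a single signature. The device ids, the UA string, and the 203.0.113.7 address (taken from a documentation-only IP range) are all made up for illustration.

```python
from typing import NamedTuple


class Request(NamedTuple):
    device: str      # hypothetical ground-truth device id, not observable in practice
    ip: str
    user_agent: str


def distinct_devices(requests: list[Request]) -> int:
    """How many devices actually sent requests."""
    return len({r.device for r in requests})


def distinct_signatures(requests: list[Request]) -> int:
    """What an IP + user-agent heuristic can tell apart."""
    return len({(r.ip, r.user_agent) for r in requests})


if __name__ == "__main__":
    shared_ua = "iPhone / latest iOS / Safari"  # placeholder UA shared by many phones
    nat_ip = "203.0.113.7"                       # stands in for a carrier NAT gateway
    reqs = [Request(f"phone-{i}", nat_ip, shared_ua) for i in range(1000)]
    print(distinct_devices(reqs), distinct_signatures(reqs))  # 1000 1
```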

Now, I might be missing something, as your question is brief; maybe you can elaborate a bit more?

>I am worried that the current number, due to the freshness offset, might be overreporting
Since the offset calculation takes IP into account when looking for freshness, and it only keeps devices having 1 event without cookies and 0 with cookies, the calculation is likely to under-report on mobile, due, again, to NAT-ing and to user agents being shared among many devices. We see this in our data as smaller offset numbers in mobile projects than in desktop projects. Now, this methodology might over-report for a user who uses many distinct IPs with the same browser, makes 1 request, and clears cookies after every session, but that is a far less common scenario.
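The offset rule described above can be sketched roughly as follows in Python. This is a simplified illustration, not the production query: it assumes a signature is just (IP, user agent), and the `Request` record and its `has_cookie` flag are hypothetical. A signature contributes to the offset only if it has exactly one request without a cookie and none with one.

```python
from collections import defaultdict
from typing import NamedTuple


class Request(NamedTuple):
    ip: str
    user_agent: str
    has_cookie: bool  # hypothetical flag: did the request carry the cookie?


def freshness_offset(requests: list[Request]) -> int:
    """Count (ip, user_agent) signatures with exactly one cookieless request
    and no requests carrying a cookie."""
    without_cookie: defaultdict[tuple[str, str], int] = defaultdict(int)
    with_cookie: defaultdict[tuple[str, str], int] = defaultdict(int)
    for r in requests:
        key = (r.ip, r.user_agent)
        if r.has_cookie:
            with_cookie[key] += 1
        else:
            without_cookie[key] += 1
    return sum(
        1 for key, n in without_cookie.items()
        if n == 1 and with_cookie.get(key, 0) == 0
    )


if __name__ == "__main__":
    reqs = [
        Request("198.51.100.1", "Firefox", False),        # one cookieless hit: counted
        Request("198.51.100.2", "Chrome", False),         # two cookieless hits:
        Request("198.51.100.2", "Chrome", False),         #   not counted
        Request("198.51.100.3", "Safari", False),         # also seen with a cookie:
        Request("198.51.100.3", "Safari", True),          #   not counted
        Request("203.0.113.7", "Mobile Safari", False),   # two different phones behind
        Request("203.0.113.7", "Mobile Safari", False),   #   one NAT IP: neither counted
        Request("192.0.2.10", "Edge", False),             # one user on two IPs who clears
        Request("192.0.2.11", "Edge", False),             #   cookies: counted twice
    ]
    print(freshness_offset(reqs))  # 3
```

In the toy data, the two phones behind one NAT address drop out entirely (the mobile under-count), while the one user hopping between two IPs is counted twice (the over-count case).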

Hopefully this makes sense. 

>Again, congratulations on the work! I am really happy to see the WMF not being dependent on a commercial traffic numbers provider anymore!

On Fri, Apr 8, 2016 at 10:30 AM, Denny Vrandečić <vrandecic@gmail.com> wrote:
Hi Nuria, Aaron,

First, congratulations on the Unique devices work! I am really impressed by the solution and the dataset. I am looking forward to the visualizations that will come out of this.

A minor question - could you also count the number of unique recurring user agents per month? I.e. the number of visits that return and have a still valid cookie (e.g. by marking the cookie after the count).

My reasoning is the following: while it might further underreport the number of unique user agents, it would get rid of all user agents that clear out their cookies or use some form of incognito mode. It would only count people who have been there, got a cookie, and returned; we then mark the cookie and don't count them again until it expires.
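The mark-after-count idea could look roughly like this Python sketch. Everything here is an assumption for illustration: device ids stand in for browsers, the cookie is assumed to persist across months unless cleared, and its value is either unmarked (set but not yet counted) or the month in which it was last counted. A browser that clears its cookies or uses incognito mode simply reappears as a new device id and is never counted unless it comes back with the cookie.

```python
from collections import Counter

UNMARKED = ""  # hypothetical cookie value: cookie set, device not yet counted


def recurring_monthly_counts(requests: list[tuple[str, str]]) -> Counter[str]:
    """requests: time-ordered (device_id, 'YYYY-MM') pairs.
    Returns month -> number of devices counted as recurring in that month."""
    cookies: dict[str, str] = {}  # device_id -> cookie value held by the browser
    counts: Counter[str] = Counter()
    for device, month in requests:
        cookie = cookies.get(device)
        if cookie is None:
            cookies[device] = UNMARKED  # first visit: set an unmarked cookie, do not count
        elif cookie != month:
            counts[month] += 1          # returned with a cookie not yet marked this month
            cookies[device] = month     # mark it so later visits this month are skipped
    return counts


if __name__ == "__main__":
    trace = [
        ("a", "2016-03"), ("a", "2016-03"), ("a", "2016-03"),
        ("b", "2016-03"),
        ("a", "2016-04"), ("b", "2016-04"), ("c", "2016-04"),
    ]
    print(dict(recurring_monthly_counts(trace)))  # {'2016-03': 1, '2016-04': 2}
```

With this toy trace, "c" visits once and is never counted, which is the under-reporting both sides mention; "b" is counted in April on its single April visit only because, under the persistence assumption, it kept the cookie set in March.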

I am worried that the current number, due to the freshness offset [1], might be overreporting, and I do not fully agree with your reasoning on that page that this is OK. Counting only the recurring ones would clean that up and give a more reliable number, although it would potentially underreport the people who really do come only once a month (a number I don't expect to be too large).

It would be interesting to see these two numbers side by side.

Again, congratulations on the work! I am really happy to see the WMF not being dependent on a commercial traffic numbers provider anymore!

Cheers,
Denny