unique visitors


phoebe ayers

17 Mar 2016

5:52 p.m.


Dan Andreescu

17 Mar

6:04 p.m.

Hi Phoebe,

We can't use the comScore numbers any more because they were not tracking mobile devices. Since a lot of our traffic is now coming from mobile devices, comScore numbers were showing a sharp decline, but this was not a reflection of reality.

We've been working on a way to measure uniques that's sensitive to privacy, and we're very close to release that data, we're just working on the blog post now. One of the drawbacks is that we can't report on a single total number across all our projects.

We'll have an announcement shortly but until then I'll see if we can share the numbers with you privately. Since we can't compute one total number, would you give us a list of wikis that you would like numbers for?

Dan

On Thu, Mar 17, 2016 at 12:52 PM, phoebe ayers <phoebe.wiki(a)gmail.com> wrote:

> Hi all,
>
> Can someone help me with my failing memory and remind me what the current state of affairs is re: unique visitors -- we're not counting them anymore? We are counting them but not via comscore? Something else? Just putting together a talk and wanted the latest numbers.
>
> thanks,
> -- Phoebe
>
> --
> * I use this address for lists; send personal messages to phoebe.ayers <at> gmail.com *
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

phoebe ayers

8:40 p.m.

On Thu, Mar 17, 2016 at 1:04 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:

...

Right, I thought this was the case but didn't know the current state of affairs. (Missed the announcement from Kevin).

> We've been working on a way to measure uniques that's sensitive to privacy, and we're very close to release that data, we're just working on the blog post now.

Awesome, that is great to hear! Congratulations!

> One of the drawbacks is that we can't report on a single total number across all our projects.

Hmm. That's unfortunate for understanding reach -- if nothing else, the idea that "half a billion people access Wikipedia" (eg from earlier comscore reports) was a PR-friendly way of giving an idea of the scale of our readership. But I can see why it would be tricky to measure.

Since this is the research list: I suspect there's still lots to be done in understanding just how multilingual people use different language editions of Wikipedia, too.

> We'll have an announcement shortly but until then I'll see if we can share the numbers with you privately. Since we can't compute one total number, would you give us a list of wikis that you would like numbers for?

I was thinking about total numbers, but for this particular project maybe English, Spanish & Arabic -- I'm interested in global languages. Thank you!

Phoebe


Leila Zia

9:57 p.m.

Hey Phoebe,

On Thu, Mar 17, 2016 at 12:40 PM, phoebe ayers <phoebe.wiki(a)gmail.com> wrote:

...

> On Thu, Mar 17, 2016 at 1:04 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
> > One of the drawbacks is that we can't report on a single total number across all our projects.

I'm not sure how ComScore estimated unique visitors/people, and whether they estimated unique users/people or unique devices. I'd like to point out that the current approach for counting uniques that Dan has referred to will tell you the unique device count, and not the unique user/people count. There may be big differences in those numbers: for example, I access Wikipedia with at least two devices every day and I know I'm not unique. ;)

Erik Zachte and I have had some very early discussions about how to go from the unique device number to the unique user number, but we are nowhere close to any solution. In the meantime, I suggest replacing unique users with unique devices in your communications.

Good luck with your presentation. :)

Best,
Leila


Andrew Gray

18 Mar

12:09 p.m.

On 17 March 2016 at 19:40, phoebe ayers <phoebe.wiki(a)gmail.com> wrote:

...

>> One of the drawbacks is that we can't report on a single total number across all our projects.

Building on this question a little: with the information we currently have, is it actively *wrong* for us to keep using the "half a billion" figure as a very rough first-order estimate? (Like Phoebe, I think I keep trotting it out when giving talks). Do the new figures give us reason to think it's substantially higher or lower than that, or even not meaningfully answerable?

--
- Andrew Gray
  andrew.gray(a)dunelm.org.uk

Toby Negrin

2:45 p.m.

Hi Andrew, Phoebe -- here's what the Communication department here is comfortable with:

> Wikipedia and the other projects operated by the Wikimedia Foundation receive hundreds of millions of unique visitors per month

The numbers of unique devices for enwiki are far greater than 500 million. But of course, people use more than one device and/or shared computers. There are datasets out there about average number of devices per person so we could potentially use this as a scaling factor to get a higher level of confidence, but IMO the mapping from device to actual human is always going to be dicey.

For comparison, I worked at Yahoo for a long time and generally understand their tech stack -- in their 2014 annual report, Yahoo speaks to "more than 1 billion MAUs".[1] From my experience, I really don't know how they could measure this with any certainty without estimation or other statistical techniques because they have the same measurement issues that we do. Only Facebook or other sites where personalization is necessary for the site to work can report on reach without some sort of qualification.

-Toby

[1] http://static.tumblr.com/7drgjla/386nnw4n9/yahoo_inc._2014_annual_report.pdf
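Editorial illustration of the scaling-factor idea mentioned in the message above: a minimal Python sketch, not a method used by anyone in this thread. The function names, the device count, and the devices-per-person ratios are all hypothetical placeholders, and the sketch ignores shared devices, which push the estimate in the opposite direction.

```python
def estimate_people(unique_devices: float, devices_per_person: float) -> float:
    """Scale a unique-device count down to a rough people estimate.

    devices_per_person is an ASSUMED average taken from some external
    dataset; it is not something the device count itself can tell us.
    """
    if devices_per_person <= 0:
        raise ValueError("devices_per_person must be positive")
    return unique_devices / devices_per_person


def estimate_range(unique_devices: float, low_ratio: float, high_ratio: float):
    """Return (low, high) people estimates for a range of assumed ratios."""
    if low_ratio > high_ratio:
        raise ValueError("low_ratio must not exceed high_ratio")
    # More devices per person means fewer people behind the same device count,
    # so the high ratio gives the low estimate and vice versa.
    return (estimate_people(unique_devices, high_ratio),
            estimate_people(unique_devices, low_ratio))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    devices = 600_000_000
    low, high = estimate_range(devices, low_ratio=1.5, high_ratio=3.0)
    print(f"Between {low:,.0f} and {high:,.0f} people under these assumptions")
```

Reporting a range rather than a single number keeps the uncertainty in the device-to-person mapping visible, which fits an order-of-magnitude claim better than a point estimate would.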

phoebe ayers

3:45 p.m.

On Fri, Mar 18, 2016 at 9:45 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:

> Hi Andrew, Phoebe -- here's what the Communication department here is comfortable with:
>
> Wikipedia and the other projects operated by the Wikimedia Foundation receive hundreds of millions of unique visitors per month

Thanks Toby. I'll start using that and add it to my presentation toolkit.

...

Sounds like a research project! Like Andrew, for communication purposes I'm less interested in exactitude than I am in order-of-magnitude. (The kinds of things I use these numbers for: comparisons against the online population, against the population reached by libraries, etc. -- all of which is deeply qualified, of course.)

Semi-off topic thoughts about "reach" as a metric: I wonder if there's a qualitative project somewhere in here about *types* of use -- e.g. if I'm using WP on my phone & my work pc is that really equivalent use? Perhaps I am using them for different kinds of information seeking, e.g. looking up terms related to work vs looking up info on movie stars -- does this different kind of use matter for how we construct and present information, or count "use"? Can we build testable hypotheses about use patterns & needs for people who do straight device swapping (phone to tablet to pc, for the same purposes) versus people who have devices for different purposes (i.e. work v. personal) versus people who share devices? (Obviously, all this goes well beyond just Wikipedia use).

I also think there's something in here about levels of access related to language, which relates to multilingual use of Wikipedia. Someone who speaks a language served by a Wikipedia with 100K articles cannot access Wikipedia to the same depth or level that a person who can use English Wikipedia with 5M articles can.

In other words, though we talk about reach, not all reach is the same. The depth to which Wikipedia reaches me -- someone with unlimited data on multiple devices and 24/7 device access for all purposes, who reads English well and a couple other languages poorly -- is way different from the depth to which someone with part-time access on a mobile phone who speaks an underserved language is reached by our projects. This may be pretty obvious, but I hadn't thought about the implications for claiming "we reach x millions" before.

-- phoebe


Toby Negrin

4:44 p.m.

Hi Phoebe --

Allow me to direct you to the eminent statistician George Box's aphorism: "All models are wrong but some are useful"[1] :)

Right now the Reading team models our success on three axes:

Reach: how many people read our content
Engagement: how much do they read
Retention: how often do they come back

Just as metrics for games measure "fun", we think of ours as measuring "learning" but there's a lot of nuance left on the table, particularly in the areas that you highlight above.

We're slowly building up our quantitative and qualitative understanding of our Readers. Leila and team have been doing some really interesting work <https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour> on Reader intention and we've made some serious strides in our qualitative research as well. Every new project and exploration adds another dimension of understanding but clearly we need to balance complexity and effort with expressive value.

-Toby

[1] https://en.wikipedia.org/wiki/All_models_are_wrong

Leila Zia

5:21 p.m.

On Fri, Mar 18, 2016 at 8:44 AM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:

...

> We're slowly building up our quantitative and qualitative understanding of our Readers. Leila and team have been doing some really interesting work <https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour> on Reader intention and we've made some serious strides in our qualitative research as well.

Phoebe, the page Toby mentioned is well documented and we are updating it as we learn more. If you are interested in getting a quick sense of what we know about reader motivations, the depth of information they are seeking, and their prior knowledge of the topic before reading an article, please look at the analysis section in https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Be….

Those results are based on a survey we ran in November 2015. We ran another, larger survey in March 2016, and we are currently analyzing it. It will take months for us to finish this work, but we are hoping to make significant progress in this area and answer some basic but fundamental questions we don't have answers for today.

Of course, ping if you want to chat more about this research. :)

Best,
L


Leila Zia

5:26 p.m.

I'm sorry, I forgot to mention this: The link I shared is what we know about English Wikipedia. We have also run 2 surveys in Spanish and Persian Wikipedias that were groundwork for running a similar survey as in S3-English in those languages. It is important for us to understand if reader characteristics are different in other languages, and if so, we need to be able to characterize how they differ from English.

L

Leila Zia
Research Scientist
Wikimedia Foundation

Mark J. Nelson

8:57 p.m.

phoebe ayers <phoebe.wiki(a)gmail.com> writes:

...

> I wonder if there's a qualitative project somewhere in here about *types* of use -- e.g. if I'm using WP on my phone & my work pc is that really equivalent use? Perhaps I am using them for different kinds of information seeking, e.g. looking up terms related to work vs looking up info on movie stars -- does this different kind of use matter for how we construct and present information, or count "use"?

Beyond the issue of devices, I think this is important in part because the raw traffic counts (and reach numbers and similar) paint a very specific story of what Wikimedia is doing and is successful at. (And what you measure influences what you tend to optimize for.)

Specifically, a small slice of content, mainly English Wikipedia articles on pop culture, recent news events, and U.S. politics, contributes a disproportionate share of views. (A weekly top-25 list for enwiki is at https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report ). So if you're measuring aggregate numbers, you're measuring mainly that specific type of content.

If the goal is really simply to reach as many people as possible, have high page views and unique visitor counts, etc., then this subset of articles is really the only important part of Wikimedia's mission; articles on, say, mathematics, don't contribute anything to moving the needle if that's the metric.

-Mark


wiki-research-l@lists.wikimedia.org


participants (6)

Andrew Gray
Dan Andreescu
Leila Zia
Mark J. Nelson
phoebe ayers
Toby Negrin