On Thu, Sep 18, 2014 at 3:49 PM, Pine W <wiki.pine@gmail.com> wrote:
Yes, but supposedly phone survey companies are able to get representative samples of broad populations despite many people refusing to respond to phone surveys. If opt-in users were chosen using similar methods, could arguably representative data be obtained?

In theory, sure, but that's a high bar. Responsible phone survey firms that generate high-quality data generally work very hard on several fronts. They draw random samples of the population under consideration, follow up with non-respondents numerous times to maximize the response rate, and develop nuanced survey weights that adjust the responses relative to known parameters of the larger population (when possible). At least recently, they also often conduct ongoing studies to ensure that their data quality remains high (e.g., in response to the shift away from landlines toward cell-phone-only users among some demographic groups).
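
To make the weighting step concrete, here is a minimal sketch of simple post-stratification, assuming one known population parameter (the share of landline vs. cell-only households). The group labels, shares, and answers below are hypothetical, made up purely for illustration, and real survey weights are considerably more nuanced than this.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted group
    shares in the sample match the known population shares."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        # Under-represented groups get weights above 1, over-represented below 1.
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Weighted average of the responses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 8 landline respondents and 2 cell-only respondents.
groups = ["landline"] * 8 + ["cell_only"] * 2
# Hypothetical yes/no answers (1 = yes): 6 of 8 landline say yes, 0 of 2 cell-only.
answers = [1] * 6 + [0] * 2 + [0] * 2
# Assumed known population parameter: half of households are cell-only.
population_shares = {"landline": 0.5, "cell_only": 0.5}

weights = poststratification_weights(groups, population_shares)
raw = sum(answers) / len(answers)
adjusted = weighted_mean(answers, weights)
print(f"raw estimate: {raw:.3f}, weighted estimate: {adjusted:.3f}")
```

The catch is that this only corrects for differences that are captured by the grouping variable and that have a known population value. For opt-in data about Wikipedia readers, those known parameters are usually exactly what is missing.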

Many of these practices are very difficult to map onto contexts like Wikipedia, WMF projects, or online communities more broadly. Even the most sophisticated web-metrics data providers (e.g., ComScore, Quantcast) struggle with non-response and data quality. Those firms do not publish much about their methodologies and do not share their data with non-paying members of the public.

Mako and I have written about some of these issues in a PLoS ONE article[1], where we also attempt to correct some existing Wikipedia survey data using an interesting technique that draws on overlapping questions in an opt-in survey and a nationally representative phone survey of US adults. I've also talked with a few communities about conducting surveys in a manner that would be more likely to generate high-quality data along these lines, but without much to show for it yet. It would be great to see more people (scholars/communities/observers) move in this direction.

all the best,
a

[1] http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0065782

On Sep 18, 2014 1:32 PM, "Benj. Mako Hill" <mako@atdot.cc> wrote:
<quote who="Pine W" date="Thu, Sep 18, 2014 at 12:07:53PM -0700">
> I suppose you could get more granular data by conducting an opt-in study of
> some kind, and you would need to be careful that users who haven't opted in
> are not accidentally included or indirectly have their privacy affected. I
> agree that collection at intervals shorter than an hour is going to raise a
> lot of privacy considerations for users who have not opted in.

That would certainly work for some research questions and that's more
or less what most toolbar data is.

The problem is that questions answered with view data are often about
the overall popularity or visibility of pages, which requires data that
is representative. There are lots of reasons to believe that people who
opt in aren't going to be representative of all Wikipedia readers.

Regards,
Mako


--
Benjamin Mako Hill
http://mako.cc/

Creativity can be a social contribution, but only in so far
as society is free to use the results. --GNU Manifesto
