+research
Fascinating. Thanks for sharing this, Nemo. And for setting those arrogant
Stackers straight ;)
For anyone else interested: Nemo was able to answer this question because
StackExchange has a Quarry <http://quarry.wmflabs.org/>-like public query
interface of their own. You should go play with it right now:
http://data.stackexchange.com/
Jonathan
On Fri, Nov 13, 2015 at 10:56 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com>
wrote:
> Some information at
> https://meta.stackexchange.com/questions/269334/how-many-active-users-contr…
>
> TL;DR: not really, and definitely not StackOverflow alone (~14k). But
> perhaps the whole StackExchange has more than the English Wikipedia alone.
>
> Nemo
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Hi all,
I am writing to the public list in the hope that this discussion will
be of interest to more people.
I am working with a student on scientific citations on Wikipedia and,
very simply put, we would like to use the pageview dataset to get a
rough measure of how many times a paper was viewed thanks to
Wikipedia.[*]
The full dataset is, as of now, ~ 4.7TB in size.
I have two questions:
* if we download this dataset, our first estimate is ~ 30 days of
continuous download (assuming an average download speed of ~ 2MB/s,
which is what we measured while downloading one month of data
(~ 64GB)). Here at my University (Trento, Italy) downloads of this
kind have to be reported to the IT department. I was wondering
whether this would be useful information for the WMF, too.
* given the estimate above, I was wondering whether it is possible to
obtain this data over FedEx Bandwidth (cit. [1]), i.e. via shipping of
a physical disk. I know that in some fields (e.g. neuroscience) this
is the standard way to exchange big datasets (on the order of TBs).
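
As a rough check on the estimate in the first question, here is a
minimal sketch of the arithmetic. It assumes decimal units (1 TB =
10^12 bytes, 1 MB = 10^6 bytes) and a sustained rate of 2 MB/s; the
function name download_days is only illustrative.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes
GB = 10**9   # assumption: decimal gigabytes
MB = 10**6   # assumption: decimal megabytes


def download_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Days of continuous download at a constant sustained rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Full dataset: ~4.7 TB at ~2 MB/s
full_dataset_days = download_days(4.7 * TB, 2 * MB)

# One month of data: ~64 GB at ~2 MB/s, expressed in hours
one_month_hours = download_days(64 * GB, 2 * MB) * 24

print(f"full dataset: {full_dataset_days:.1f} days")
print(f"one month of data: {one_month_hours:.1f} hours")
```

Under these assumptions the full dataset comes out at a bit over 27
days, which is consistent with the ~ 30 days quoted above once some
overhead is allowed for.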
Thanks in advance for your help.
Cristian
[*] I know these are pageviews and not unique visitors; furthermore,
there is no guarantee that viewing a citation means anything. I am
approaching this data the same way "impressions" versus
"click-throughs" are treated in the online advertising world.
[1] https://what-if.xkcd.com/31/