Yeah, Dan, that will work, thanks.
Just out of curiosity: why are there three projects for "de", and what is
the difference between them: /de/, /de.m/ and /de.zero/?
Cheers, JJ
On 06.03.2017 at 15:45, Dan Andreescu wrote:
Jorg, take a look at
https://dumps.wikimedia.org/other/pagecounts-ez/
which has compressed data without losing granularity. You can get
monthly files here and download a lot less data.
On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <joerg.jung(a)retevastum.de> wrote:
Marcel,
thanks for your quick answer.
My main issue with the dumps (unless I am missing something) is that
I need to download them first to be able to aggregate and filter.
For the year 2016 that would be: 40 MB (average) * 24 h * 30 d * 12 m = about
350 GB.
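A quick check of that estimate with explicit units may help here. This is only a sketch: the 40 MB per hourly file and the 30-day month are the rough figures from the message, not measured values, and the function name is made up for illustration.

```python
# Rough figures taken from the message above (assumptions, not measurements):
# about 40 MB per hourly file, 24 files per day, 30 days per month, 12 months.
MB_PER_HOURLY_FILE = 40
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30
MONTHS = 12


def estimated_download_mb() -> int:
    """Return the estimated size, in megabytes, of one year of hourly files."""
    return MB_PER_HOURLY_FILE * HOURS_PER_DAY * DAYS_PER_MONTH * MONTHS


if __name__ == "__main__":
    total_mb = estimated_download_mb()
    # Convert with decimal units: 1 GB = 1000 MB.
    print(f"{total_mb} MB is about {total_mb / 1000:.1f} GB")
```

With these inputs the product is in megabytes, so the yearly total lands in the hundreds of gigabytes.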
As I am not sitting directly at DE-CIX but in my private office, I will
have a pretty hard time with that :-)
So my idea is that somebody "closer" to the raw data would basically do
the aggregation and filtering for me...
Would somebody do that, please?
Thanks, JJ
On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
Hi Jörg, :]
Do you mean top 250K most viewed *articles* in de.wikipedia.org?
If so, I think you can get that from the dumps indeed. You can find 2016
hourly pageview stats by article for all wikis here:
https://dumps.wikimedia.org/other/pageviews/2016/
Note that the wiki codes (first column) you're interested in are:
/de/, /de.m/ and /de.zero/.
The third column holds the number of pageviews you're after.
Also, this data set does not include bot traffic as recognized by the
pageview definition
<https://meta.wikimedia.org/wiki/Research:Page_view>.
As files are hourly and contain data for all wikis, you'll need some
aggregation and filtering.
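The aggregation and filtering described above could look roughly like the following sketch. It assumes the hourly files have already been downloaded into a local directory, are gzip-compressed with names starting with `pageviews-`, and contain space-separated lines with the wiki code in the first column, the article title in the second and the pageview count in the third. The function names `add_counts` and `top_articles` are made up for illustration.

```python
import gzip
from collections import Counter
from pathlib import Path
from typing import Iterable

# Wiki codes named in the message above.
WIKI_CODES = {"de", "de.m", "de.zero"}


def add_counts(lines: Iterable[str], totals: Counter) -> None:
    """Add the pageviews from one hourly file's lines to the running totals."""
    for line in lines:
        fields = line.split()
        # Skip short lines and lines belonging to other wikis.
        if len(fields) < 3 or fields[0] not in WIKI_CODES:
            continue
        try:
            totals[fields[1]] += int(fields[2])
        except ValueError:
            continue  # skip lines whose count column is not a number


def top_articles(directory: str, n: int = 250_000) -> list[tuple[str, int]]:
    """Sum pageviews per title over all hourly files and return the top n."""
    totals: Counter = Counter()
    # Assumed file name pattern; adjust it to the files actually downloaded.
    for path in sorted(Path(directory).glob("pageviews-*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            add_counts(handle, totals)
    return totals.most_common(n)
```

Calling `top_articles("/path/to/2016")` on a hypothetical download directory would return a list of `(title, views)` pairs summed across all three wiki codes. Only the running totals stay in memory, so the files can be processed one at a time.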
Cheers!
On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <joerg.jung(a)retevastum.de> wrote:
Ladies, gents,
for a project I am planning I need the following data:
Top 250K sites for 2016 in project de.wikipedia.org, user-access.
I only need the name of the site and the corresponding number of
user accesses (all channels) for 2016 (sum over the year).
As far as I can see, I can't get that data via REST or by aggregating
dumps.
So I'd like to ask here whether someone would like to help out.
Thanks, cheers, JJ
--
Jörg Jung, Dipl. Inf. (FH)
Hasendriesch 2
D-53639 Königswinter
E-Mail: joerg.jung(a)retevastum.de
Web:
www.retevastum.de
www.datengraphie.de
www.digitaletat.de
www.olfaktum.de
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation