Yeah, Dan, that will work, thanks.
Just out of curiosity: why are there three projects for "de", and what is
the difference between /de/, /de.m/ and /de.zero/?
Cheers, JJ
On 06.03.2017 at 15:45, Dan Andreescu wrote:
> Jorg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/
> which has compressed data without losing granularity. You can get
> monthly files here and download a lot less data.
>
> On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
>
> Marcel,
>
> thanks for your quick answer.
> My main issue with the dumps (unless I am missing something) is:
>
> I need to download them first to be able to aggregate and filter.
> For the year 2016 that would be: 40 MB (average) * 24 h * 30 d * 12 m = about
> 350 GB
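Working that estimate through step by step, as a quick sketch: the 40 MB average file size and the 30-day month are the figures from the message above, the variable names are only illustrative, and decimal units (1 GB = 1000 MB) are assumed.

```python
# Rough download-size estimate for one year of hourly pageview files.
# Assumptions: ~40 MB per hourly file, 30-day months, 1 GB = 1000 MB.
MB_PER_FILE = 40
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12

files_per_year = HOURS_PER_DAY * DAYS_PER_MONTH * MONTHS_PER_YEAR
total_mb = MB_PER_FILE * files_per_year
total_gb = total_mb / 1000

print(f"{files_per_year} files, {total_mb} MB, about {total_gb:.0f} GB")
```

Under these assumptions the total comes out in the hundreds of gigabytes rather than terabytes, which is still a lot to pull over an office connection.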
>
> As I am not sitting directly at DE-CIX but in my private office, I will
> have a pretty hard time with that :-)
>
> So my idea is that somebody "closer" to the raw data would basically do
> the aggregation and filtering for me...
>
> Would somebody do that (please)?
>
> Thanks, JJ
>
> On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> > Hi Jörg, :]
> >
> > Do you mean top 250K most viewed *articles* in de.wikipedia.org?
> >
> > If so, I think you can get that from the dumps indeed. You can find 2016
> > hourly pageview stats by article for all wikis
> > here: https://dumps.wikimedia.org/other/pageviews/2016/
> >
> > Note that the wiki codes (first column) you're interested in are: /de/,
> > /de.m/ and /de.zero/.
> > The third column holds the number of pageviews you're after.
> > Also, this data set does not include bot traffic as recognized by the
> > pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>.
> > As files are hourly and contain data for all wikis, you'll need some
> > aggregation and filtering.
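The aggregation and filtering described above could be sketched roughly like this in Python. This is only a sketch under assumptions, not a tested pipeline: it assumes each line is space-separated with the wiki code in the first column, the page title in the second and the view count in the third, that the hourly files are gzip-compressed and named pageviews-*.gz somewhere below a local directory, and the function names add_counts and top_articles are made up for illustration.

```python
import gzip
from collections import Counter
from pathlib import Path

# Wiki codes named in the thread for German Wikipedia.
WIKI_CODES = {"de", "de.m", "de.zero"}


def add_counts(lines, totals, codes=WIKI_CODES):
    """Add the view counts of matching lines to `totals` (a Counter)."""
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        # Assumed layout: wiki code, page title, view count, ...
        if len(fields) < 3 or fields[0] not in codes:
            continue
        try:
            totals[fields[1]] += int(fields[2])
        except ValueError:
            continue  # skip lines whose count column is not a number
    return totals


def top_articles(directory, n=250_000):
    """Sum views over all hourly files below `directory`, return the top n."""
    totals = Counter()
    # Assumed file naming; adjust the pattern to the files you downloaded.
    for path in sorted(Path(directory).rglob("pageviews-*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            add_counts(fh, totals)
    return totals.most_common(n)
```

Calling top_articles("2016") on a local copy of the 2016 directory would return (title, views) pairs sorted by total views, summed over /de/, /de.m/ and /de.zero/. Files are read one at a time, so only the running totals per title are kept in memory.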
> >
> > Cheers!
> >
> > On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
> >
> > Ladies, gents,
> >
> > for a project I am planning, I need the following data:
> >
> > Top 250K sites for 2016 in project de.wikipedia.org, user-access.
> >
> > I only need the name of the site and the corresponding number of
> > user accesses (all channels) for 2016 (summed over the year).
> >
> > As far as I can see, I can't get that data via REST or by aggregating
> > dumps.
> >
> > So I'd like to ask here whether someone would like to help out.
> >
> > Thanks, cheers, JJ
> >
> > --
> > Jörg Jung, Dipl. Inf. (FH)
> > Hasendriesch 2
> > D-53639 Königswinter
> > E-Mail: joerg.jung@retevastum.de
> > Web: www.retevastum.de
> > www.datengraphie.de
> > www.digitaletat.de
> > www.olfaktum.de
> >
> > _______________________________________________
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
> >
> >
> >
> > --
> > *Marcel Ruiz Forns*
> > Analytics Developer
> > Wikimedia Foundation
> >
> >
>