Jörg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/, which has compressed data without losing granularity. You can get monthly files there and download a lot less data.
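
For example, something like the following (an untested sketch; the exact directory and file names are a guess on my part, so check the listing and README on that page first) would fetch the twelve monthly files for 2016 instead of roughly 8,600 hourly ones:

    import urllib.request

    # Guessed file naming under the "merged" directory -- verify against the
    # listing at https://dumps.wikimedia.org/other/pagecounts-ez/ before use.
    BASE = "https://dumps.wikimedia.org/other/pagecounts-ez/merged/"
    for month in range(1, 13):
        name = f"pagecounts-2016-{month:02d}-views-ge-5.bz2"
        urllib.request.urlretrieve(BASE + name, name)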

On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
Marcel,

Thanks for your quick answer.
My main issue with the dumps (unless I am missing something) is:

I need to download them first to be able to aggregate and filter.
For the year 2016 that would be roughly: 40 MB (mid-range per hourly file) * 24 h * 30 d * 12 months = about
350 GB.
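
A quick sanity check of that estimate:

    files_2016 = 24 * 30 * 12            # hourly dump files: hours * days * months = 8640
    total_gb = files_2016 * 40 / 1024    # at roughly 40 MB per file
    print(total_gb)                      # ~337.5, i.e. on the order of 350 GB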

As I am not sitting directly at DE-CIX but in my private office, I will
have a pretty hard time with that :-)

So my idea is that somebody "closer" to the raw data would basically do
the aggregation and filtering for me...

Will somebody (please)?

Thanks, JJ

On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> Hi Jörg, :]
>
> Do you mean the top 250K most viewed *articles* on de.wikipedia.org?
>
> If so, I think you can get that from the dumps indeed. You can find 2016
> hourly pageview stats by article for all wikis
> here: https://dumps.wikimedia.org/other/pageviews/2016/
>
> Note that the wiki codes (first column) you're interested in are: de,
> de.m and de.zero.
> The third column holds the number of pageviews you're after.
> Also, this data set does not include bot traffic as recognized by the
> pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>.
> As files are hourly and contain data for all wikis, you'll need some
> aggregation and filtering.
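>
> For example, something along these lines (an untested sketch; the glob
> pattern assumes you kept the original file names, so adjust it to whatever
> you downloaded) would do the per-year aggregation and filtering in one pass:
>
>     import glob
>     import gzip
>     from collections import Counter
>
>     # Sum 2016 pageviews per article across all downloaded hourly files.
>     # First column is the wiki code, third column is the pageview count.
>     WIKIS = {"de", "de.m", "de.zero"}
>     totals = Counter()
>
>     for path in glob.glob("pageviews-2016*.gz"):
>         with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
>             for line in f:
>                 parts = line.split(" ")
>                 if len(parts) >= 3 and parts[0] in WIKIS:
>                     totals[parts[1]] += int(parts[2])
>
>     # Top 250K articles for the year (keeps all titles in memory,
>     # which should be fine for a single wiki).
>     for title, views in totals.most_common(250000):
>         print(title, views)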
>
> Cheers!
>
> On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
>
>     Ladies, gents,
>
>     For a project I am planning, I'd need the following data:
>
>     Top 250K sites for 2016 in project de.wikipedia.org, user-access.
>
>     I only need the name of the site and the corresponding number of
>     user-accesses (all channels) for 2016 (sum over the year).
>
>     As far as I can see, I can't get that data via REST or by aggregating
>     the dumps.
>
>     So I'd like to ask here whether someone would like to help out.
>
>     Thanks, cheers, JJ
>
>     --
>     Jörg Jung, Dipl. Inf. (FH)
>     Hasendriesch 2
>     D-53639 Königswinter
>     E-Mail:     joerg.jung@retevastum.de
>     Web:        www.retevastum.de
>                 www.datengraphie.de
>                 www.digitaletat.de
>                 www.olfaktum.de
>
>
>
>
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
>
>

--
Jörg Jung, Dipl. Inf. (FH)
Hasendriesch 2
D-53639 Königswinter
E-Mail:     joerg.jung@retevastum.de
Web:        www.retevastum.de
            www.datengraphie.de
            www.digitaletat.de
            www.olfaktum.de

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics