Jorg, the project abbreviations are explained in depth here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews
On Mon, Mar 6, 2017 at 11:15 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
Yeah, Dan, that will work, thanx.
Just out of curiosity: why are there three projects for "de", and what is the difference between them? /de/, /de.m/ and /de.zero/
Cheers, JJ
On 06.03.2017 at 15:45, Dan Andreescu wrote:
Jorg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/ which has compressed data without losing granularity. You can get monthly files here and download a lot less data.
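For illustration, a minimal Python sketch of how one might stream one of those monthly files and keep only the German-language rows before doing any further work. The file name, URL path and column layout below are assumptions to be checked against the listing and documentation on that page, not confirmed details:

#!/usr/bin/env python3
# Sketch: stream a pagecounts-ez monthly file and keep only "de" rows.
# The URL/file name is illustrative; the real names, project codes and
# column layout are documented at
# https://dumps.wikimedia.org/other/pagecounts-ez/ and should be verified.
import bz2
import urllib.request

URL = ("https://dumps.wikimedia.org/other/pagecounts-ez/merged/"
       "pagecounts-2016-01-views-ge-5.bz2")  # hypothetical example file name

with urllib.request.urlopen(URL) as resp, \
        open("de-2016-01.txt", "w", encoding="utf-8") as out:
    decomp = bz2.BZ2Decompressor()
    pending = b""
    for chunk in iter(lambda: resp.read(64 * 1024), b""):
        pending += decomp.decompress(chunk)
        *lines, pending = pending.split(b"\n")
        for raw in lines:
            line = raw.decode("utf-8", errors="replace")
            # Keep rows whose project code (first column) is a "de" variant;
            # narrow this once the exact code for de.wikipedia is confirmed.
            if line.split(" ", 1)[0].startswith("de"):
                out.write(line + "\n")

Run once per month of 2016 and the twelve filtered files can then be aggregated locally, which keeps the download far below the size of the raw hourly dumps.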
On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
Marcel, thanx for your quick answer.

My main issue with the dumps (or I don't get something) is: I need to download them first to be able to aggregate and filter. For the year 2016 that would be roughly 40 MB (on average) * 24 h * 30 d * 12 m = about 350 GB. As I am not sitting directly at DE-CIX but in my private office, I will face a pretty hard time with that :-)

So my idea is that somebody "closer" to the raw data would basically do the aggregation and filtering for me... Will somebody (please)?

Thanx, JJ

On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> Hi Jörg, :]
>
> Do you mean the top 250K most viewed *articles* in de.wikipedia.org?
>
> If so, I think you can get that from the dumps indeed. You can find 2016
> hourly pageview stats by article for all wikis here:
> https://dumps.wikimedia.org/other/pageviews/2016/
>
> Note that the wiki codes (first column) you're interested in are: /de/,
> /de.m/ and /de.zero/.
> The third column holds the number of pageviews you're after.
> Also, this data set does not include bot traffic as recognized by the
> pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>.
> As the files are hourly and contain data for all wikis, you'll need some
> aggregation and filtering.
>
> Cheers!
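As a concrete illustration of that aggregation and filtering, a minimal Python sketch follows. It assumes the hourly dump files have already been downloaded locally (paths passed as command-line arguments) and follow the space-separated layout described above, with the wiki code in the first column and the pageview count in the third; the constants and output format are illustrative only:

#!/usr/bin/env python3
# Sketch: sum 2016 pageviews per article for de/de.m/de.zero and print the
# top 250K. Pass the downloaded gzipped hourly dump files as arguments.
import gzip
import sys
from collections import Counter
from heapq import nlargest

WANTED = {"de", "de.m", "de.zero"}   # wiki codes from the first column
TOP_N = 250_000

totals = Counter()
for path in sys.argv[1:]:
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.split(" ")
            if len(parts) < 3 or parts[0] not in WANTED:
                continue
            try:
                totals[parts[1]] += int(parts[2])   # third column = pageviews
            except ValueError:
                continue                            # skip malformed rows

# Tab-separated output: article title, yearly sum over all three access channels.
for title, views in nlargest(TOP_N, totals.items(), key=lambda kv: kv[1]):
    print(f"{title}\t{views}")

The number of distinct article titles on de.wikipedia is small enough to keep the counter in memory; the 250K cut-off is applied only once at the end.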
>
> On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
> > Ladies, gents,
> >
> > for a project I plan I'd need the following data:
> >
> > Top 250K sites for 2016 in project de.wikipedia.org, user-access.
> >
> > I only need the name of the site and the corresponding number of
> > user-accesses (all channels) for 2016 (sum over the year).
> >
> > As far as I can see I can't get that data via REST or by aggregating
> > dumps.
> >
> > So I'd like to ask here, if someone likes to help out.
> >
> > Thanx, cheers, JJ
> >
> > --
> > Jörg Jung, Dipl. Inf. (FH)
> > Hasendriesch 2
> > D-53639 Königswinter
> > E-Mail: joerg.jung@retevastum.de
> > Web: www.retevastum.de
> > www.datengraphie.de
> > www.digitaletat.de
> > www.olfaktum.de
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics