OK, guys, thanks a lot!
On 06.03.2017 at 17:33, Dan Andreescu wrote:
Jorg, the project abbreviations are explained in depth here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews
On Mon, Mar 6, 2017 at 11:15 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
Yeah, Dan, that will work, thanks. Just out of curiosity: why are there three projects for "de", and what is the difference between them? /de/, /de.m/ and /de.zero/

Cheers, JJ

On 06.03.2017 at 15:45, Dan Andreescu wrote:
> Jorg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/
> which has compressed data without losing granularity. You can get
> monthly files here and download a lot less data.
>
> On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
>
> > Marcel,
> >
> > thanks for your quick answer.
> > My main issue with dumps (or I don't get something) is:
> >
> > I need to download them first to be able to aggregate and filter.
> > Which for the year 2016 would be: 40 MB (on average) * 24 h * 30 d * 12 m =
> > about 350 GB
> >
> > As I am not sitting directly at DE-CIX but in my private office, I will
> > face a pretty hard time with that :-)
> >
> > So my idea is that somebody "closer" to the raw data would basically do
> > the aggregation and filtering for me...
> >
> > Will somebody (please)?
> >
> > Thanks, JJ
> >
> > On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> > > Hi Jörg, :]
> > >
> > > Do you mean top 250K most viewed *articles* in de.wikipedia.org?
> > >
> > > If so, I think you can get that from the dumps indeed. You can find 2016
> > > hourly pageview stats by article for all wikis here:
> > > https://dumps.wikimedia.org/other/pageviews/2016/
> > >
> > > Note that the wiki codes (first column) you're interested in are: /de/,
> > > /de.m/ and /de.zero/.
> > > The third column holds the number of pageviews you're after.
> > > Also, this data set does not include bot traffic as recognized by the
> > > pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>.
> > > As files are hourly and contain data for all wikis, you'll need some
> > > aggregation and filtering.
> > >
> > > Cheers!
> > >
> > > On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <joerg.jung@retevastum.de> wrote:
> > >
> > > > Ladies, gents,
> > > >
> > > > for a project I am planning, I need the following data:
> > > >
> > > > Top 250K sites for 2016 in project de.wikipedia.org, user-access.
> > > >
> > > > I only need the name of the site and the corresponding number of
> > > > user-accesses (all channels) for 2016 (sum over the year).
> > > >
> > > > As far as I can see, I can't get that data via REST or by aggregating
> > > > dumps.
> > > >
> > > > So I'd like to ask here if someone would like to help out.
> > > >
> > > > Thanks, cheers, JJ
> > >
> > > --
> > > *Marcel Ruiz Forns*
> > > Analytics Developer
> > > Wikimedia Foundation

--
Jörg Jung, Dipl. Inf. (FH)
Hasendriesch 2
D-53639 Königswinter
E-Mail: joerg.jung@retevastum.de
Web: www.retevastum.de
     www.datengraphie.de
     www.digitaletat.de
     www.olfaktum.de

_______________________________________________
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
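
For readers who want to try the aggregation and filtering Marcel describes in the thread above, here is a minimal Python sketch. It is not the Analytics team's tooling. It assumes each line of an hourly file is whitespace-separated, with the wiki code in the first column and the view count in the third (both stated in the thread), and the page title in the second column (an assumption, since the thread does not say so). It also assumes the files are gzip-compressed. The names DE_CODES, add_pageviews and aggregate_files are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable

# Wiki codes for German Wikipedia as listed in the thread.
DE_CODES = frozenset({"de", "de.m", "de.zero"})


def add_pageviews(lines: Iterable[str], totals: Counter,
                  codes: frozenset = DE_CODES) -> None:
    """Add view counts from matching lines to `totals`, keyed by page title.

    Assumed line layout: <wiki code> <page title> <view count> [...]
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in codes:
            continue  # a different wiki, or a malformed line
        try:
            views = int(fields[2])
        except ValueError:
            continue  # third column is not a number
        totals[fields[1]] += views


def aggregate_files(paths: Iterable[str]) -> Counter:
    """Sum views per title over many hourly files (assumed to be gzip)."""
    totals: Counter = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            add_pageviews(handle, totals)
    return totals


# Small in-memory example; the lines are made up for illustration.
sample = [
    "de Berlin 10 0",
    "de.m Berlin 5 0",
    "en Berlin 99 0",
    "de.zero Hamburg 2 0",
    "de Hamburg 1 0",
    "garbage",
]
totals = Counter()
add_pageviews(sample, totals)
top = totals.most_common(2)
```

With real data, `aggregate_files(paths).most_common(250_000)` would give the requested top list summed over all three codes ("all channels"), provided the per-title totals for one wiki fit in memory.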