Hey guys,
I reached out to this guy yesterday about the bug I ran into in Scribe. He had posted on the scribe-server Google group that he had fixed this bug, and I also wanted to let him know about our potential efforts to standardize Scribe packaging.
Here's his opinion on Scribe:
> I personally gave up on Scribe, I'd recommend that you
> consider Flume as a better replacement, that is more supported and
> developed. Scribe has never been really well written or maintained,
> it's just one of the many hacks that Facebook released.
In general, Scribe does seem to be pretty much abandoned. A couple of pull requests have been merged in the last year, but beyond that there isn't much activity: https://github.com/facebook/scribe/commits/master
It would be really interesting to know whether (and how) Facebook still uses Scribe internally. I'm pretty sure they've done a lot more with Hadoop since 2008-2010, when Scribe was being more actively promoted. Maybe they're using Flume instead now? We need a Facebook insider; anyone know one?
-Ao
Begin forwarded message:
> From: tsuna <tsunanet(a)gmail.com>
> Subject: Re: Scribe Packaging Effort
> Date: July 26, 2012 12:45:30 AM EDT
> To: Andrew Otto <otto(a)wikimedia.org>
>
> On Wed, Jul 25, 2012 at 9:05 AM, Andrew Otto <otto(a)wikimedia.org> wrote:
>> Hi Benoît,
>
> Hi Andrew,
>
>> In the meantime, I have another question! I just ran into a problem that
>> you say you fixed in this thread:
>> https://groups.google.com/group/scribe-server/tree/browse_frm/month/2010-01…
>
> Are you referring to this?
>> [Thu Nov 19 18:29:59 2009] "[hdfs] Connecting to HDFS"
>> *** glibc detected *** ./scribed: munmap_chunk(): invalid pointer: 0x0000000001ea19c3 ***
>
>> However, the commit you link to 404s. I'm willing to rebuild scribe with
>> whatever fix or release version is necessary. Can you point me in the right
>> direction? What source should I use to build scribe to fix this bug?
>
> If you're referring to the bug above, it's a very old bug, it must be
> fixed upstream already. I can't believe you're running into the same
> bug almost 3 years later, it must be a different issue.
>
> Either way, I personally gave up on Scribe, I'd recommend that you
> consider Flume as a better replacement, that is more supported and
> developed. Scribe has never been really well written or maintained,
> it's just one of the many hacks that Facebook released.
>
> Good luck.
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
Belated cross-post for those who are interested in Git analytics.
-Sumana
-------- Original Message --------
Subject: [Wikitech-l] Git code review metrics
Date: Fri, 20 Apr 2012 16:42:23 -0700
From: Erik Moeller <erik(a)wikimedia.org>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Following up on the earlier thread by Rob [1], Rob and I kicked around
the question of what metrics/targets for code review we want to surface
on an ongoing basis. We're not going to invest in a huge dashboard
project right now, but we'll try to get at least some of the key
metrics generated and visualized automatically. Help is appreciated,
starting with deciding which metrics we should look at.
Here's what we came up with, by priority:
1) Most important: Time series graph of # of open changesets
Target: Number of open changesets should not exceed 200.
Optional breakdown:
- mediawiki/core
- mediawiki/extensions
- WMF-deployed extensions
- specific repos
2) Important: Aging trends.
- Time series graph of # open changesets older than a, b, c days
(to indicate troubling aging trends, e.g. a=3, b=5, c=7)
- Target: There should be 0 changes that haven't been looked at
at all for more than 7 days.
- Including only: changes which have not received a -1 review, -1
verification, or -2
- Optional breakdown as above
- Rationale: We're looking for tendencies of complete neglect of
submissions here, which is why we have to exclude -1s and -2s
(a minimal computation sketch follows below).
3) Possibly useful:
- Per-reviewer or reviewee(?) statistics regarding merge activity,
number of -1s, neglected code, etc.
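To make the aging metric concrete: a minimal sketch of how the neglect
counts in (2) could be computed. The input format here is hypothetical;
a real version would pull open changesets and their review scores from
Gerrit.

    from datetime import datetime, timedelta

    def neglected_counts(changesets, now, thresholds=(3, 5, 7)):
        """Count open changesets older than each threshold (in days)
        that have received no -1 review, -1 verification, or -2."""
        counts = {t: 0 for t in thresholds}
        for change in changesets:
            # Changes with negative feedback were looked at, so they
            # don't count as neglected.
            if any(score < 0 for score in change["scores"]):
                continue
            age_days = (now - change["created"]).days
            for t in thresholds:
                if age_days > t:
                    counts[t] += 1
        return counts

    # Example: two untouched changes and one that already got a -1.
    now = datetime(2012, 4, 20)
    sample = [
        {"created": now - timedelta(days=8), "scores": []},
        {"created": now - timedelta(days=4), "scores": []},
        {"created": now - timedelta(days=10), "scores": [-1]},
    ]
    print(neglected_counts(sample, now))  # {3: 2, 5: 1, 7: 1}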
Any obvious thinking errors in the above / do the targets make sense /
should we look at other metrics or approaches?
Erik
[1] http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059940.html
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
Hey Jan. Yes, this tool looks like almost exactly what I need. The issue
is that it only does 7 languages, and we are currently translating content
into nearly 40. Would it be possible to expand it to all the languages of
Wikipedia?
James Heilman
On Tue, Jul 24, 2012 at 6:00 AM, <analytics-request(a)lists.wikimedia.org> wrote:
> Date: Mon, 23 Jul 2012 15:02:53 +0200
> From: Jan Ainali <jan.ainali(a)wikimedia.se>
> To: "A mailinglist for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics."
> <analytics(a)lists.wikimedia.org>
> Subject: Re: [Analytics] Calculating page views for projects in other
> languages
>
> I just wanted to let you know of a tool that Holger Motzkau
> (User:Prolineserver) made. It does not completely solve your problem,
> but it comes close: it lists the page views for a category on one
> Wikipedia (plus all the interwiki links and the QRpedia statistics).
> It shouldn't be too hard to feed it a list of articles that carry a
> certain template, I guess.
>
> http://toolserver.org/~prolineserver/glamorous/glamorous_cats.php
>
> --
> Best,
> Jan Ainali
> Chairman, Wikimedia Sverige <http://se.wikimedia.org/wiki/Huvudsida>
>
>
> 2012/7/23 Erik Zachte <ezachte(a)wikimedia.org>
>
>> James seeks a one-page overview of the most-read articles for any
>> wiki/project.
>>
>> A list of articles per project could be retrieved from the MediaWiki API.
>> I did something similar with a list of articles per category (incl.
>> subcategories, x levels deep). Perl script available on request.
>>
>> Then the machine-readable version of grok could be used to retrieve
>> article counts (see Dario's comment). However, this might not scale well
>> to 1,000s (or 10,000s) of projects and 100,000s of pages.
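A minimal sketch of that two-step pipeline (not part of the original mail;
API continuation is omitted, and the grok JSON field name is an assumption),
using the MediaWiki categorymembers API and the stats.grok.se JSON endpoint
that Dario mentions below:

    import json
    import urllib.parse
    import urllib.request

    def category_members(lang, category):
        """Yield article titles in a category (subcategory recursion
        and API continuation omitted for brevity)."""
        url = ("https://%s.wikipedia.org/w/api.php?action=query"
               "&list=categorymembers&cmtitle=Category:%s"
               "&cmlimit=500&format=json"
               % (lang, urllib.parse.quote(category)))
        data = json.load(urllib.request.urlopen(url))
        for member in data["query"]["categorymembers"]:
            yield member["title"]

    def monthly_views(lang, month, title):
        """Sum one month of per-article daily view counts from the
        stats.grok.se JSON (assumes a "daily_views" mapping)."""
        url = "http://stats.grok.se/json/%s/%s/%s" % (
            lang, month, urllib.parse.quote(title))
        data = json.load(urllib.request.urlopen(url))
        return sum(data["daily_views"].values())

    for title in category_members("fr", "Paris"):
        print(title, monthly_views("fr", "201207", title))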
>>
>> In the somewhat longer run I see two developments that might warrant
>> putting
>> this on hold:
>>
>> 1)
>> The new analytics cluster will be used to aggregate page and image views.
>> (Another use case would be aggregating image views per donating GLAM
>> institute.)
>> Exactly which aggregations are needed is better determined once the
>> infrastructure is available and capacity is known.
>>
>> 2)
>> There are scripts to aggregate Domas' hourly page view feeds into
>> monthly files. These aggregates are much smaller (after cruft removal,
>> only 2 GB per month) without losing hourly resolution, making them easy
>> to download and archive/process somewhere else.
>> http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054644.html
>> These scripts need final work; I spoke to long-time dev wikimedian EMW
>> at Wikimania, and he might be interested in taking this on, starting
>> October. From these aggregates, the overviews for 1,000s (or 10,000s) of
>> projects could be generated in a batch process, though only after each
>> month completes.
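A toy sketch of the aggregation idea (the filenames are made up, and
unlike the real scripts this simplification sums straight to monthly
totals instead of keeping hourly resolution). Each line in Domas' hourly
files is "project page count bytes":

    import glob
    from collections import Counter

    totals = Counter()
    for path in sorted(glob.glob("pagecounts-201207*")):  # one file/hour
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.split()
                if len(fields) != 4:
                    continue  # crude cruft removal
                project, page, count, _size = fields
                try:
                    totals[(project, page)] += int(count)
                except ValueError:
                    continue

    with open("pagecounts-2012-07-monthly", "w", encoding="utf-8") as out:
        for (project, page), count in sorted(totals.items()):
            out.write("%s %s %d\n" % (project, page, count))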
>>
>> Erik Zachte
>>
>>
>> -----Original Message-----
>> From: analytics-bounces(a)lists.wikimedia.org
>> [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Dario
>> Taraborelli
>> Sent: Friday, July 20, 2012 6:56 PM
>> To: A mailinglist for the Analytics Team at WMF and everybody who has an
>> interest in Wikipedia and analytics.
>> Subject: Re: [Analytics] Calculating page views for projects in other
>> languages
>>
>> James,
>>
>> can you expand on this request? If you are interested in per-article
>> pageview stats you can use: http://stats.grok.se/
>>
>> For example: http://stats.grok.se/fr/201207/Paris
>> A machine readable version: http://stats.grok.se/json/fr/201207/Paris
>>
>> Dario
>>
>> On Jul 20, 2012, at 9:38 AM, Sumana Harihareswara wrote:
>>
>> > James: Good places to add your requests to:
>> >
>> > https://lists.wikimedia.org/mailman/listinfo/toolserver-l
>> >
>> > https://www.mediawiki.org/wiki/Annoying_large_bugs
>> >
>> >
>> > --
>> > Sumana Harihareswara
>> > Engineering Community Manager
>> > Wikimedia Foundation
>> >
>> >
>> >
>> > On 07/20/2012 12:38 PM, Diederik van Liere wrote:
>> >> It is always tricky to convince someone to start working on a
>> >> request for yourself. Given the fact that there is an existing code
>> >> base, I would say that your best bet is to study that and tweak it
>> >> to your own requirements. If you have specific technical questions,
>> >> there are enough people within the different Wikimedia communities
>> >> who can help you.
>> >>
>> >> Good luck!
>> >>
>> >> Diederik
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On 2012-07-20, at 11:47, James Heilman <jmh649(a)gmail.com> wrote:
>> >>
>> >>> This is something I was hoping to convince someone with programming
>> >>> skills to take on. What prevents me from doing it is my complete
>> >>> lack of programming skills, thus the request here.
>> >>>
>> >>> --
>> >>> James Heilman
>> >>> MD, CCFP-EM, Wikipedian
>> >>>
>> >>> The Wikipedia Open Textbook of Medicine
>> >>> www.opentextbookofmedicine.com
>> >>>
FYI
---------- Forwarded message ----------
From: Douglas Moore <douglas.moore(a)thinkbiganalytics.com>
Date: Tue, Jul 3, 2012 at 12:05 PM
Subject: SECURITY exposure issue - HADOOP
To: noc(a)wikimedia.org
Hello,
While searching for Hadoop-related material on Google, I found an
administrative page on your Hadoop server that is exposed to the
Internet and indexed by Google.
Hadoop is not intended to run directly on the Internet, so we believe
this situation represents a potential security risk to your fine
organization, and we think you should investigate further (and close
public access to this research cluster).
Here is one of the open URLs: http://analytics1001.wikimedia.org:50070
Please kindly acknowledge the receipt of this email.
Thanks,
--
Douglas Moore
781-454-5971
@Douglas_MA
skype: dmoore247
Douglas.Moore(a)thinkbiganalytics.com
http://www.thinkbiganalytics.com
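A quick way to verify afterwards that such a port is closed off (the host
and port are the ones from the report above; run this from an outside
network):

    import socket

    def is_open(host, port, timeout=5):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Should print False once the NameNode web UI is no longer exposed.
    print(is_open("analytics1001.wikimedia.org", 50070))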
This is a reminder that you're invited to the pre-Wikimania hackathon,
10-11 July in Washington, DC, USA:
https://wikimania2012.wikimedia.org/wiki/Hackathon
In order to come, you have to register for the Wikimania conference:
https://wikimania2012.wikimedia.org/wiki/Registration
(Unfortunately, the period for requesting scholarships is now over.)
At the hackathon, we'll have trainings and projects for novices, and we
welcome creators of all Wikimedia technologies -- MediaWiki, gadgets,
bots, mobile apps, you name it -- to hack on stuff together and teach
each other.
Hope to see you!
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
This weekend, TechWeek Chicago starts: http://techweek.com/
The Foundation's Peter Gehres is co-presenting the analytics presentation
"How Wikipedia Doubled its Online Fundraising" this Saturday. If you're
at TechWeek, he and other Wikimedians want to meet with you and talk shop!
http://schedule.techweek.com/event/003fc017e0530c08eb34f08033c50f86
Saturday June 23, 2012 4:00pm - 4:45pm @ 1 - Main Stage (222 Merchandise
Mart Plaza, Chicago, IL)
"In 2010, online donations to Wikipedia more than doubled, from $7.5
million to $16 million and, in 2011, increased another 33%. Much of this
increase was driven by user research conducted in Chicago. Design
researcher Billy Belchev from Webitects will get into the nitty-gritty
of form design, testing, and user interviews. Do one-step forms work
better than multi-step? Does PayPal help or hurt your numbers? What is
the effect of “Jimmy” banners? The answers are based on data from the
fifth most trafficked website in the world."
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation