Analytics February 2014

analytics@lists.wikimedia.org

36 participants
21 discussions

Measuring ulsfo's impact on site performance

by Faidon Liambotis

Hi folks,

As you've probably heard, last week we deployed ulsfo in production, reducing latency for Oceania, East/Southeast Asia, and the US/Canada Pacific/West Coast states. I estimate the affected user base at about 360 million users (as in Internet users, not Wikipedia users).

I was wondering if you have an easy way to measure and plot the impact on page load time, perhaps using Navigation Timing data? The operations team has spent a considerable amount of time and money to deploy ulsfo, and I believe it'd be useful for us and the organization at large to be able to quantify this effort.

The exact dates of the rollout by country/region codes can be found in operations/dns' git history:
https://git.wikimedia.org/summary/?r=operations/dns.git
(the commits should be self-explanatory, but I'd be happy to clarify if needed)

Thanks!
Faidon
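A minimal sketch of the before/after comparison Faidon asks about, assuming Navigation Timing samples have already been reduced to (country code, date, page load time in ms) tuples and that the per-country cutover dates have been read off the operations/dns history by hand. The function name, the tuple layout, and all sample values are hypothetical.

```python
from datetime import date
from statistics import median


def before_after_medians(samples, rollout_dates):
    """Median page load time per country before and after its ulsfo cutover.

    samples: iterable of (country_code, sample_date, load_ms) tuples, e.g.
        Navigation Timing events already reduced to one load time each.
    rollout_dates: dict mapping country_code -> date the DNS change took effect.
    Countries with no rollout date are skipped; a phase with no samples
    gets None instead of a median.
    """
    buckets = {}
    for country, day, load_ms in samples:
        cutover = rollout_dates.get(country)
        if cutover is None:
            continue  # not moved to ulsfo, so no before/after split
        phase = "after" if day >= cutover else "before"
        buckets.setdefault(country, {"before": [], "after": []})[phase].append(load_ms)
    return {
        country: {phase: (median(values) if values else None)
                  for phase, values in phases.items()}
        for country, phases in buckets.items()
    }


# Hypothetical example: made-up dates and load times, for illustration only.
rollout = {"AU": date(2014, 2, 20)}
samples = [
    ("AU", date(2014, 2, 18), 900),
    ("AU", date(2014, 2, 19), 1100),
    ("AU", date(2014, 2, 21), 600),
    ("AU", date(2014, 2, 22), 700),
    ("DE", date(2014, 2, 21), 300),  # no rollout date, ignored
]
print(before_after_medians(samples, rollout))
# -> {'AU': {'before': 1000.0, 'after': 650.0}}
```

Medians are used rather than means because page load times tend to have long tails. A plain before/after split does not control for other changes deployed in the same window, so it gives a first look rather than a causal estimate.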

Re: [Analytics] Visualizing Indic Wikipedia projects.

by sumandro

Erik,

Thanks a lot for the appreciation. As Sajjad mentioned, we have already obtained an edit-per-location dataset from Evan (Rosen) with the following column structure:

*language,country,city,start,end,fraction,ts*

*start* and *end* denote the beginning and end dates of the period over which edits are counted, and *ts* is a timestamp. The *fraction*, however, gives a national ratio of edit activity: the total edits from that city to that language Wikipedia project, divided by the total edits from that country to that project. Hence, it cannot be used to understand global edit contributions to a Wikipedia project (for a time period).

It seems that the original data (from which this dataset is extracted) should also have the global fractions: total edits from a city divided by total edits from the whole world, for a project, for a time period. Would you know if the global fractions can also be derived from the XML dumps? Or, even better, is the relevant raw data available in CSV form somewhere else?

Best,
sumandro

-------------
sumandro
ajantriks.net

On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org wrote:
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
>
> Awesome work! I like the flexibility of the charts; it's easy to switch metrics and presentation mode.
>
> 1. WMF has never captured IP->geo data at the city level, but AFAIK this is going to change with Kraken.
>
> 2. Total edits per article per year can be derived from the XML dumps. I may have some CSV data that could come in handy.
>
> For edit wars you need to track reverts on a per-article basis, right? That can also be derived from the dumps.
>
> For long histories you need the full archive dumps and need to calculate a checksum per revision text. (Stub dumps have checksums, but only for the last year or two.)
>
> Erik Zachte
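The relationship sumandro describes can be made concrete. Given raw per-city edit counts for one project and one time window (the input format and the numbers below are hypothetical), the national fraction divides by the country total and the global fraction divides by the world total. Equivalently, global = national × country_total / world_total, so the existing national fractions could only be converted if per-country totals for the same project and period were also available. The function name is illustrative.

```python
def edit_fractions(city_edits):
    """Return national and global edit fractions for one project and period.

    city_edits: dict mapping (country, city) -> edit count.
    Returns a dict mapping (country, city) -> (national, global_), where
    national = city edits / country total and global_ = city edits / world total.
    """
    country_totals = {}
    for (country, _city), n in city_edits.items():
        country_totals[country] = country_totals.get(country, 0) + n
    world_total = sum(country_totals.values())

    fractions = {}
    for (country, city), n in city_edits.items():
        national = n / country_totals[country] if country_totals[country] else 0.0
        global_ = n / world_total if world_total else 0.0
        fractions[(country, city)] = (national, global_)
    return fractions


# Hypothetical counts for a single language project and time window.
counts = {("IN", "Delhi"): 30, ("IN", "Mumbai"): 10, ("US", "New York"): 10}
for key, (national, global_) in edit_fractions(counts).items():
    print(key, round(national, 2), round(global_, 2))
```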

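Erik's point about checksums refers to identity reverts: a revision whose text is identical to an earlier revision restores that version, undoing every edit in between. The sketch below works on one article's revision texts in chronological order. It hashes each text with SHA-1 and compares only its own digests, so it does not depend on how the dumps encode their checksums. The function name is hypothetical, and real tools often also limit how far back a revert may reach, which this sketch omits.

```python
import hashlib


def identity_reverts(revision_texts):
    """Detect identity reverts in one article's history.

    revision_texts: list of revision texts, oldest first.
    Returns (reverting_index, restored_index) pairs: revision i has exactly
    the same text as the most recent earlier revision j with that text, and
    j < i - 1, so the revisions between j and i were reverted.
    Identical consecutive revisions (null edits) are not counted.
    """
    last_seen = {}  # checksum -> index of the most recent revision with it
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


print(identity_reverts(["A", "B", "A", "A", "C", "A"]))  # -> [(2, 0), (5, 3)]
```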

the use of the templates: comparison between different wikipedias

by Yury Katkov

Hi everyone!

Has anyone tried to observe how different Wikipedias use templates: how often, what the average depth of template calls is, etc.?

-----
Yury Katkov, WikiVote

Phasing out stat1001's mobile_device_props-daily.tsv and mobile_platform-daily.tsv

by Christian Aistleitner

Hi,

with the Analytics Development team's decision to move from dclass to ua-parser, keeping
http://stat1001.wikimedia.org/public-datasets/analytics/mobile/mobile_devic…
http://stat1001.wikimedia.org/public-datasets/analytics/mobile/mobile_platf…
on life support would become a burden. Since we also do not know of anyone using those files so far, we are considering no longer generating them [1].

If you are using either of the two files above, please let us know by 2014-03-05 so we can discuss how to move forward.

Best regards,
Christian

[1] It's really only the two files listed above. For the time being, we will keep generating all the other mobile files (for example mobile-sampled-100) as usual.

--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a
4040 Linz, Austria
Email: christian(a)quelltextlich.at
Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
OpenPGP key transition from 0xEF78CCDE to 0x13C1072F: http://quelltextlich.at/openpgp-transition-0xEF78CCDE-to-0x13C1072F.txt

If it did not happen on the list

by Christian Aistleitner

Dear fellow Analytics Development team members,

over the past few weeks, we seem to have discussed at least twice whether to adopt “If it didn't happen on the list, it didn't happen”. Both times it appeared to me that we actually wanted to move forward with it ... but it never got posted to the mailing list :-D

So I'm being bold: either slap me with a large trout by 2014-02-28, or agree with me that the Analytics development team adopts “If it didn't happen on the list, it didn't happen.” (With “list” being analytics(a)lists.wikimedia.org for public things, and analytics-internal(a)lists.wikimedia.org for not-so-public things.)

To state the obvious: meetings in real life, hangouts, or personal email do not qualify as “list” :-)

Have fun,
Christian

/me hands a large trout to all of you to slap me with, and ducks.

Wikimedia monthly research showcase: Feb 26, 11.30 PT

by Dario Taraborelli

Starting tomorrow (February 26), we will be broadcasting the monthly showcase of the Wikimedia Research and Data team. The showcase is an opportunity to present and discuss recent work that researchers at the Foundation have been conducting. The showcase will start at 11.30 Pacific Time, and we will post a link to the stream a few minutes before it starts. You can also join the conversation on the #wikimedia-office IRC channel on freenode (we’ll be sticking around after the end of the showcase to answer any questions).

This month, we’ll be talking about Wikipedia mobile readers and article creation trends:

Oliver Keyes
Mobile session times
A prerequisite to many pieces of interesting reader research is being able to accurately identify the length of users' 'sessions'. I will explain one potential way of doing it, how I’ve applied it to mobile readers, and what research this opens up. (20 mins)
https://meta.wikimedia.org/wiki/Research:Mobile_sessions

Aaron Halfaker
Wikipedia article creation research
I'll present research examining trends in newcomer article creation across 10 languages, with a focus on the English and German Wikipedias. I'll show that, in wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. I’ll also show the results of an in-depth analysis of Articles for Creation (AfC), which suggest that while AfC’s process seems to result in the publication of high-quality articles, it also dramatically reduces the rate at which good new articles are published. (30 mins)
https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation

Looking forward to seeing you all tomorrow!

Dario
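The abstract for the mobile sessions talk does not say which method is used. A common approach, shown here purely as an assumption, is an inactivity timeout: consecutive requests from the same reader belong to one session until the gap between two requests exceeds a threshold. The function name and the 30-minute default are illustrative choices, not figures from the research.

```python
def session_lengths(timestamps, timeout=1800):
    """Split one reader's request timestamps (in seconds) into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` seconds. Returns the length of each session in
    seconds, measured from its first request to its last request.
    """
    lengths = []
    start = prev = None
    for ts in sorted(timestamps):
        if prev is None:
            start = ts  # first request opens the first session
        elif ts - prev > timeout:
            lengths.append(prev - start)  # close the previous session
            start = ts
        prev = ts
    if prev is not None:
        lengths.append(prev - start)  # close the final session
    return lengths


print(session_lengths([0, 60, 120, 5000, 5100]))  # -> [120, 100]
```

A session consisting of a single request gets length 0, because request logs alone do not reveal how long the reader spent on the last page viewed.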

Re: [Analytics] [Labs-l] User registration date on DB replicas

by Felipe Ortega

Hello all.

@Tim: By "feature" I mean having values for the column user.user_registration filled in on the DB replicas accessible from Tool Labs, if possible. As Oliver has suggested, I don't see any reason for this information not to be available, as it is already public via Special:ListUsers.

@Aaron: Thanks a lot. I believe that is a fairly decent approximation. In fact, I suspect that daily or weekly aggregates would be enough for time-series characterization. My actual goal is comparing trends between different languages, and eventually correlating them with other known activity metrics.

Best regards,
Felipe.

On Friday, 14 February 2014 at 16:00, Aaron Halfaker <aaron.halfaker(a)gmail.com> wrote:

> I have a dataset containing estimated registration dates for editors who registered before Dec. 2005. My method assumes that user_id is monotonically increasing and sets the lowest upper bound available.
>
> For example, let's assume the following rows:
>
>   user_id  first_edit
>   12345    20040102030405
>   12344    NULL
>   12343    20040102050102
>
> Since an editor couldn't have saved a revision before registering their account, we can assume that user 12345 registered their account on or before 20040102030405. If user_id is monotonically increasing, we also know that user 12344 must have registered on or before 20040102030405, which lets us fill in a NULL. Similarly, we have a first_edit timestamp for user 12343, but that edit happened pretty late. We can actually just continue to propagate the 20040102030405 timestamp to this user too.
>
> After performing this approximation, we'd have the following rows:
>
>   user_id  first_edit      user_registration_approx
>   12345    20040102030405  20040102030405
>   12344    NULL            20040102030405
>   12343    20040102050102  20040102030405
>
> In effect, this is similar to the approximation discussed in https://bugzilla.wikimedia.org/show_bug.cgi?id=18638, but I'm not trying to interpolate probable registration times for users. In practice we're talking about a difference of seconds, so I haven't bothered with the extra work.
>
> I'm generating a data file for English now that I should be able to share by the end of the day:
>  * user_id
>  * registration_type (see https://meta.wikimedia.org/wiki/Research:Attached_user and https://meta.wikimedia.org/wiki/Research:Newly_registered_user)
>  * user_registration (from user table)
>  * first_edit (lowest timestamp from "revision" and "archive" for user_id)
>  * registration_approx (my approximation based on the method described above)
>
> -Aaron
>
> On Fri, Feb 14, 2014 at 6:06 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
>
>> Felipe Ortega, 14/02/2014 12:05:
>>
>>> Thanks a lot. Then, I look forward to the confirmation and implementation of this feature. In case it's better to open a new issue on bugzilla or any other action on my side (lend a hand with value reviewing/testing) just let me know.
>>
>> You could help assess the correctness of and/or code the guesstimate method proposed in https://bugzilla.wikimedia.org/show_bug.cgi?id=18638 , for the script to fill further blanks.
>>
>> Nemo
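Aaron's approximation can be sketched in a few lines: walk the users from the highest user_id down, carry the earliest first_edit seen so far, and assign that running minimum as each user's registration upper bound. This illustrates the method as described in the email; it is not Aaron's actual script, and the function name is hypothetical. It relies on MediaWiki's fixed-width YYYYMMDDHHMMSS timestamps comparing correctly as strings.

```python
def approximate_registration(rows):
    """Propagate the lowest registration upper bound down through user_ids.

    rows: iterable of (user_id, first_edit) pairs, where first_edit is a
    14-digit MediaWiki timestamp string or None. Assumes user_id increases
    monotonically with registration time. Returns user_id -> upper bound,
    or None for users above the highest user_id that has any edit.
    """
    bound = None
    approx = {}
    for user_id, first_edit in sorted(rows, key=lambda row: row[0], reverse=True):
        # A later-registered user's first edit also bounds every earlier user_id.
        if first_edit is not None and (bound is None or first_edit < bound):
            bound = first_edit
        approx[user_id] = bound
    return approx


# The example rows from the email above.
rows = [(12345, "20040102030405"), (12344, None), (12343, "20040102050102")]
print(approximate_registration(rows))
# -> {12345: '20040102030405', 12344: '20040102030405', 12343: '20040102030405'}
```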

Upcoming research newsletter: new papers open for review

by Dario Taraborelli

Hi everybody,

with CSCW just concluded and conferences like CHI and WWW coming up, we have a good set of papers to review for the February issue of the Research Newsletter [1]. Please take a look at https://etherpad.wikimedia.org/p/WRN201402 and add your name next to any paper you are interested in reviewing. As usual, short notes and one-paragraph reviews are most welcome.

Instead of contacting only past contributors, this month we’re experimenting with a public call for reviews, cross-posted to analytics-l and wiki-research-l. If you have any questions about the format or process, feel free to get in touch off-list.

Dario Taraborelli and Tilman Bayer

[1] http://meta.wikimedia.org/wiki/Research:Newsletter

Fwd: [Wikitech-l] Exit stats?

by Jeremy Baron

---------- Forwarded message ----------
From: Strainu <strainu10(a)gmail.com>
Date: Sun, Feb 23, 2014 at 6:11 PM
Subject: [Wikitech-l] Exit stats?
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Hi,

Does Wikipedia have any exit or click-through stats, such as which links visitors follow from an article? If so, are they public?

Thanks,
Strainu

Re: [Analytics] Exit stats?

by ENWP Pine

In addition to privacy issues, we want to give spammers as little reason as possible to post links on Wikimedia sites, and as little information as possible about how to design effective spam links. Providing data about the effectiveness of external links could benefit spammers. There may be a way to make an exception by providing aggregated data for a whitelist of GLAM links.

Pine

> Date: Sun, 23 Feb 2014 20:57:17 +0000
> From: Magnus Manske <magnusmanske(a)googlemail.com>
> Subject: Re: [Analytics] [Wikitech-l] Exit stats?
>
> FWIW, I once had some JavaScript added to the common JS script on Commons, collecting anonymized clicks to external links on image pages, in an attempt to give some stats to the GLAM community. That would fire off basic data like time, page, and target URL to a toolserver logging script. Volume was low enough, but I had to turn it off due to privacy paranoia soon after.
>
> Cheers,
> Magnus
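Pine's idea of releasing only aggregated data for a whitelist of GLAM links could look roughly like the sketch below. The whitelist domains are placeholders, and the input format, (page title, target URL) pairs, is an assumption loosely based on the fields Magnus describes. Page titles are dropped and non-whitelisted targets are discarded, so only per-domain totals for approved partners leave the function.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical whitelist of GLAM partner domains (placeholders only).
GLAM_WHITELIST = {"www.example-museum.org", "archive.example-library.org"}


def aggregate_glam_clicks(clicks, whitelist=GLAM_WHITELIST):
    """Count external-link clicks per whitelisted target domain.

    clicks: iterable of (page_title, target_url) pairs.
    Returns a dict mapping domain -> click count; clicks to domains
    outside the whitelist are not counted or reported at all.
    """
    counts = Counter()
    for _page, url in clicks:
        host = urlparse(url).hostname  # lowercased; None if the URL has no host
        if host in whitelist:
            counts[host] += 1
    return dict(counts)


clicks = [
    ("File:A.jpg", "https://www.example-museum.org/item/1"),
    ("File:B.jpg", "http://spam.example.com/buy"),
    ("File:A.jpg", "https://www.example-museum.org/item/2"),
]
print(aggregate_glam_clicks(clicks))  # -> {'www.example-museum.org': 2}
```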
