Analytics October 2016

analytics@lists.wikimedia.org

28 participants
15 discussions

Re: [Analytics] Questions on the PageView Api

by Nuria Ruiz

Birgit:

I am cc-ing analytics@, our public e-mail list, where questions like this one get asked (and archived).

> As far as I know, page views are counted every time someone views a page,
> and if someone reloads a page several times, each time would be counted as 1
> page view. Is that correct?
>
> I wonder if you could explain the reasoning behind that a bit

Yes, we do not count "distinct" pageviews by a user, but pageviews overall. So if you look at the Barack Obama page 10 times in one day, those are 10 pageviews.

> If we would want to be able to know if a page got just reloaded by the
> same person, we would need to have a unique ID for each user, and this
> would have to be browser connected. I'm guessing that this would be a
> strong privacy issue.

Correct. Even with unique IDs we could never identify users, just devices, as unique IDs are really cookies in your browser.

Thanks,

Nuria

On Thu, Oct 13, 2016 at 6:48 AM, Birgit Müller <birgit.mueller(a)wikimedia.de> wrote:

> Hi Nuria,
>
> we shortly met when you had your team offsite in Berlin :-)
>
> We're currently running a feedback loop on the PageView Analysis tool by
> MusicAnimal/Community Tech, and in that context a question appeared:
>
> As far as I know, page views are counted every time someone views a page,
> and if someone reloads a page several times, each time would be counted as 1
> page view. Is that correct?
>
> I wonder if you could explain the reasoning behind that a bit -
>
> my first guess was:
>
> If we would want to be able to know if a page got just reloaded by the
> same person, we would need to have a unique ID for each user, and this
> would have to be browser connected. I'm guessing that this would be a
> strong privacy issue.
>
> From the technical perspective, I could imagine that this would also be
> very difficult.
>
> Am I right, or what are the exact reasons behind that?
>
> Looking forward to your answer, and I hope you're doing well!
>
> Best,
> Birgit
>
> --
> Birgit Müller
> Community Communications Manager
> Software Development and Engineering
>
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Tel. (030) 219 158 26-0
> http://wikimedia.de
>
> Imagine a world in which every single human being can freely share in the
> sum of all knowledge. Help us make it happen!
> http://spenden.wikimedia.de/
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit by
> the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
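
The difference between counting pageviews overall and counting them per device can be sketched as follows. This is only an illustration, not the Analytics team's pipeline: the request log and the `device_id` field are hypothetical stand-ins for a browser cookie, which is exactly the kind of identifier the thread says is not used.

```python
from collections import Counter

# Hypothetical request log: (device_id, page_title). The device_id stands in
# for a browser cookie; it is an assumption made for this sketch only.
requests = [
    ("device-a", "Barack_Obama"),
    ("device-a", "Barack_Obama"),
    ("device-a", "Barack_Obama"),
    ("device-b", "Barack_Obama"),
    ("device-b", "Berlin"),
]


def total_pageviews(log):
    """Count every request, reloads included (what the thread describes)."""
    return Counter(page for _device, page in log)


def distinct_device_views(log):
    """Count each (device, page) pair once; requires a per-device identifier."""
    return Counter(page for _device, page in set(log))


total = total_pageviews(requests)
distinct = distinct_device_views(requests)
print(total["Barack_Obama"], distinct["Barack_Obama"])
```

Note that even the second count distinguishes devices rather than people, which matches Nuria's point about cookies.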

7 years, 6 months

Re: [Analytics] https://stats.wikimedia.org/wikimedia/squids/SquidReportOperatingSystems.htm

by Erik Zachte

Dear Mr. Haar,

Thanks for bringing this to our attention.

The report you refer to has been discontinued since August 2015 (see page notice).
https://stats.wikimedia.org/wikimedia/squids/SquidReportOperatingSystems.htm

The successor, based on new definitions and methodology, is at
https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os

I am forwarding your message to the WMF Analytics Team, who maintain these stats.

Best regards,
Erik Zachte

From: Haar, Dirk [mailto:Dirk.Haar@partner.commerzbank.com]
Sent: Wednesday, October 12, 2016 11:22
To: 'ezachte(a)wikimedia.org'
Subject: https://stats.wikimedia.org/wikimedia/squids/SquidReportOperatingSystems.htm

Hi Erik!

Would you mind making a change on the OS report page? I always wondered about the values for Linux Mint in comparison to Ubuntu there, and have now found the reason in Clem's blog entry here <http://segfault.linuxmint.com/2016/09/addressing-fud/> (see "Wikimedia stats").

What you show as Linux Mint covers only the versions up to Mint 10. The current version is 18, and since version 11, the user agent you (or better, let's say "Wikimedia stats") evaluate is shown as "Ubuntu".

There should at least be a note on this line; other distros may be affected, too.

(Btw., that reminds me of older browser usage statistics, when websites couldn't deal with Netscape Communicator, so that you had to switch the user agent to "Internet Explorer", shifting the usage values completely.)

Best regards,
Dirk Haar
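
A minimal sketch of how the misattribution described above can arise, assuming a report that classifies operating systems by looking for known tokens in the user-agent string. The two user-agent strings and the `classify_os` helper are made up for illustration; they are not real browser strings and not the code behind the Wikimedia reports.

```python
# Illustrative user-agent strings only; they are invented for this sketch and
# are not taken from real browsers or from the Wikimedia reports.
OLD_MINT_UA = "ExampleBrowser/1.0 (X11; Linux Mint 10)"
NEW_MINT_UA = "ExampleBrowser/1.0 (X11; Ubuntu; Linux x86_64)"


def classify_os(user_agent):
    """Naive classifier: return the first known OS token found in the string."""
    for token in ("Linux Mint", "Ubuntu", "Linux"):
        if token in user_agent:
            return token
    return "Other"


print(classify_os(OLD_MINT_UA))
print(classify_os(NEW_MINT_UA))
```

If a distribution stops announcing its own name in the user agent, a token-based report like this can only attribute the request to whatever name is still present.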

7 years, 6 months

Identifying bots and bot edit decline

by Flöck, Fabian

Hi all,

two questions, maybe someone can help:

1. I was trying to compile a complete list of all bots that were ever (potentially) active on the English Wikipedia so that one can identify bot accounts in the dumps. Below are all the lists (including historic bots) that I could find [1]. Out of those overlapping lists, I extracted 2795 unique bot names (some seem to be just names for bot approval request pages). Going through the historic edit data (no current redirects), 1377 user names there were actually in that list. Does anyone know if that should cover (almost) all ever-active bots, or is there an even better list/method? I would like to avoid using unreliable regular expressions. (Similar question for other language editions.)

2. I counted bot edits per half year in en.wikipedia and saw a major decrease in the first half of 2013 (between January and July), from ~3 M to ~1 M edits per half year, which seems to be in line with official stats [2]. This is likely not news, so can someone enlighten me regarding what brought about that sharp decline of bot edits?

Cheers,
Fabian

[1]
https://en.wikipedia.org/wiki/Wikipedia:List_of_bots_by_number_of_edits
https://en.wikipedia.org/wiki/Wikipedia:Bots/Status/inactive_bots_1
https://en.wikipedia.org/wiki/Wikipedia:Bots/Status/inactive_bots_2
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_ed…
https://en.wikipedia.org/w/api.php?action=query&list=allusers&augroup=bot
https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitl…
https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Approved (+ contents of all archive pages)
https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#bots

[2] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editor_activity_levels

--
Dr. Fabian Flöck
Researcher
Computational Social Science department
GESIS - Leibniz Institute for the Social Sciences
Unter Sachsenhausen 6-8, 50667 Cologne, Germany
Tel: + 49 (0) 221-47694-208
fabian.floeck(a)gesis.org
www.gesis.org
www.facebook.com/gesis.org
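
The two steps described in this message (merging overlapping bot lists, then matching them against edit history and counting edits per half year) might look roughly like the sketch below. It is not Fabian's actual method: the bot names, the edit records and the helper functions are hypothetical, and a half year is assumed here to mean January to June or July to December.

```python
from collections import Counter
from datetime import date


def unique_bot_names(*lists):
    """Merge overlapping name lists into one set, ignoring blank entries."""
    names = set()
    for names_list in lists:
        names.update(name.strip() for name in names_list if name.strip())
    return names


def half_year(day):
    """Label a date as e.g. '2013-H1' (Jan-Jun) or '2013-H2' (Jul-Dec)."""
    return f"{day.year}-H{1 if day.month <= 6 else 2}"


def bot_edits_per_half_year(edits, bots):
    """Count edits made by known bot accounts, grouped by half year."""
    return Counter(half_year(day) for user, day in edits if user in bots)


# Hypothetical inputs: two overlapping bot lists and (user, date) edit records.
list_a = ["ExampleBot", "OtherBot "]
list_b = ["ExampleBot", "NeverEditedBot"]
edits = [
    ("ExampleBot", date(2013, 1, 5)),
    ("ExampleBot", date(2013, 8, 1)),
    ("OtherBot", date(2013, 3, 2)),
    ("Alice", date(2013, 3, 2)),
]

bots = unique_bot_names(list_a, list_b)
active_bots = bots & {user for user, _day in edits}
per_half = bot_edits_per_half_year(edits, bots)
print(len(bots), sorted(active_bots), dict(per_half))
```

Exact name matching against a curated set avoids the regular-expression heuristics the message wants to steer clear of, at the cost of missing any bot that appears in none of the lists.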

7 years, 6 months

Re: [Analytics] [Wikidata] SPARQL power users and developers

by Yuri Astrakhan

I would highly recommend using the X-Analytics header for this, and establishing "well known" key name(s). X-Analytics gets parsed into key-value pairs (object field) by our varnish/hadoop infrastructure, whereas the user agent is basically a semi-free-form text string. Also, the user agent cannot be set by a JavaScript client, so we would constantly have to perform two types of analysis: one for requests that came from the "backend" and one for those that were made by the browser.

On Sun, Oct 2, 2016 at 4:28 PM Stas Malyshev <smalyshev(a)wikimedia.org> wrote:

> Hi!
>
> > I'll try to throw in a #TOOL: comment where I can remember using SPARQL,
> > but I'll be bound to forget a few...
>
> Thanks, though using distinct User-Agent may be easier for analysis,
> since those are stored as separate fields, and doing operations on
> separate field would be much easier than extracting comments from query
> field e.g. when doing Hive data processing.
>
> --
> Stas Malyshev
> smalyshev(a)wikimedia.org
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
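
To illustrate why a key-value header is easier to analyze than a free-form string, here is a small parsing sketch. It assumes the header value is a semicolon-separated list of `key=value` pairs; the `tool` key and its value are hypothetical examples of a "well known" key, not names that were agreed on in this thread.

```python
def parse_x_analytics(header):
    """Split a 'k1=v1;k2=v2' style header value into a dict of strings."""
    pairs = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that have no '=' at all
            pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical header value; 'tool' is an assumed key name for this sketch.
example = "https=1;tool=example-sparql-client"
parsed = parse_x_analytics(example)
print(parsed.get("tool"))
```

Once the value is a dictionary, selecting requests by tool is a single field lookup rather than pattern matching on a user-agent string or on comments embedded in the query.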

7 years, 6 months

Seeking feedback (+ answer to 1 question) on a timeline of Wikipedia analytics

by Vipul Naik

Dear Analytics mailing list,

I am working, along with Issa Rice (cc'ed), on an analysis of changes to Wikipedia pageviews since December 2007, when pageview statistics first started being maintained. To help with our analysis, we collected key events related to changes to user experience on the site as well as to statistics availability and measurement. We've recorded our findings on this page in Issa's userspace:

https://en.wikipedia.org/wiki/User:Riceissa/Timeline_of_Wikipedia_analytics

I'd appreciate it if you can highlight:

(a) Factual errors in the material currently in the timeline

(b) Missing events that you think should belong in the timeline, with regard to the availability of statistics as well as any other events that affected user experience significantly.

In addition, I had the following question: In the Wikimedia per-article pageviews API
https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end,
where do Wikipedia Zero pageviews get recorded? Do they go under mobile-web, or mobile-app, or neither?

https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews <https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview> gives some information on how the underlying pageviews are recorded in the pageviews dataset, but I wasn't clear on how the pageview REST API processes that data.

Thank you!

Vipul

NOTE: We don't intend to move the page to Wikipedia's main space, as we know it won't meet the notability criterion. The user space was just a convenient place to store it while taking advantage of MediaWiki's syntax and Wikipedia's templates.
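
For readers unfamiliar with the endpoint named in the question, the sketch below builds a per-article request URL from the path parameters listed in the link (project, access, agent, article, granularity, start, end). The `/metrics/pageviews/per-article/` path is the one published in the REST API documentation as far as I know; the helper name and the example values are assumptions, and the code only constructs the URL without sending a request.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, access, agent, article, granularity, start, end):
    """Build a per-article pageviews URL; 'access' is where mobile-web or
    mobile-app would be selected."""
    # Titles use underscores for spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title, granularity, start, end])


url = per_article_url(
    "en.wikipedia", "mobile-web", "user", "Barack Obama",
    "daily", "20161001", "20161031",
)
print(url)
```

The sketch does not answer where Wikipedia Zero traffic is counted; it only shows that the access method is a single path segment, so any such traffic would have to be folded into one of the existing access values or left out.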

7 years, 6 months
