It depends on the size of the dataset. If you already know the pages or
users you want to compare (a limited dataset of reasonable size), then
there is no scalability issue and it should not be too difficult to
implement. Otherwise it requires a lot of resources to merge the numbers
for every page or user over the period with a large dataset.
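The scalability point about uniqueness can be illustrated with a short Python sketch; the months and user names below are invented for illustration:

```python
# Per-month sets of distinct editors (invented data). Additive stats such as
# raw edit counts roll up by summing, but distinct-user counts do not:
monthly_editors = {
    "2017-01": {"Alice", "Bob"},
    "2017-02": {"Bob", "Carol"},
    "2017-03": {"Alice", "Carol", "Dave"},
}

# Naively summing the per-month figures counts repeat editors several times...
naive_sum = sum(len(users) for users in monthly_editors.values())

# ...so a yearly figure needs a union over the raw per-month sets, which is
# why long ranges over large wikis get expensive.
yearly_editors = set().union(*monthly_editors.values())

print(naive_sum)            # 7
print(len(yearly_editors))  # 4
```

This is why a month can be built from days for additive stats, while uniqueness stats force the tool back to the underlying rows (or to an approximation such as HyperLogLog).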
2017-07-31 16:59 GMT+02:00 יגאל חיטרון <khitron(a)gmail.com>:
> Thank you. I will not say that I understood your explanation, but I'll
> try: if you have the number of viewers of some page for every year, can't
> you take the sum of them and compare it with another page's to sort them?
> And the same for the number of some user's edits?
> Igal
>
> On Jul 31, 2017 17:19, "Akeron" <akeron.wp(a)gmail.com> wrote:
>
> > Hi Igal,
> > All suggestions are welcome :)
> > Supporting this feature shouldn't be too difficult in theory, because
> > the tool already works with this kind of aggregation (months are built
> > from days, years from months...). The main problem is scalability for
> > stats which require uniqueness, like the number of users or the number
> > of edits *per page*. That's why yearly stats can actually be disabled on
> > some big wikis. So it would be feasible, but with edit limits for the
> > range (like 3-5 million), and it would be very slow to load with lots of
> > edits.
> >
> > Akeron
> >
> > 2017-07-31 14:29 GMT+02:00 יגאל חיטרון <khitron(a)gmail.com>:
> >
> > > Hello. It's amazing, thank you very much!
> > > Could I suggest one more feature, please? With it, the tool will be
> > > perfect. I'm talking about aggregation. Any kind of historical
> > > statistics for some day, month or year could also be shown for a range
> > > of time. For example, if we have monthly statistics, we could fill the
> > > From field with Jan 2008 and the To field with May 2011, and get the
> > > aggregated numbers for this range. Is it possible?
> > > Thank you very much again,
> > > Igal (User:IKhitron)
> > >
> > > On Jul 30, 2017 22:18, "Pine W" <wiki.pine(a)gmail.com> wrote:
> > >
> > > > Wikiscan is an interesting tool for statistics fans. I suggest
> > > > briefly reading this IEG page
> > > > <https://meta.wikimedia.org/wiki/Grants:IEG/Wikiscan_multi-wiki>,
> > > > then playing with the tool on https://wikiscan.org/
> > > >
> > > > Pine
> > > > _______________________________________________
> > > > Wikitech-l mailing list
> > > > Wikitech-l(a)lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
Good idea, but the * character is allowed and sometimes used in user names.
The only remaining available characters seem to be "# < > [ ] | { } / @".
Another possibility is to use a new dedicated field which would match all
user names starting with the entered string.
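As a sketch of how such a dedicated prefix field could behave next to a true wildcard pattern (the user names here are invented; Wikiscan would read them from its database):

```python
from fnmatch import fnmatchcase

# Invented user names standing in for a wiki's user table.
users = ["TNSE Kumar", "TNSE Priya", "TNSEBot", "Akeron", "Pine"]

# Dedicated field: every user name starting with the entered string.
prefix_matches = [u for u in users if u.startswith("TNSE")]

# Glob-style wildcard, which is ambiguous when * itself appears in names.
glob_matches = [u for u in users if fnmatchcase(u, "TNSE *")]

print(prefix_matches)  # ['TNSE Kumar', 'TNSE Priya', 'TNSEBot']
print(glob_matches)    # ['TNSE Kumar', 'TNSE Priya']
```

The prefix variant also maps directly onto an indexed SQL query (`WHERE user_name LIKE 'TNSE%'`), whereas an arbitrary glob generally cannot use an index, which matters at Wikiscan's scale.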
2017-07-31 10:27 GMT+02:00 Sibi Kanagaraj <commonssibi(a)gmail.com>:
> Hi Team,
>
> @User:akeron
>
> I would like to know if there is any possibility of allowing wildcard
> entries in the Users category.
>
> Example:
> Over here
>
> https://ta.wikiscan.org/users
>
> will we be able to filter out users whose user name starts with TNSE -
>
> say something like TNSE * or with a particular pattern, say TNSE * ABC
> *XYZ
>
> Regards,
> K.Sibi
>
>
>
Hi Sibi,
It is safe to skip the SSL warning; the free certificate I use for all
subdomains is limited, it actually covers only the biggest wikis.
The good news is that wildcard certificates should be available for free in
January 2018:
https://letsencrypt.org/2017/07/06/wildcard-certificates-coming-jan-2018.ht…
Akeron
2017-07-31 10:10 GMT+02:00 Sibi Kanagaraj <commonssibi(a)gmail.com>:
> Hi team and User:Akeron,
>
> A great effort. But when I try to go to my home wiki - Tamil Wikipedia
> stats - it prompts an insecure connection warning.
>
> https://ta.wikiscan.org/
>
> Though I accept the risk and move ahead, I am not sure why an insecure
> connection warning pops up for Tamil and not for English / Commons.
>
> -Sibi
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, July 26,
2017, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=yC1jgK8C8aQ
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#July_2017>.
This month's presentation:

Freedom versus Standardization: Structured Data Generation in a Peer
Production Community
By *Andrew Hall*

In addition to encyclopedia articles and software, peer production
communities produce *structured data*, e.g., Wikidata and OpenStreetMap's
metadata. Structured data from peer production communities has become
increasingly important due to its use by computational applications, such
as CartoCSS, MapBox, and Wikipedia infoboxes. However, this structured data
is usable by applications only if it follows *standards*. We did an
interview study focused on OpenStreetMap's knowledge production processes
to investigate how – and how successfully – this community creates and
applies its data standards. Our study revealed a fundamental tension
between the need to produce structured data in a standardized way and
OpenStreetMap's tradition of contributor freedom. We extracted six themes
that manifested this tension and three overarching concepts, *correctness,
community,* and *code,* which help make sense of and synthesize the themes.
We also offer suggestions for improving OpenStreetMap's knowledge
production processes, including new data models, sociotechnical tools, and
community practices.
Kindly,
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodlund(a)wikimedia.org
Dear list,
I'm posting a recent conversation with Dan below, as well as a few follow-up questions.
Dan was kind enough to point out this list. I apologize that the post is "backward" (in
email-thread format) due to my ignorance; I will use this list from now on.
Thanks, Daniel
----
Hi Dan
Thanks for getting back to me so quickly!
>Thanks for writing. In general these questions are best asked on our public list, so other
>people can see and benefit from any answers:
>https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks, I've joined this list and will ask subsequent questions there.
>* pairs of pages: we have two datasets that are mentioned in this task
>https://phabricator.wikimedia.org/T158972 which should be very interesting for this
>purpose. They aren't being updated right now, and the task is to do just that. We'll
>probably get to that within the next 3 months, but a bunch of us are on paternity leave
>this summer, so things are a little slower than normal
This seems close to what I need. From the descriptions I gather the linkage is by session.
Is there also a linkage by IP (with the IPs themselves removed, of course)?
>* country data for pageviews: for privacy reasons we only allow access to this with an
>NDA. We have good data on it, but you need to sign this NDA and use our cluster to access
>it, being careful about what you report about it to the world at large. Here's information
>on that: https://wikitech.wikimedia.org/wiki/Volunteer_NDA
I've read this and am happy to sign an NDA. I understand it is best to be as specific as
possible about the reasoning, intentions with the data, and permissions required. For me to
figure this out it would be useful to know the relevant parts of the database schema, and
perhaps a hint as to which data might be most interesting there. Would you be able to point
me towards that?
>Hope that helps, and feel free to write back to the public list in the future.
Definitely, very helpful and thank you!
Best, Daniel
On Wed, Jul 19, 2017 at 9:51 AM, Oberski, D.L. (Daniel) <d.l.oberski(a)uu.nl> wrote:
Dear Dan,
My name is Daniel Oberski, I'm an associate professor of data science methodology in the
department of statistics at Utrecht University in the Netherlands.
I've been using your incredibly useful pageviews API to study correlations between the
amount of interest people show in a topic (pageviews) with other data such as political
party preference over time. That has yielded some interesting results (which I have yet to
write up).
However, to do a better study it would be very helpful to have slightly more information
than is in the API. Specifically, it would be very useful to be able to query, for each
_pair_ of pages, how many people (or IPs) viewed _both_ of those pages. That way I can find
out which pages are really indicative of interest in a specific common topic, rather than
just correlated by accident. In addition, I've found it hard to figure out pageviews for
specific pages by country rather than by language.
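The pair query described above could, in principle, be computed from per-session view logs along these lines (session IDs and page titles here are invented):

```python
from collections import Counter
from itertools import combinations

# Invented session logs: session id -> pages viewed in that session.
sessions = {
    "s1": ["Elections", "Polling", "Economy"],
    "s2": ["Elections", "Polling"],
    "s3": ["Economy", "Elections"],
}

# For each unordered pair of pages, count the sessions that viewed both.
pair_counts = Counter()
for pages in sessions.values():
    for a, b in combinations(sorted(set(pages)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts[("Elections", "Polling")])  # 2
print(pair_counts[("Economy", "Polling")])    # 1
```

With n pages per session this generates O(n^2) candidate pairs, which hints at why such a dataset would need to be precomputed rather than served live.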
My question is, would you happen to know if there is any way to obtain this information?
(It does not necessarily have to be through the API.) Or do you know if there are people
to whom I might talk about this?
Thanks for reading (to) the end and best regards,
Daniel