Analytics November 2016

analytics@lists.wikimedia.org

36 participants
27 discussions

Re: [Analytics] Unique Devices / Pageviews across all properties in the Wiki universe?
by Dan Andreescu 17 Nov '16

Hi Melody,

I'm cc-ing our public list, which is the best place to ask questions like this.

The unique devices and pageviews numbers on the Vital Signs dashboard are fetched from our public APIs, which don't allow bulk download. To get bulk numbers, you can:

* download everything from our public dumps [1], where you'll find pageviews by article or by project [2] and unique devices numbers by project [3]
* ask one of our analysts to crunch the numbers (in this case, the Reading team would be the one to ask)
* use our internal cluster to crunch the numbers yourself (I can show you around)

One important thing to keep in mind: you can aggregate pageviews by language or project, but you can't do that for unique devices, because the same device might be used to visit many sites and there's no way to deduplicate those visits. We're working on counting global unique devices so that we have those numbers as well, and Tilman from Reading has some interesting work on that too.

[1] https://dumps.wikimedia.org/other/analytics/
[2] https://dumps.wikimedia.org/other/pageviews/
[3] https://dumps.wikimedia.org/other/unique_devices/

On Thu, Nov 17, 2016 at 11:44 AM, Melody Kramer <mkramer(a)wikimedia.org> wrote:
> Hey Dan and Mikhail,
>
> I'm working on a map of the Wikimedia universe that will show the relative
> size of entities under the Wikimedia umbrella (Wikipedia, Wikibooks,
> Wikinews, etc.) grouped by language, articles contributed, and then
> pageviews and/or unique devices.
>
> On this site:
> https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki,enwikibooks,enwikinews,wikidatawiki,commonswiki/metrics=UniqueDevices
> I'm able to manually enter each language/wikiproject to see them all on
> the graph.
>
> Is there a way to acquire everything at once and download it into a CSV?
> Or say "Show all"?
>
> I'm happy to say more! Thanks so much in advance for your help/expertise
> in this area (and if there's someone else I should reach out to, please
> let me know who that might be!)
>
> Mel
>
> --
> Melody Kramer
> Read a random featured article from Wikipedia!
> <https://en.wikipedia.org/wiki/Special:RandomInCategory/Featured_articles>
> mkramer(a)wikimedia.org
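Dan's point about aggregation can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the per-project device sets are made-up identifiers, and the real unique-devices metric is not computed by storing device IDs. It only shows why adding per-project counts overcounts devices that visit several projects, whereas pageviews (each view belongs to exactly one project) can simply be summed.

```python
# Toy illustration only: made-up device identifiers per project.
# (The real unique-devices metric is not computed by storing device IDs.)
visitors = {
    "enwiki": {"dev-a", "dev-b", "dev-c"},
    "dewiki": {"dev-b", "dev-d"},
    "commonswiki": {"dev-a", "dev-d", "dev-e"},
}

# Adding per-project counts counts a device once for every project it visited.
summed = sum(len(devices) for devices in visitors.values())

# A deduplicated cross-project count is the size of the union of the sets.
global_unique = len(set().union(*visitors.values()))

print("sum of per-project counts:", summed)         # 8
print("deduplicated global count:", global_unique)  # 5
```

The gap between the two figures (8 vs. 5 here) is the double counting that makes per-project unique devices non-additive.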

Making Charts More Interactive
by Jan Dittrich 17 Nov '16

Hi Dhaya,

> If we were to make the legend interactive and the world map dynamic, we can
> improve legibility.
> We should make all the values (1 GJ, 10 GJ, etc.) in the legend clickable
> buttons.
> On clicking, say, 10 GJ, the world map should show the bolide events of
> 10 GJ magnitude and remove the rest. This will make it easier to answer my
> earlier question.

As far as I know, the chart extension is built on Vega, which does support interactive behavior: https://github.com/vega/vega/wiki/Signals

However, such behavior is harder to define than the mapping of data to graphics, in particular if you want the kind of cross-filtering behavior you refer to.

Cheers,
Jan

--
Jan Dittrich
UX Design/ User Research

Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de

Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
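The signal-driven interactivity Jan mentions can be sketched outside Vega. The following is plain Python, not Vega's specification syntax; the `Signal` class, the `redraw` listener and the event tuples are hypothetical. A signal holds the current legend selection, and each listener recomputes its output whenever that value changes, which is the basic idea behind a clickable legend that filters the map.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical event records: (continent, energy class in GJ).
Event = Tuple[str, int]
EVENTS: List[Event] = [("Africa", 10), ("Africa", 1), ("Europe", 10), ("Asia", 100)]


class Signal:
    """Holds one value and notifies every listener when it changes."""

    def __init__(self, value: Optional[int] = None) -> None:
        self.value = value
        self._listeners: List[Callable[[Optional[int]], None]] = []

    def subscribe(self, listener: Callable[[Optional[int]], None]) -> None:
        self._listeners.append(listener)

    def set(self, value: Optional[int]) -> None:
        self.value = value
        for listener in self._listeners:
            listener(value)


# The points the map would currently draw; starts with everything visible.
visible: List[Event] = list(EVENTS)


def redraw(energy_gj: Optional[int]) -> None:
    """Listener: keep only events in the selected class (None = show all)."""
    visible[:] = [e for e in EVENTS if energy_gj is None or e[1] == energy_gj]


selected = Signal()          # legend selection; None means nothing clicked
selected.subscribe(redraw)
selected.set(10)             # simulate clicking the "10 GJ" legend entry
print(visible)               # [('Africa', 10), ('Europe', 10)]
```

Cross-filtering would add more signals (for example a selected continent) feeding the same listener, which is where the extra definition effort Jan describes comes in.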

Upcoming Research Showcase, November 16, 2016
by Leila Zia 16 Nov '16

[Apologies for cross-posting]

Hi everyone,

Almost a year ago, we [1] embarked on a research project to understand who Wikipedia readers are. More specifically, we set the goal of finding a taxonomy of Wikipedia readers. In the upcoming Research Showcase, I will present the findings of this research.

*Logistics*
The Research Showcase will be live-streamed on Wednesday, November 16, 2016 at 11:35 (PST) / 19:35 (UTC).
YouTube stream: https://www.youtube.com/watch?v=O24F1xkbNwI
As usual, you can join the conversation on IRC freenode at #wikimedia-research. You can also watch our past research showcases at https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase.

*Title*
Why We Read Wikipedia

*Abstract*
Every day, millions of readers come to Wikipedia to satisfy a broad range of information needs; however, little is known about what these needs are. In this presentation, I share the results of research that sets out to help us understand Wikipedia readers better. Based on an initial user study on English, Persian, and Spanish Wikipedia, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users' motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. Then, we quantify the prevalence of these use cases via a large-scale user survey conducted on English Wikipedia. Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom. Finally, we match survey responses to the respondents' digital traces in Wikipedia's server logs, enabling the discovery of behavioral patterns associated with specific use cases. Our findings advance our understanding of reader motivations and behavior on Wikipedia and have potential implications for developers aiming to improve Wikipedia's user experience, editors striving to cater to (a subset of) their readers' needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as article recommendation engines.

*How to prepare? What to expect?*
If you decide to attend, here are a few things I would like to ask you to keep in mind, especially if this will be your first time attending one of our research showcases:

* Like many other research projects in fields that are not heavily explored, this research will raise more questions than it answers. I encourage you to keep these questions in mind throughout the presentation and discussion: "What can we do with this finding? What other questions can we ask? What other ideas can we try?"
* Be open to asking yourself these questions, especially if you are a Wikipedia editor, even before coming to the showcase: "Why do I edit Wikipedia? Who am I writing the content for, if anyone? Would I change the way I write content if I knew more about who reads it (to encourage or discourage certain types of reading or readers)? What needs should an encyclopedia serve? What is Wikipedia: a place where one can quickly find the answer to his/her questions, a place one can go to when he/she wants to spend some quiet time reading and learning, or a place for both and even more? etc."
* And consider whether you would be interested in seeing the results of this study in your language. What will be presented is based on research on English, Persian, and Spanish Wikipedia (the data from the latter two projects have been used only for one part of the research). We are interested in running the study on at least 2-3 more languages to understand the robustness of some of the results across languages, and to give communities access to the results for their specific language project.

Looking forward to seeing you there, and if you can't make it, please feel free to watch the video later and get in touch with us with questions/comments. :)

Best,
Leila

--
Leila Zia
Senior Research Scientist
Wikimedia Foundation

[1] WMF Research and researchers from three academic institutions: EPFL, GESIS, and Stanford University, in collaboration with WMF Reading.

Making Charts More Interactive
by dhayakar marur 16 Nov '16

Dear Analytics team,

The general legibility of charts in Wikipedia is relatively poor. We can improve it by making them more interactive and dynamic.

Please refer to the chart in the attachment (Boloid Events.jpg). The chart represents the distribution of bolide events from 1994-2013 on the world map. The legend describes the magnitude of each event in joules.

From the chart, can you count the number of 10 GJ bolide events in Africa? You can, but it takes an awfully long time to find the answer.

If we were to make the legend interactive and the world map dynamic, we could improve legibility. We should make all the values (1 GJ, 10 GJ, etc.) in the legend clickable buttons. On clicking, say, 10 GJ, the world map should show the bolide events of 10 GJ magnitude and remove the rest. This will make it easier to answer my earlier question.

Regards,
Dhaya
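The question Dhaya poses reduces to a filter followed by a group-by count. A minimal sketch, assuming a hypothetical `BolideEvent` record with `year`, `continent` and `energy_gj` fields and a few made-up sample rows (not the data behind the attached chart): selecting a legend class filters the events, and counting by continent answers "how many 10 GJ events in Africa?" directly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BolideEvent:
    """Hypothetical record for one event on the map."""
    year: int
    continent: str
    energy_gj: float  # legend class, in gigajoules


def select_energy(events: Iterable[BolideEvent], energy_gj: float) -> List[BolideEvent]:
    """What the map would draw after clicking one legend entry."""
    return [e for e in events if e.energy_gj == energy_gj]


def count_by_continent(events: Iterable[BolideEvent]) -> Counter:
    """Group the selected events by continent and count them."""
    return Counter(e.continent for e in events)


# Made-up sample rows, not the data behind the attached chart.
sample = [
    BolideEvent(1994, "Africa", 10),
    BolideEvent(2001, "Africa", 1),
    BolideEvent(2005, "Africa", 10),
    BolideEvent(2013, "Europe", 10),
]

shown = select_energy(sample, 10)
print(count_by_continent(shown)["Africa"])  # 2
```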

pageviews data
by Alexander Ugarov 16 Nov '16

Hi!

I'm a Ph.D. student in economics, using some of the Wikimedia data in my research. My question is whether it's possible to get data on Wikipedia pageviews by country and article category. Currently the Wikimedia Foundation provides aggregate data on pageviews by country and less aggregated data on pageviews by article, but it looks like there is no way to find out, for example, the pageviews of math articles in India.

More specifically, my questions are:
1) Is it possible in some way to extract information on pageviews by country and subject area from your publicly available data? The amount of data currently available is already vast, and I could have missed it.
2) If it is not possible, how can I persuade you to make this data available? I would argue that the data can be made available without losing confidentiality, either by using only the first part of IP addresses or by publishing only the user's country, as well as by aggregating by category.

I'm looking forward to hearing from you. I'm sure that many social scientists will also be glad to use the opportunity to produce more interesting and policy-relevant research.

Best regards,
Alexander Ugarov, Ph.D. Candidate
Sam M. Walton College of Business
Department of Economics
University of Arkansas
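Part of what Alexander describes can be approximated from public data: the REST API serves pageviews per article, and summing those over a list of articles gives a category-level total, though without any country breakdown. A minimal sketch, assuming the `per-article` endpoint path used below and a category membership list you would have to assemble yourself; `per_article_url` and `category_total` are hypothetical helper names, and no request is actually sent.

```python
from typing import Dict, Iterable
from urllib.parse import quote

# Public Wikimedia REST API base for per-article pageview counts.
API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "monthly") -> str:
    """Build a per-article pageviews URL.

    project is e.g. "en.wikipedia"; start/end are YYYYMMDD strings.
    Titles use underscores and are percent-encoded (including "/").
    """
    title = quote(article.replace(" ", "_"), safe="")
    return f"{API}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def category_total(views_by_article: Dict[str, int], members: Iterable[str]) -> int:
    """Sum views over articles assumed to belong to one category.

    Articles with no fetched count contribute 0.
    """
    return sum(views_by_article.get(title, 0) for title in members)


print(per_article_url("en.wikipedia", "Prime number", "20161001", "20161031"))
```

Fetching each URL and filling `views_by_article` is left out on purpose; the country dimension asked about in question 1 is not part of this endpoint.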

Pageviews for 11/10 and 11/11
by Marcus Schorow 16 Nov '16

Hi,

I can't seem to access pageview counts for anything after 11/10 19:00:00. Is https://dumps.wikimedia.org/other/pageviews/2016/2016-11/ the right URL? Do you know when these files will be added?

Thanks,
Marcus
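One way to check which hourly files have not been published yet is to compare the names expected for a time range with the directory listing. A minimal sketch, assuming the hourly file naming `pageviews-YYYYMMDD-HH0000.gz` under `other/pageviews/YYYY/YYYY-MM/`; the helper names are hypothetical and the listing itself is not fetched here.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

BASE = "https://dumps.wikimedia.org/other/pageviews"


def hourly_file_url(ts: datetime) -> str:
    """URL of the hourly file stamped with ts (assumed naming scheme)."""
    return f"{BASE}/{ts:%Y}/{ts:%Y-%m}/pageviews-{ts:%Y%m%d-%H}0000.gz"


def expected_urls(start: datetime, end: datetime) -> List[str]:
    """One URL per hour from start up to, but not including, end."""
    urls = []
    ts = start
    while ts < end:
        urls.append(hourly_file_url(ts))
        ts += timedelta(hours=1)
    return urls


def missing(expected: Iterable[str], available: Iterable[str]) -> List[str]:
    """Expected files that do not appear in the available listing."""
    have = set(available)
    return [url for url in expected if url not in have]


# Example: which files from 11/10 19:00 to 11/11 00:00 are not listed yet?
wanted = expected_urls(datetime(2016, 11, 10, 19), datetime(2016, 11, 11, 0))
listed = wanted[:1]  # pretend only the 19:00 file is in the directory listing
print(len(missing(wanted, listed)))  # 4
```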

ensuring reader anonymity
by James Salsman 15 Nov '16

Are there any reasons to not replace HTTP GET request IP addresses and proxy information with their SHA-512 secure hash prior to writing them to permanent media?
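The proposal itself is short to express. A minimal sketch using Python's standard `hashlib` and `hmac` modules; `pseudonymize_ip` is a hypothetical helper, and the optional keyed variant (HMAC-SHA-512) is a swapped-in alternative, not part of the original question or of any existing Wikimedia pipeline.

```python
import hashlib
import hmac
from typing import Optional


def pseudonymize_ip(ip: str, key: Optional[bytes] = None) -> str:
    """Return a hex digest to store instead of the raw IP address.

    key=None: plain SHA-512, as in the question.
    key given: HMAC-SHA-512 with a secret key (a swapped-in variant).
    """
    data = ip.encode("ascii")  # IPv4/IPv6 text forms are plain ASCII
    if key is None:
        return hashlib.sha512(data).hexdigest()
    return hmac.new(key, data, hashlib.sha512).hexdigest()


digest = pseudonymize_ip("192.0.2.1")  # documentation-range example address
print(len(digest))  # 128 hex characters = 512 bits
```

One property worth weighing: an unkeyed hash is deterministic, and the IPv4 address space has only 2^32 (about 4.3 billion) addresses, so anyone can hash every address and reverse the mapping. A secret key that is never stored alongside the digests avoids that particular attack.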

Statsv
by Gilles Dubuc 14 Nov '16

Hi,

Since Ori is not officially responsible for statsv maintenance, should the Analytics team handle statsv maintenance going forward?

Ori has tried to leave it in a state that doesn't need much maintenance (read: restarts in case of issues) and is still working on that <https://gerrit.wikimedia.org/r/#/c/321230/2>. This means it shouldn't require much actual work beyond keeping an eye on it and kicking it if the mechanisms Ori has been putting in place don't work.

Given that we're not the only ones using statsv, and considering its function, Analytics seems like its natural home. What do you think?

9 am UTC maintenance for dataset1001 (dumps.wikimedia.org)
by Ariel Glenn WMF 14 Nov '16

On Tuesday, Nov 15, at 9 am UTC, the web server for the dumps and other datasets will be unavailable due to maintenance. This should take no longer than 10 minutes.

Thanks for your understanding,
Ariel

Wikimedia datasets collection on the Internet Archive has surpassed 1 million items
by Hydriz Scholz 14 Nov '16

Dear all,

The Wikimedia Foundation datasets collection on the Internet Archive [1] has now surpassed 1 million items (and about 50,000 full database dumps)! This marks a major milestone in our efforts to archive Wikimedia's vast amount of data and ensures that the vital content submitted by volunteers across the movement is preserved. All this would not have been possible without the help of many people, including Nemo, Ariel and Emijrp (thanks!).

We started archiving towards the end of 2011 and reached the milestone of half a million items back in June 2015 [2]. We have since moved on from archiving just the main database dumps to saving research-worthy data such as the pageviews data, and even attempting to keep a copy of Wikimedia Commons. Today, we are working on making the items on the Internet Archive more accessible to researchers by building an interface for searching old dumps.

Despite this feat, we are in constant need of more help. If you are a researcher, a programmer or someone with a computer, we need your help with many tasks! Have a look at WikiTeam's project [3] or Emijrp's Wikipedia Archive page [4] for more information. If you regularly work with the Wikimedia database dumps, please provide your input in the Dumps-Rewrite project [5] and on the API interface [6].

As before, here's to the next million!

[1]: https://archive.org/details/wikimediadownloads
[2]: https://groups.google.com/forum/#!msg/wikiteam-discuss/Vj3oonpYphg/h9HE6r3v…
[3]: https://github.com/WikiTeam/wikiteam
[4]: https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
[5]: https://phabricator.wikimedia.org/tag/dumps-rewrite/
[6]: https://phabricator.wikimedia.org/T147177

--
Hydriz Scholz
