We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia
http://dx.doi.org/10.6084/m9.figshare.1305770 <http://dx.doi.org/10.6084/m9.figshare.1305770>
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
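The per-article use cases above can be sketched in a few lines; the tuple layout and the "other-google" referer label below are illustrative placeholders, not the dataset's actual schema (the release is a TSV whose exact column names should be checked against the figshare description):

```python
from collections import defaultdict

# Hypothetical sample rows in (referer, article, count) form.
rows = [
    ("other-google", "London", 1000),      # external search traffic
    ("Main_Page",    "London",  300),      # internal link traffic
    ("London",       "Paris",   120),
    ("London",       "River_Thames", 80),
]

def top_referers(rows, article):
    """Most common sources of traffic to `article`, descending by count."""
    counts = defaultdict(int)
    for referer, art, n in rows:
        if art == article:
            counts[referer] += n
    return sorted(counts.items(), key=lambda kv: -kv[1])

def outlink_probabilities(rows, article):
    """Normalize outgoing click counts into Markov transition probabilities."""
    out = {art: n for referer, art, n in rows if referer == article}
    total = sum(out.values())
    return {art: n / total for art, n in out.items()}
```

For example, `top_referers(rows, "London")` ranks where London's readers came from, and `outlink_probabilities(rows, "London")` gives the row of a Markov transition matrix for the link-following use case.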
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream <https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream>
Ellery and Dario
Hello,
I work for a consulting firm called Strategy&. We have been engaged by Facebook on behalf of Internet.org to conduct a study assessing the state of connectivity globally. One key area of focus is the availability of relevant online content. We are using the availability of encyclopedic knowledge in one's primary language as a proxy for relevant content, defined as 100K+ Wikipedia articles in that language. We have a few questions related to this analysis prior to publishing it:
* We are currently using the article count by language from the Wikimedia Foundation's public page http://meta.wikimedia.org/wiki/List_of_Wikipedias. Is this a reliable source for article counts, and does it include stubs?
* Is it possible to get historical data for article counts? It would be great to monitor the evolution of the metric we have defined over time.
* What are the biggest drivers you've seen for step changes in the number of articles (e.g., number of active admins, machine translation, etc.)?
* We had to map Wikipedia language codes to ISO 639-3 language codes in Ethnologue (the source we are using for primary-language data). The two-letter code for a Wikipedia language in the "List of Wikipedias" sometimes matches the ISO 639-1 code, but not always. Is there an easy way to do the mapping?
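On the last question: there is no single authoritative table, and a common approach is a small hand-maintained override dict for the legacy Wikipedia subdomain codes, falling back to the code itself for the rest. A minimal sketch; the override entries are examples I believe to be correct, but they should be verified against the Meta list, and they are not exhaustive:

```python
# Override table for Wikipedia subdomain codes that do NOT equal the
# ISO 639 code for the language (illustrative examples, not complete).
WIKI_TO_ISO639_3 = {
    "als":     "gsw",  # Alemannic Wikipedia; ISO 639-3 "als" is Tosk Albanian
    "zh-yue":  "yue",  # Cantonese
    "bat-smg": "sgs",  # Samogitian
    "simple":  "eng",  # Simple English is not a separate ISO language
}

def wiki_to_iso(code):
    """Map a Wikipedia language code to an ISO 639 code.

    Falls back to the code itself; two-letter results would still need a
    639-1 -> 639-3 conversion (e.g. via a library such as pycountry) to
    line up with Ethnologue's three-letter codes.
    """
    return WIKI_TO_ISO639_3.get(code, code)
```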
Many Thanks,
Rawia
Formerly Booz & Company
Rawia Abdel Samad
Direct: +9611985655 | Mobile: +97455153807
Email: Rawia.AbdelSamad(a)strategyand.pwc.com<mailto:Rawia.AbdelSamad@strategyand.pwc.com>
www.strategyand.com
Hello,
I am new to this list. I am looking to rejuvenate a semi-active WikiProject
and am seeking a tool or tools that will list the frequency of individual
per-page views for a given category/WikiProject. The time period could be
preset or user-specifiable --- my guess is that this may depend upon the
particular tool(s).
We wish to use this as one of the inputs to determining the importance of
an article to the WikiProject.
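Absent a ready-made tool, this can be scripted against the public Wikimedia APIs: the MediaWiki Action API's list=categorymembers to enumerate a category, and the REST Pageview API for per-article daily counts. A minimal sketch that only constructs the request URLs; the parameter choices (all-access, user agent filter, daily granularity) are assumptions to check against the API documentation:

```python
from urllib.parse import quote, urlencode

def category_members_url(site, category, limit=500):
    """Action API query listing pages in a category (site e.g. 'en.wikipedia.org')."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"https://{site}/w/api.php?{urlencode(params)}"

def pageviews_url(project, title, start, end):
    """REST Pageview API: daily views for one article over [start, end].

    project is e.g. 'en.wikipedia'; start/end are YYYYMMDD strings.
    """
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    return (f"{base}/{project}/all-access/user/"
            f"{quote(title, safe='')}/daily/{start}/{end}")
```

Fetching each URL (e.g. with urllib or requests) and summing the daily counts per page would give the importance input described above.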
Please feel free to email me directly if you wish to avoid adding traffic
to the mail list.
Yours,
Peaceray <https://en.wikipedia.org/wiki/User:Peaceray>
Cascadia Wikimedians User Group <http://cascadia.wiki>
peaceray(a)cascadia.wiki (redirects to)
raymond.f.leonard.jr(a)gmail.com
Hi all,
I think the time has come to disable the traffic reports based on webstatscollector (2.0) data.
See http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+tra…
- These reports are using outdated definitions for page views.
- The scripts haven't seen any maintenance for years.
Even with the new pageview API still in development, these reports are increasingly misreporting reality anyway.
There was a period where I felt imperfect reports were better than no reports at all, and I warned about unresolved bugs in the report header.
But the anomaly reported below served as a wake-up call for me that the mismatches are intolerably high.
So I propose to put up a notice on the latest reports stating that these were the last release, and that WMF is working to deliver a new infrastructure in the form of a pageview API, ETA later this year.
See also https://phabricator.wikimedia.org/T44259
Whether WMF will also assume responsibility for building new reports on top of that API (and if so in what form) is another matter, but first things first. Current focus is on providing that API, as it should be IMO.
Any thoughts?
Erik Zachte
From: Erik Zachte [mailto:erikzachte@infodisiac.com]
Sent: Friday, July 24, 2015 17:58
To: 'Андрей Лавров'
Subject: RE: Wikimedia Traffic Analysis Report - Operating Systems
Hey Andrey,
You're totally right of course. And not the only to notice. These traffic reports haven't seen much (maintenance) love lately. I'm tempted to disable them. I'm looking forward to the upcoming WMF pageview API as much more promising platform to build better reports: more up to date, more robust, more flexible. Of course there is always a hazard to stop maintaining a solution before a replacement is really there, but this is what actually happened long ago.
Thanks for heads-up.
Erik
From: Андрей Лавров [mailto:andrey.lavrov@wancastle.com]
Sent: Wednesday, July 22, 2015 11:09
To: erikzachte(a)infodisiac.com
Subject: Wikimedia Traffic Analysis Report - Operating Systems
Dear Erik,
Please, improve your analysis reports by including Chrome OS statistics.
Chrome OS has about 10% market share in the US now. Almost all Chromebooks are online every day. It is very strange not to see Chrome OS market share in your reports.
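One likely reason Chrome OS goes missing from such reports: its user agent only announces itself through the "CrOS" token inside an otherwise X11-style platform string, so naive parsers that match on "X11" or "Linux" first silently fold it into Linux. A coarse illustration of the fix (not the actual report code, and deliberately simplified — e.g. it ignores iOS):

```python
def os_family(user_agent):
    """Coarse OS bucketing for a traffic report (illustrative only)."""
    # Check "CrOS" before the generic X11/Linux tokens, or Chrome OS
    # traffic is misattributed to Linux.
    if "CrOS" in user_agent:
        return "Chrome OS"
    if "Windows" in user_agent:
        return "Windows"
    # Android UAs also contain "Linux", so test Android first.
    if "Android" in user_agent:
        return "Android"
    if "Mac OS X" in user_agent:
        return "Mac OS X"
    if "Linux" in user_agent or "X11" in user_agent:
        return "Linux"
    return "Other"
```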
Best regards,
Andrey Lavrov
Over the past two months there seems to be a clear bug in Page Views for Wikisource, Normalized. I'm active on the Hebrew site, but in general all the language domains seem to be suffering from drastic reductions in pageviews. At least for the Hebrew Wikisource: it was averaging around 5.5 million pageviews for a period of consecutive months, and now all of a sudden it's barely breaking 1 million, which is very hard to believe. It seems to me there's a glitch. Is anyone aware of the issue?
[Table from stats.wikimedia.org: Page Views for Wikisource, Normalized, Jun 2015 — top rows: 45.9 M (1.5 M/day), 12.1 M (403 k/day), 4.8 M (161 k/day), 2.5 M (84 k/day)]
Daniel Mokhtar
Hi,
Now that SULF is over, we have no need for the AccountAudit extension
anymore. If you have been relying on it for any statistics, please be
aware that a) it is going away soon and b) the data is generally
inaccurate for global users (which all users are now).
Please follow <https://phabricator.wikimedia.org/T105894> for updates.
Thanks,
-- Legoktm
Hi Franz,
I am CCing your email to the public Analytics list so that others may respond too. In short, those links don’t work because releasing the data was a mistake. You can see the Update from 2012/9/20 at the top of the post that explains why.
There isn’t an easy way to safely release user search queries without potentially compromising private user data. I don’t think we even collect this data at the moment. Since that blog post, our search architecture has changed, and the new one doesn’t have the ability to collect the queries easily. I believe https://phabricator.wikimedia.org/T103505 <https://phabricator.wikimedia.org/T103505> is a ticket to start doing this, but I don’t know if there is a real timeline to make this happen.
-Andrew
> On Jul 20, 2015, at 08:39, franz guenthner <fguenthner(a)gmail.com> wrote:
>
> Dear Andrew Otto,
>
> I have a question about the announcement concerning search logs on this page: http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wiki… <http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia…>. The links given there don't seem to work.
>
> Have you been able to make this service accessible ?
>
> I am working on a new search device for Wikipedia in which as much of the searchable information (at the fact level) as possible would be presented in an autosuggest mode; it would be helpful to have an overview of the complexity of queries to Wikipedia. If you know of any other sources for Wikipedia query logs, any information would be highly appreciated.
>
> Thank you in advance for your help.
>
> Prof. Franz Guenthner
> University of Munich