We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining what fraction of the total traffic to an article went on to click a link in that article
• generating a Markov chain over English Wikipedia
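Each of the uses above is a simple aggregation over (referer, article, count) triples. The following is a minimal sketch, assuming the rows have already been loaded into memory as such triples. It does not assume the column layout of the released files, and the function names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory rows of (referer, article, count); adapt the
# unpacking to whatever column layout the published files actually use.
Row = Tuple[str, str, int]


def top_outgoing(rows: List[Row], article: str, k: int = 3) -> List[Tuple[str, int]]:
    """Links most frequently clicked from `article`."""
    counts = Counter()
    for referer, target, n in rows:
        if referer == article:
            counts[target] += n
    return counts.most_common(k)


def top_incoming(rows: List[Row], article: str, k: int = 3) -> List[Tuple[str, int]]:
    """Referers most frequently followed to reach `article`."""
    counts = Counter()
    for referer, target, n in rows:
        if target == article:
            counts[referer] += n
    return counts.most_common(k)


def click_through_fraction(rows: List[Row], article: str) -> float:
    """Outgoing clicks from `article` divided by requests arriving at it."""
    incoming = sum(n for _, target, n in rows if target == article)
    outgoing = sum(n for referer, _, n in rows if referer == article)
    return outgoing / incoming if incoming else 0.0


def markov_chain(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Row-normalise counts into transition probabilities P(target | referer)."""
    totals: Dict[str, int] = defaultdict(int)
    for referer, _, n in rows:
        totals[referer] += n
    chain: Dict[str, Dict[str, float]] = defaultdict(dict)
    for referer, target, n in rows:
        chain[referer][target] = chain[referer].get(target, 0.0) + n / totals[referer]
    return dict(chain)


rows = [
    ("external", "Panda", 100),
    ("Bear", "Panda", 20),
    ("Panda", "Giant_panda", 30),
    ("Panda", "Red_panda", 10),
    ("Panda", "Bear", 5),
]
print(top_outgoing(rows, "Panda"))            # [('Giant_panda', 30), ('Red_panda', 10), ('Bear', 5)]
print(click_through_fraction(rows, "Panda"))  # 0.375
```

The dataset only contains pairs that actually appear in the logs. As a result, the click-through fraction and the transition probabilities describe observed traffic, not every link present on a page.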
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Dear Analytics team,
I’m Giacomo Marangoni and I’m working on Pagecounts-raw data. While analysing it, I noticed some slightly strange behaviour that could affect your data.
Sometimes I find records like “it.n Addio_al_regista_Sydney_Pollack 1 0”, and I can’t explain how a page could be visited once yet return a response of 0 bytes.
Should I read this as a default value written when a record is captured incorrectly?
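The record quoted above has four space-separated fields: project, page title, request count and total response size in bytes. The following is a minimal sketch, assuming that layout and titles without embedded spaces, for isolating records that report views but zero bytes so they can be counted or excluded. The helper names are hypothetical. The sketch only flags these records and says nothing about why the byte count is zero.

```python
from typing import Iterable, Iterator, NamedTuple


class PagecountRecord(NamedTuple):
    project: str     # e.g. "it.n"
    title: str       # page title as it appears in the file
    views: int       # request count for the file's hour
    bytes_sent: int  # total response size in bytes


def parse_pagecounts_line(line: str) -> PagecountRecord:
    """Parse one space-separated pagecounts-raw line (assumed 4 fields)."""
    parts = line.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    project, title, views, size = parts
    return PagecountRecord(project, title, int(views), int(size))


def zero_byte_records(lines: Iterable[str]) -> Iterator[PagecountRecord]:
    """Yield records that report views but a total response size of 0 bytes."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = parse_pagecounts_line(line)
        if rec.views > 0 and rec.bytes_sent == 0:
            yield rec


sample = [
    "it.n Addio_al_regista_Sydney_Pollack 1 0",
    "en Main_Page 10 5000",
]
print(list(zero_byte_records(sample)))
```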
Best regards
Giacomo Marangoni
Antony–22 raised a question about accounting for "new articles" that are
moved from other namespaces to article space. For the purposes of counting
total articles, I'm guessing that these are properly accounted for as
deltas to the total, even if they're not considered new articles for the
purpose of New Page Patrol (NPP) under Special:NewPages. Is that correct?
Thanks,
Pine
Hello!
The Analytics team is planning to give a presentation at the Developer
Summit about the Pageview API we are working on (we hope to announce the
API soon).
Please feel free to add to the ticket any Pageview API use cases you would
like to talk about, or any other discussion points you would like to
bring up:
https://phabricator.wikimedia.org/T112956
Some example open questions:
"Will we be able to provide hourly data?"
"What about backfilling?"
"I want to use the API to display view counts of the pages I edit on my
homepage, can I do that?"
Thanks,
Nuria
When I go to https://en.wikipedia.org/wiki/Special:Statistics, clicking on
"content pages" takes me to a list of pages in the "(Article)" namespace.
But this list includes redirects, which surprises me: Special:Statistics
suggests that redirects are counted under "Pages (All pages in the wiki,
including talk pages, redirects, etc.)", which is a separate link. So, is
there a way to verify that the "content pages" figure on
Special:Statistics actually excludes the redirects shown in the
"(Article)" namespace?
I also have a question about disambiguation pages. When I go to
https://en.wikipedia.org/wiki/Special:AllPages, select "(Article)", and
select "hide redirects", disambiguation pages like
https://en.wikipedia.org/wiki/!!,
https://en.wikipedia.org/wiki/Panda_(disambiguation),
https://en.wikipedia.org/wiki/Teel and https://en.wikipedia.org/wiki/Parsons
all still appear in that list. Should we count disambiguation pages as
"articles"? I suppose it makes sense to think of editing disambiguation
pages as editing in content space, but I'm a little hesitant to count them
as articles for the purposes of the 5,000,000 milestone. I'm curious to
hear what others think about whether disambiguation pages should count as
articles for this purpose.
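MediaWiki's `link` article-count method (`$wgArticleCountMethod = 'link'`) bears on both questions. Under that method, a page counts as a content page only if it is in a content namespace, is not a redirect, and contains at least one wikilink. The following is a minimal sketch of that rule, assuming English Wikipedia uses this setting and that the main namespace is its only content namespace. The `Page` record is hypothetical, and the regex over wikitext is a simplification: MediaWiki itself consults its link tables. Under this rule, redirects drop out of the "content pages" figure, while disambiguation pages, which consist mostly of links, are still counted.

```python
import re
from dataclasses import dataclass

# Assumption: only the main "(Article)" namespace (0) is a content namespace.
CONTENT_NAMESPACES = {0}

# Simplification: a regex over wikitext stands in for MediaWiki's link tables.
WIKILINK = re.compile(r"\[\[[^\[\]]+\]\]")


@dataclass
class Page:  # hypothetical record, not a MediaWiki class
    namespace: int
    is_redirect: bool
    wikitext: str


def counts_as_content_page(page: Page) -> bool:
    """'link' rule: content namespace, not a redirect, at least one wikilink."""
    return (
        page.namespace in CONTENT_NAMESPACES
        and not page.is_redirect
        and WIKILINK.search(page.wikitext) is not None
    )


pages = [
    Page(0, True, "#REDIRECT [[Panda]]"),                            # redirect: excluded
    Page(0, False, "'''Panda''' may refer to:\n* [[Giant panda]]"),  # disambiguation: counted
    Page(0, False, "A stub with no links."),                         # no wikilink: excluded
    Page(1, False, "Talk about [[Panda]]"),                          # Talk namespace: excluded
]
print([counts_as_content_page(p) for p in pages])  # [False, True, False, False]
```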
Thanks,
Pine
Hi Analytics,
On ENWP, does the figure of 26,163,773 users include IPs that have made
edits? Does it include editors from all Wikimedia projects, or only those
who have registered and/or edited on ENWP?
Thanks,
Pine
Hello:
There are replication issues affecting EventLogging data.
For some tables (see:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20151022-EventLo…)
data has not been replicated since 2015-10-22.
All dashboards read from the slave rather than the master, so the data they
display will be outdated until this issue is resolved. The same applies to
any query run on 1002.
You can follow our DBA's work on this in the following ticket:
https://phabricator.wikimedia.org/T116599
Thanks,
Nuria