Does the Wikimedia Foundation's technology team have any insight or comment on the finding that (other than the Wikipedia Main Page and the "404 error" page), in September the most popular page on the English Wikipedia was "Mathematical descriptions of opacity", with over 5.1 million views? There was no discernible "bump" in interest in opacity due to outside news events or a book or movie release on the subject.
The phenomenon is outlined here: http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewe...
Do you think this is some sort of malicious probing activity by a hacker, or is it perhaps the deliberate testing of a developer employed by the WMF?
Thank you,
Greg
If I remember correctly, stats collection is imperfect, and that results in some odd numbers.
That is just my memory of why it looks like that.
Tom
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Tue, Oct 4, 2011 at 9:10 AM, Gregory Kohs thekohser@gmail.com wrote:
Do you think this is some sort of malicious probing activity by a hacker, or is it perhaps the deliberate testing of a developer employed by the WMF?
There's no specific testing that I know of. Other than skewing people's stats, I don't really know what a hacker would gain from skewing the "most viewed articles" list.
-Chad
On Tue, Oct 4, 2011 at 3:10 PM, Gregory Kohs thekohser@gmail.com wrote:
Does the Wikimedia Foundation's technology team have any insight or comment on the finding that (other than the Wikipedia Main Page and the "404 error" page), in September the most popular page on the English Wikipedia was "Mathematical descriptions of opacity", with over 5.1 million views? There was no discernible "bump" in interest in opacity due to outside news events or a book or movie release on the subject.
The phenomenon is outlined here: http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewe...
Do you think this is some sort of malicious probing activity by a hacker, or is it perhaps the deliberate testing of a developer employed by the WMF?
There seem to have been a lot of page views concentrated around September 22-26. This could be something as innocent as someone running a broken bot that's supposed to fetch lots of different articles but instead fetches the same URL again and again due to a typo in the code, or it could be as malicious as someone trying to DoS us in a very simplistic way. I'll look at the sampled logs for those days and see what I can find.
Roan
On Tue, Oct 4, 2011 at 3:48 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
There seem to have been a lot of page views concentrated around September 22-26.
Missing link here: http://stats.grok.se/en/201109/Mathematical%20descriptions%20of%20opacity
Roan
On Tue, Oct 4, 2011 at 3:48 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
There seem to have been a lot of page views concentrated around September 22-26. This could be something as innocent as someone running a broken bot that's supposed to fetch lots of different articles but instead fetches the same URL again and again due to a typo in the code, or it could be as malicious as someone trying to DoS us in a very simplistic way. I'll look at the sampled logs for those days and see what I can find.
I've grepped the sampled (1:1000) Squid logs for September 23rd, 24th and 25th, and I do indeed see that a vast, vast majority of requests for that article come from a single IP. In fact, I got output like this (IP addresses redacted for privacy reasons):
$ zgrep Mathematical_descriptions_of_opacity sampled-1000.log-20110924.gz | cut -d ' ' -f 5 | sort | uniq -c | sort -rn | head
   1548 AA.BB.CC.DD
      1 EE.FF.GG.HH
      1 JJ.KK.LL.MM
which means that in the sampled log (we don't keep full access logs, only a 1:1000 sample) for September 24th, of the 1550 logged requests, 1548 came from our guy and 2 came from different, random people. This doesn't mean there were only 1550 visits to that page that day; due to the sampling, the real number is roughly 1550*1000 = 1.55 million, which matches the 1.6M reported by stats.grok.se well enough.
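The scaling step above can be sketched as a small shell function. This is only an illustration: the name estimate_by_ip is made up here, and it assumes the client IP is the 5th whitespace-separated field, as the `cut -d ' ' -f 5` in the command above suggests.

```shell
# estimate_by_ip [FACTOR]: read sampled log lines on stdin and print, per
# client IP, the sampled request count, the estimated real count
# (count * FACTOR), and the IP, largest first.
# Assumptions: the client IP is field 5 (as in the cut command above) and the
# sample is 1:FACTOR, defaulting to 1000. Note that awk splits on runs of
# whitespace, whereas cut -d ' ' splits on every single space.
estimate_by_ip() {
    factor="${1:-1000}"
    awk -v factor="$factor" '
        { count[$5]++ }
        END {
            for (ip in count)
                printf "%d %d %s\n", count[ip], count[ip] * factor, ip
        }
    ' | sort -rn
}
```

It would be used in place of the cut/sort/uniq stage, for example:
zgrep Mathematical_descriptions_of_opacity sampled-1000.log-20110924.gz | estimate_by_ip 1000 | head
The estimate is only as good as the sample, so the small counts (1 sampled request, shown as 1000) are very rough.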
Also, these requests all list http://en.wikipedia.org/wiki/Snell%27s_law as their referer:
$ zgrep Mathematical_descriptions_of_opacity sampled-1000.log-20110924.gz | grep Snell | wc -l
1548
but the Snell's law article doesn't show any strange access patterns on stats.grok.se .
So I guess this was just one IP hitting the same article ~1.5 million times per day for 3-4 days, for whatever reason. That doesn't really hurt our servers much (unless the article is also edited heavily in the meantime and contains complex templates that take a long time to parse, see Jackson, Michael) but obviously it does skew the traffic stats.
Roan
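For a sense of scale, the ~1.5 million requests per day mentioned above can be turned into an average rate. This is a rough sketch only: the function name requests_per_second is made up here, and it assumes the requests were spread evenly over 24 hours, which the sampled counts quoted in this thread do not establish.

```shell
# requests_per_second DAILY_REQUESTS: average requests per second over one
# 24-hour day (86400 seconds), printed with one decimal place.
# Assumption: traffic is uniform over the day; real traffic may be bursty.
requests_per_second() {
    # LC_ALL=C keeps the decimal separator a "." regardless of locale.
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.1f\n", n / 86400 }'
}

requests_per_second 1500000   # prints 17.4
```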
Yo,
So I guess this was just one IP hitting the same article ~1.5 million times per day for 3-4 days, for whatever reason.
OMG, if anyone can influence quality journalism on examiner.com that easily, we definitely have to go and build proper analysis of full logs with all the shiny modern technologies and whatever cluster we can build for that. That would be a bump in program spending \o/
(Though it would be nice to notice crap like that. Web activity, not the article, I mean).
Domas
On 04/10/11 15:48, Roan Kattouw wrote:
On Tue, Oct 4, 2011 at 3:10 PM, Gregory Kohs thekohser@gmail.com wrote:
Does the Wikimedia Foundation's technology team have any insight or comment on the finding that (other than the Wikipedia Main Page and the "404 error" page), in September the most popular page on the English Wikipedia was "Mathematical descriptions of opacity", with over 5.1 million views? There was no discernible "bump" in interest in opacity due to outside news events or a book or movie release on the subject.
The phenomenon is outlined here: http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewe...
There seem to have been a lot of page views concentrated around September 22-26. This could be something as innocent as someone running a broken bot that's supposed to fetch lots of different articles but instead fetches the same URL again and again due to a typo in the code, or it could be as malicious as someone trying to DoS us in a very simplistic way. I'll look at the sampled logs for those days and see what I can find.
My conspiracy theory: someone knew about the upcoming Examiner article and bumped his favorite topic :)