On Thu, Sep 17, 2009 at 8:58 PM, Brian Brian.Mingus@colorado.edu wrote:
On Thu, Sep 17, 2009 at 9:55 PM, Robert Rohde rarohde@gmail.com wrote:
On Thu, Sep 17, 2009 at 8:25 PM, Steve Bennett stevagewp@gmail.com wrote:
On Fri, Sep 18, 2009 at 12:20 PM, Robert Rohde rarohde@gmail.com wrote:
That particular result is unpublished. I could make you a list of infrequently viewed articles, but it would be quite long.
Could you make a list of the 100 least viewed? Or are there a large number which are essentially equal?
My sample consisted of collating 30 non-consecutive hours of data on enwiki traffic where each hour was randomly chosen from any point during the last 8 months. This was filtered to only include page titles that were valid mainspace pages.
In those 30 hours, there are 1.36 million valid article titles that are viewed exactly once [1].
Examples include:
129342_Ependes 1421_in_literature Antiprotonic_helium Antonella_Mularoni Madhusoodhanan_Nair Blue_Murder_(play) Ozonotherapy Veronika_Krausas Verret,_New_Brunswick Bare_Truth_(Nat_album)
As you can see, these are obscure topics, but they are not necessarily crazy topics. If I were to repeat it with a longer baseline (say 1000 hours rather than 30), I suspect you would get more interesting information on the tail, but right now probably the best I can say is that a cumulatively significant amount of traffic goes to relatively obscure pages.
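For anyone wanting to repeat the tally on their own sample, a minimal sketch of the "viewed exactly once" count might look like the following. It assumes each sampled hour has already been parsed into (title, views) pairs and already filtered to valid mainspace titles; the function name and the view counts in the sample data are made up for illustration and are not the actual analysis code or data.

```python
from collections import Counter

def titles_viewed_once(hourly_counts):
    """Sum per-title views over all sampled hours and
    return the titles whose total is exactly one view."""
    totals = Counter()
    for hour in hourly_counts:
        for title, views in hour:
            totals[title] += views
    return sorted(title for title, total in totals.items() if total == 1)

# Illustrative data only: three "hours" with invented view counts.
sample_hours = [
    [("Main_Page", 500), ("Ozonotherapy", 1)],
    [("Main_Page", 450), ("Antiprotonic_helium", 1)],
    [("Main_Page", 480)],
]

print(titles_viewed_once(sample_hours))
# → ['Antiprotonic_helium', 'Ozonotherapy']
```

With only 30 hours sampled, a title landing in this list says little about its true hourly rate, which is why a longer baseline would resolve the tail better.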
-Robert Rohde
[1] Note: Because the traffic data is based on URL request strings, and some URL strings map to the same page (e.g. Blue_Ocean and Blue%20Ocean), the number of valid article titles is not necessarily the same as the number of distinct pages. For practical reasons my analysis was based on the URL strings, so it probably overcounts the number of distinct articles involved and, to a degree, overstates the fraction of traffic to obscure pages.
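One way the overcounting in [1] could be reduced is to fold equivalent URL strings into a single title before counting. The sketch below is an assumption about how that might be done, not what the analysis above did: it percent-decodes each string and converts spaces to underscores, so Blue%20Ocean and Blue_Ocean merge. The helper names are hypothetical, and redirects and other title normalization are not handled.

```python
from collections import Counter
from urllib.parse import unquote

def normalize_title(url_title):
    """Percent-decode a requested title and use underscores for spaces."""
    return unquote(url_title).replace(" ", "_")

def merge_counts(raw_counts):
    """Combine view counts for URL strings that normalize to the same title."""
    merged = Counter()
    for url_title, views in raw_counts.items():
        merged[normalize_title(url_title)] += views
    return merged

# Illustrative counts only.
raw = {"Blue_Ocean": 1, "Blue%20Ocean": 1}
print(dict(merge_counts(raw)))
# → {'Blue_Ocean': 2}
```

After merging, a page requested once under each spelling would no longer appear as two separate singly viewed titles.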
How sure are you that they were viewed by a person and not a bot?
There is no differentiation between people and bots. (Some of these things are why it is an unpublished analysis. ;-) I was actually using traffic data for a totally different purpose, but decided to look at things like obscure pages while I was at it.)
-Robert Rohde