On 20/09/13 03:04, Jon Robson wrote:
> Thanks Tim for running that data. That seems to suggest the URL structure works in most cases.
I think the request rate for actual articles in the root is very, very low. And if you look at the paste I gave earlier:
http://paste.tstarling.com/p/uhtFqg.html
there's reason to think that the amount of traffic that comes from naive readers typing URLs and expecting an article is much smaller than even 149k per week. A naive user would be more likely to type a URL starting with a lower-case letter, and if you take those entries, and filter out the obvious client bugs and typos, that leaves only 39 log entries. If we filter out some more log entries that are unlikely search terms for Wikipedia articles ("enregistrement-audio-musique", "is", "unlimited_data_plan", etc.), that leaves maybe 30:
http://paste.tstarling.com/p/KWuHif.html
Of these, only 12 actually correspond to Wikipedia articles or redirects:
abolition addicting_games apple_inc carnaval dreamshade facade girls insidious karthik online_coupons snam walkabout
So the number of naive readers actually helped by our 404 Refresh to /wiki/ is probably closer to 12k per week than 149k per week.
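For anyone who wants to repeat the estimate, here is a rough sketch of the filtering described above. It is not the script that produced the pastes. The scale factor of 1000 is only inferred from the step from 12 entries to 12k per week, and the function names, the `unlikely` set and the `known_titles` lookup are hypothetical stand-ins for the manual filtering and the article/redirect check.

```python
# Assumption: each sampled log entry stands for roughly 1000 real requests
# per week (inferred from "12 entries" -> "12k per week", not stated outright).
SCALE = 1000


def looks_naive(path: str) -> bool:
    """Heuristic from the email: a naive reader types a lower-case first letter."""
    return path[:1].islower()


def estimate_weekly(paths, known_titles, unlikely=frozenset()):
    """Estimate weekly requests from naive readers who would reach an article.

    paths        -- first path segments taken from the sampled log
    known_titles -- hypothetical set of terms that match an article or redirect
    unlikely     -- entries rejected by hand (client bugs, typos, odd terms)
    """
    candidates = [p for p in paths if looks_naive(p) and p not in unlikely]
    hits = [p for p in candidates if p in known_titles]
    return len(hits) * SCALE
```

With the 12 matching terms listed above as `known_titles`, this returns 12000, which is where the 12k figure comes from under that sampling assumption.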
Personally, I think the refresh is annoying, since it makes it much harder to correct typos in manually typed URLs. If you meant to type a non-article URL, such as a CSS resource, and made a typo that hit the refresh, the URL you typed is erased from your browser's address bar and history, so there is nothing left to edit. Maybe we should just include a link to the search page, rather than redirect or refresh.
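To make the alternative concrete, here is a minimal sketch of a 404 body that offers a search link and contains no refresh, so the typed URL stays in the address bar. It is only an illustration, not the actual 404 handler. The function name is hypothetical, and the `/w/index.php?search=` path assumes the usual MediaWiki script layout.

```python
from html import escape
from urllib.parse import quote


def not_found_body(path: str) -> str:
    """Build a 404 body with a search link instead of a meta refresh."""
    # Turn "/apple_inc" into the search term "apple inc".
    term = path.strip("/").replace("_", " ")
    # Assumption: search is reachable at /w/index.php?search=<term>.
    search_url = "/w/index.php?search=" + quote(term)
    return (
        "<h1>Not found</h1>\n"
        f"<p>No page exists at {escape(path)}.</p>\n"
        f'<p><a href="{escape(search_url)}">'
        f"Search Wikipedia for {escape(term)}</a></p>\n"
    )
```

Because the page is a plain 404 with a link, the browser keeps the original URL in the address bar and history, and the reader chooses whether to search.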
-- Tim Starling