On 03/17/2013 08:15 PM, Erik Zachte wrote:
C1b) Definition of what constitutes an article can even change more profoundly:
Recently the Swedish Wikipedia started to add bot-created articles on a large scale, as has previously been done on the Dutch and some other Wikipedias. These articles are not bad: they cite sources and are accurate, so they should be counted among the existing articles. But they are not very popular, since they cover obscure topics.
This leads to the idea that perhaps we should count articles that are actually read. It's easy to identify those articles that are very short or don't cite sources, but in order to count articles that aren't read, we need to be sure that robots of all kinds are excluded.
In excluding robot accesses from the visitor statistics, it's also relevant to ask whether accesses from editors should be counted. If I'm a steam engine enthusiast and write articles about every engineer and railroad, maybe I'm the only audience for those articles. When I want to know if my articles have any readership, I don't want to include myself in the audience count. If I'm only writing for my own reading, then I don't really need Wikipedia, so the usefulness of Wikipedia starts when the second human reader turns up.
Are there any ideas or strategies for a good audience count?
If, instead of page views, we were to count the number of different IP addresses, then each bot or editor would just count as one identity, and this would reduce their impact.
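To make the idea concrete, here is a minimal sketch of counting distinct IP addresses per article instead of raw page views. The log format (pairs of IP address and article title), the function name audience_counts and the editor_ips mapping are all my own assumptions for illustration, not any existing log format or tool. It also assumes that one IP address roughly corresponds to one reader, which is only an approximation.

```python
from collections import defaultdict

def audience_counts(log, editor_ips=None):
    """Count distinct IP addresses per article.

    log: iterable of (ip, article) pairs (hypothetical log format).
    editor_ips: optional dict mapping an article to the set of IPs
        of its editors; those IPs are left out of that article's count.
    """
    editor_ips = editor_ips or {}
    seen = defaultdict(set)
    for ip, article in log:
        ips = seen[article]  # register the article even if nobody else reads it
        if ip in editor_ips.get(article, set()):
            continue  # the article's own editor is not part of the audience
        ips.add(ip)  # repeated views from the same IP collapse into one
    return {article: len(ips) for article, ips in seen.items()}
```

With this, a bot or an editor that loads the same article a thousand times adds at most one to its count, and an article whose only visitor is its own editor ends up with an audience of zero.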
If we can define a good measurement for audience, then it would start a new statistics series, and we would not have any problems of mismatch with previous data.