webalizer statistics and access breakdowns by country are now available for all Wikimedia projects:
* http://www2.knams.wikimedia.org/stats/
* http://www2.knams.wikimedia.org/country-stats/
if there are any suggestions for other statistics that may be useful to generate, we could look at providing those too.
kate.
What is "italy.wikipedia.org" (http://www2.knams.wikimedia.org/stats/italy.wikipedia.org/)? Frieda
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Frieda Brioschi wrote in gmane.science.linguistics.wikipedia.technical:
What is "italy.wikipedia.org" (http://www2.knams.wikimedia.org/stats/italy.wikipedia.org/)?
statistics are generated based on URLs accessed by users. so, probably someone does not know where it.wp is :-) in the future, it should be filtered by the list of actual languages.
kate.
Kate keturner@livejournal.com wrote:
Frieda Brioschi wrote:
"italy.wikipedia.org" (http://www2.knams.wikimedia.org/stats/italy.wikipedia.org/)?
statistics are generated based on URLs accessed by users. so, probably someone does not know where it.wp is :-) in the future, it should be filtered by the list of actual languages.
If someone fixes that, perhaps a list of domains rejected as not being real projects could be generated? It could be useful to know if there are "bad" URLs that an awful lot of people are trying to enter...
Rowan Collins wrote in gmane.science.linguistics.wikipedia.technical:
Kate keturner@livejournal.com wrote:
Frieda Brioschi wrote:
"italy.wikipedia.org" (http://www2.knams.wikimedia.org/stats/italy.wikipedia.org/)?
statistics are generated based on URLs accessed by users. so, probably someone does not know where it.wp is :-) in the future, it should be filtered by the list of actual languages.
If someone fixes that, perhaps a list of domains rejected as not being real projects could be generated? It could be useful to know if there are "bad" URLs that an awful lot of people are trying to enter...
it is now filtered based on langlist and some other non-language domains. the majority of invalid domains appear to be <fullname>.wikipedia.org and various language aliases or country codes (ger, dk).
kate.
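The filtering described above, together with the suggested list of rejected domains, could be sketched roughly as follows. This is only an illustration: the `LANGLIST` and `SPECIAL` sets are hypothetical excerpts, and `split_hosts` is an invented helper, not the script actually used to generate the statistics.

```python
from collections import Counter

# Hypothetical excerpt of the language list; the real langlist is much longer.
LANGLIST = {"en", "de", "it", "fr", "nl", "da"}
# Hypothetical set of valid non-language subdomains.
SPECIAL = {"www"}
SUFFIX = ".wikipedia.org"


def split_hosts(hosts):
    """Tally requested hostnames into accepted and rejected counters."""
    accepted, rejected = Counter(), Counter()
    for host in hosts:
        host = host.lower().rstrip(".")
        if host.endswith(SUFFIX):
            label = host[: -len(SUFFIX)]
            if label in LANGLIST or label in SPECIAL:
                accepted[host] += 1
                continue
        # Anything else (full names, aliases, country codes) is rejected.
        rejected[host] += 1
    return accepted, rejected


# Made-up sample of requested hostnames.
sample = [
    "it.wikipedia.org", "italy.wikipedia.org", "ger.wikipedia.org",
    "italy.wikipedia.org", "de.wikipedia.org", "dk.wikipedia.org",
]
accepted, rejected = split_hosts(sample)

# Most frequently requested invalid domains first.
for host, hits in rejected.most_common():
    print(hits, host)
```

Sorting the rejected counter by hits gives the "bad URLs that a lot of people try to enter" report with no extra pass over the logs.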
Kate,
Again, I am delighted by these stats.
I am a bit perplexed by the numbers for South Africa. They are very low, though we have quite a big number of South African editors. Any idea?
For info, the script initially came from Submarine.
ant
Anthere wrote:
Again, I am delighted by these stats.
I am a bit perplexed by the numbers for South Africa. They are very low, though we have quite a big number of South African editors. Any idea?
When we discussed the feasibility of getting accurate statistics, we found two problems:
1. The squid logs were not in an easy, aggregated format and were located on many different machines, which makes creating statistics hard.
2. IP-to-country mappings in public databases are very inaccurate for some regions, notably the African continent.
Kate has now solved (1), but (2) is still a problem, and a much harder one to tackle. :)
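For problem (1), one possible way to aggregate per-machine logs is a streaming merge by timestamp. The sketch below assumes squid's native access.log layout, where each line starts with a Unix timestamp, and assumes each machine's log is already in time order. The sample lines and the `merge_logs` helper are made up for illustration and say nothing about how the logs were actually collected.

```python
import heapq


def timestamp(line):
    """Return the leading Unix timestamp of a squid native-format log line."""
    return float(line.split(None, 1)[0])


def merge_logs(streams):
    """Lazily merge time-ordered log streams into one time-ordered stream."""
    return heapq.merge(*streams, key=timestamp)


# Hypothetical log lines from two cache machines (addresses are
# documentation-only examples).
squid_a = [
    "1120550000.100 12 192.0.2.1 TCP_HIT/200 512 GET http://it.wikipedia.org/",
    "1120550002.500 30 192.0.2.2 TCP_MISS/200 900 GET http://de.wikipedia.org/",
]
squid_b = [
    "1120550001.250 8 198.51.100.7 TCP_HIT/200 300 GET http://en.wikipedia.org/",
    "1120550003.000 15 198.51.100.9 TCP_HIT/200 450 GET http://fr.wikipedia.org/",
]

merged = list(merge_logs([squid_a, squid_b]))
for line in merged:
    print(line)
```

Because `heapq.merge` is lazy, the same function works on open file objects without loading whole logs into memory.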
Mark Bergsma wrote:
Anthere wrote:
Again, I am delighted by these stats.
I am a bit perplexed by the numbers for South Africa. They are very low, though we have quite a big number of South African editors. Any idea?
When we discussed the feasibility of getting accurate statistics, we found two problems:
1. The squid logs were not in an easy, aggregated format and were located on many different machines, which makes creating statistics hard.
2. IP-to-country mappings in public databases are very inaccurate for some regions, notably the African continent.
Kate has now solved (1), but (2) is still a problem, and a much harder one to tackle. :)
Nod. What actually surprised me is simply that South Africa is one of the most advanced countries on the continent from a tech point of view, and probably the one with the highest number of editors (at least, this is what my exploring clearly showed). So ... what I would have expected from Central Africa just surprised me there :-)
ant
This is great stuff, thanks Kate.
Is there a page for the overall picture? The total hits etc. for all Wikimedia projects together?
If not, that would be a useful one.
--sannse
sannse wrote:
This is great stuff, thanks Kate.
Is there a page for the overall picture? The total hits etc. for all Wikimedia projects together?
If not, that would be a useful one.
I made a few graphs with total requests/s and total bandwidth (different colors per cluster) a few weeks ago. They are linked off http://ganglia.wikimedia.org
sannse wrote in gmane.science.linguistics.wikipedia.technical:
Is there a page for the overall picture? The total hits etc. for all Wikimedia projects together?
i've added another page at
http://www2.knams.wikimedia.org/stats/00-all-projects/
for the aggregate statistics. (they'll appear the next time the statistics are run, which should be about 15 minutes from now).
If not, that would be a useful one.
--sannse
kate.
Kate wrote:
i've added another page at
http://www2.knams.wikimedia.org/stats/00-all-projects/
for the aggregate statistics. (they'll appear the next time the statistics are run, which should be about 15 minutes from now).
thanks Kate, that's great
--sannse
Kate wrote in gmane.science.linguistics.wikipedia.technical:
i've added another page at
unfortunately, i had to remove this, as webalizer is too slow/uses too much memory to process the logfiles twice.
kate.
Cool stuff. Is there any chance you could configure webalizer to report UA statistics, or would this fall afoul of privacy policies?
Patrick
On 5 Jul 05 at 08:31, Kate wrote:
webalizer statistics and access breakdowns by country are now available for all Wikimedia projects:
if there are any suggestions for other statistics that may be useful to generate, we could look at providing those too.
kate.
Patrick Collison wrote in gmane.science.linguistics.wikipedia.technical:
Cool stuff. Is there any chance you could configure webalizer to report UA statistics, or would this fall afoul of privacy policies?
unfortunately, our access logs do not contain information on user-agents. it may be possible to change that, but it's non-trivial...
Patrick
kate.
On 7/5/05, Kate keturner@livejournal.com wrote:
webalizer statistics and access breakdowns by country are now available for all Wikimedia projects:
if there are any suggestions for other statistics that may be useful to generate, we could look at providing those too.
kate.
Thanks to whoever worked on getting these done. They're great!
Could they be put on an easier to remember address? The old ones were under: http://wikimedia.org/stats/<project name> (e.g. http://wikimedia.org/stats/en.wikipedia.org/)
Perhaps it would be useful to have, for each language edition visited, the percentage of visitors with each accept-language setting.
That is, what percentage of the people visiting de.wiki have German as their accept language, or English, or Dutch... Also, in some browsers, this would include a country code (such as en-US, zh-CN, es-AR, pt-PT, ar-SA, ku-IQ...) or another denomination of language variant, which could be useful for determining the primary language variant of the audience of a particular Wikipedia (are there more visitors to es.wiki with es-MX selected, or es-AR? more visitors to zh.wiki with zh-SG or zh-HK? more visitors to de.wiki with de-AT or de-CH?).
Mark
PS For those wondering about the origin of the country code "ch" for Switzerland, it is the only historically language-neutral option (Switzerland has 4 official languages): CH is an abbreviation of "Confoederatio Helvetica", the Latin name of Switzerland (more properly, the "Helvetian Confederation"). "Confoederatio Helvetica" can be found on Swiss stamps, and occasionally in references to Switzerland in any language.
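The accept-language breakdown suggested above might be computed along these lines. This is a rough sketch that assumes the raw Accept-Language header values for one wiki are available, which the thread does not confirm. The sample headers and the helper names `primary_tag` and `variant_shares` are invented for illustration.

```python
from collections import Counter


def primary_tag(header):
    """Return the highest-priority language tag in an Accept-Language value."""
    best_tag, best_q = None, -1.0
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a tag without an explicit q-value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > best_q:
            best_tag, best_q = tag.strip().lower(), q
    return best_tag


def variant_shares(headers):
    """Map each preferred language tag to its percentage of all requests."""
    tags = [primary_tag(h) for h in headers]
    counts = Counter(t for t in tags if t)
    total = sum(counts.values())
    if not total:
        return {}
    return {tag: 100.0 * n / total for tag, n in counts.items()}


# Made-up headers, as if taken from requests to de.wiki.
sample = [
    "de-AT,de;q=0.8,en;q=0.5",
    "de-CH",
    "de-AT;q=0.9, en;q=0.4",
    "en-US,en;q=0.9",
]
shares = variant_shares(sample)
for tag, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {pct:.1f}%")
```

Counting only each visitor's top-ranked tag keeps the percentages summing to 100; counting every listed tag would answer a different question.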
Kate wrote:
webalizer statistics and access breakdowns by country are now available for all Wikimedia projects:
This is a table of the 31 highest-pageview countries, but sorted on a per-capita basis, using population figures from Wikipedia: http://en.wikipedia.org/wiki/List_of_countries_by_population
It's a little hard to read here in plain text, but there are some interesting things to observe. First, as I have personally felt for a long time, Wikipedia is most popular (per capita) in German-speaking countries. In Germany, for example, there were .3749 pageviews per person (for whatever time period Kate studied, a day?), as compared to .2288 in the United States.
--Jimbo (sitting in Germany at the moment)
Code  Country          Pageviews    Share      Population  Per capita
CH    SWITZERLAND        3257554    1.50%       7,489,370  0.434957012
DE    GERMANY           30918990   14.60%      82,468,000  0.37492106
AT    AUSTRIA            2597402    1.20%       8,184,691  0.317348816
FI    FINLAND            1497528    0.70%       5,223,442  0.286693717
IL    ISRAEL             1736050    0.80%       6,276,883  0.276578359
CA    CANADA             8732647    4.10%      32,805,401  0.266195405
NL    NETHERLANDS        4132982    1.90%      16,407,491  0.251896039
US    UNITED STATES     67764804   31.90%     296,202,709  0.228778475
JP    JAPAN             25753169   12.10%     127,333,002  0.202250545
SE    SWEDEN             1814517    0.90%       9,001,774  0.201573268
SG    SINGAPORE           764117    0.40%       4,425,720  0.172653715
BE    BELGIUM            1762902    0.80%      10,364,388  0.170092243
NO    NORWAY              758707    0.40%       4,593,041  0.165186202
AU    AUSTRALIA          2927652    1.40%      20,090,437  0.145723659
UK    UNITED KINGDOM     7746798    3.70%      60,441,457  0.128170272
FR    FRANCE             7241067    3.40%      60,424,213  0.119837175
DK    DENMARK             630803    0.30%       5,432,335  0.116120048
HK    HONG KONG           779969    0.40%       6,898,686  0.113060516
PL    POLAND             4327531    2.00%      38,635,144  0.112010221
PT    PORTUGAL            966668    0.50%      10,566,212  0.091486713
CL    CHILE              1239153    0.60%      16,136,137  0.076793659
IT    ITALY              4139186    2.00%      58,103,033  0.071238725
ES    SPAIN              2587800    1.20%      43,209,511  0.059889592
PE    PERU               1649896    0.80%      27,925,628  0.059081787
MY    MALAYSIA            880847    0.40%      23,953,136  0.036773765
AR    ARGENTINA          1432035    0.70%      39,537,943  0.036219259
MX    MEXICO             2594245    1.20%     106,202,903  0.024427251
PH    PHILIPPINES        1674005    0.80%      87,857,473  0.019053644
BR    BRAZIL             3353760    1.60%     186,112,794  0.01802004
CN    CHINA              2498070    1.20%   1,306,313,812  0.001912305
IN    INDIA              1400545    0.70%   1,080,264,388  0.001296484
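The per-capita column is simply pageviews divided by population, sorted in descending order. A minimal sketch of that arithmetic, using four rows copied from the table (the `per_capita` helper is an illustrative name, not part of any existing script):

```python
# (country code, pageviews, population) taken from the table above.
rows = [
    ("US", 67764804, 296202709),
    ("DE", 30918990, 82468000),
    ("IN", 1400545, 1080264388),
    ("CH", 3257554, 7489370),
]


def per_capita(rows):
    """Return (code, pageviews per person) pairs, highest ratio first."""
    ratios = [(code, views / population) for code, views, population in rows]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


ranking = per_capita(rows)
for code, ratio in ranking:
    print(f"{code} {ratio:.4f}")
```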
It wouldn't surprise me if the US figures were even lower. There's already been a comment about the hits for South Africa being ridiculously low. The US national domain, .us, is relatively uncommon, and many people from other countries use a .com, .org, .net or even .gov domain. One wonders to what extent that has been factored into the statistics.
Ec
Ray Saintonge wrote in gmane.science.linguistics.wikipedia.technical:
It wouldn't surprise me if the US figures were even lower. There's already been a comment about the hits for South Africa being ridiculously low. The US national domain, .us, is relatively uncommon, and many people from other countries use a .com, .org, .net or even .gov domain. One wonders to what extent that has been factored into the statistics.
statistics are based on IP data from RIRs, not domains (which we don't log or do lookups for).
however, the current small sample size introduces possible errors from problems during log collection (now resolved, but will take a while to be removed from the statistics).
Ec
kate.
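To illustrate what a lookup based on RIR data (rather than domain names) can look like, here is a rough sketch. It assumes the pipe-separated delegation statistics layout that the registries publish (`registry|cc|type|start|value|date|status`, where `value` is the number of addresses for IPv4 records). The thread does not say which data format or tool was actually used, and the sample records below are invented and use documentation-only address ranges.

```python
import bisect
import ipaddress


def parse_delegations(lines):
    """Build a sorted list of (first_ip, last_ip, country) IPv4 ranges."""
    ranges = []
    for line in lines:
        fields = line.strip().split("|")
        # Skip header, summary, non-IPv4 and country-less records.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
            continue
        start = int(ipaddress.IPv4Address(fields[3]))
        count = int(fields[4])  # number of addresses in the block
        ranges.append((start, start + count - 1, fields[1]))
    ranges.sort()
    return ranges


def country_for(ip, ranges):
    """Return the country code registered for ip, or None if unknown."""
    value = int(ipaddress.IPv4Address(ip))
    # Find the last range whose first address is <= value.
    idx = bisect.bisect_right(ranges, (value, float("inf"), "")) - 1
    if idx >= 0 and ranges[idx][0] <= value <= ranges[idx][1]:
        return ranges[idx][2]
    return None


# Hypothetical records; real files come from the individual registries.
sample = [
    "example|*|ipv4|*|2|summary",
    "example|DE|ipv4|192.0.2.0|256|20050101|allocated",
    "example|ZA|ipv4|198.51.100.0|256|20050101|allocated",
]
ranges = parse_delegations(sample)
print(country_for("192.0.2.77", ranges))
print(country_for("203.0.113.5", ranges))
```

A lookup like this returns the country where a block is registered, which is why an ISP that registers all of its addresses in one country skews the per-country totals.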
Kate wrote:
statistics are based on IP data from RIRs, not domains (which we don't log or do lookups for).
however, the current small sample size introduces possible errors from problems during log collection (now resolved, but will take a while to be removed from the statistics).
Yes, and I can only assume that the identification of the location of ip numbers has to be a bit hazy in places. A particular ip number which is identified as "Switzerland" might well be in Germany or France, I suppose, if the person is close to the border and an ISP serves customers on both sides of the border.
Still, as a general indication, I think it's probably quite nice, as long as we don't make too much over small differences.
--Jimbo
Jimmy Wales wrote in gmane.science.linguistics.wikipedia.technical:
Kate wrote:
statistics are based on IP data from RIRs, not domains (which we don't log or do lookups for).
Yes, and I can only assume that the identification of the location of ip numbers has to be a bit hazy in places.
one of the larger problems is AOL, who allocate IPs registered in the US to all customers :-(
i've added a notice to this effect at the top of the page, on David's suggestion.
--Jimbo
kate.
Kate wrote:
one of the larger problems is AOL, who allocate IPs registered in the US to
all customers :-(
i've added a notice to this effect at the top of the page, on David's suggestion.
According to numerous articles and press releases, the current generation of ip-to-country mappers handle AOL users correctly. Is it only proprietary ones that can do that and the one we use doesn't?
See, for example: http://news.com.com/2100-1023-836138.html
-Mark
Delirium wrote in gmane.science.linguistics.wikipedia.technical:
According to numerous articles and press releases, the current generation of ip-to-country mappers handle AOL users correctly.
if you can find the data, anything can be done. do you know where it is? :)
-Mark
kate.
wikitech-l@lists.wikimedia.org