I have read all kinds of confused comments about odd correlations between language versions and the countries that visitors come from.
As I have (privately) told Erik, the current analysis has the following flaws:
* The country code AU is often used (in this case by APNIC) as a placeholder for pre-reserved ranges, for instance so that parts of a very large range can later be allocated piecemeal to countries in the region (e.g. JP).
* Similarly, RIPE does this with the country code EU (not to be confused with the language code eu, Basque).
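When aggregating geo-IP results, one way to act on this is to report such placeholder codes as "unknown" instead of counting them as a real country. Below is a minimal Python sketch, assuming two hypothetical inputs: a set of purely regional codes (EU) and a hand-maintained list of ranges you have verified to be tagged AU only as an APNIC placeholder. Because AU is also a real country, the sketch only distrusts AU inside the listed ranges. The range used here is a documentation block, not real APNIC data.

```python
import ipaddress

# Hypothetical: codes that never denote a single country (RIPE uses EU
# as a regional placeholder).
REGION_PLACEHOLDERS = {"EU"}

# Hypothetical: ranges checked against the registry and found to carry AU
# only as a placeholder. 203.0.113.0/24 is a documentation block.
AU_PLACEHOLDER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def effective_country(ip: str, geo_country: str) -> str:
    """Return the geo-IP country, or "??" when it is a known placeholder."""
    if geo_country in REGION_PLACEHOLDERS:
        return "??"
    if geo_country == "AU":
        addr = ipaddress.ip_address(ip)
        # Membership test is False (not an error) for IPv6 vs IPv4 networks.
        if any(addr in net for net in AU_PLACEHOLDER_RANGES):
            return "??"
    return geo_country
```

For example, `effective_country("203.0.113.7", "AU")` yields `"??"`, while an AU lookup outside the listed ranges is kept as AU.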
Other misinterpretations may occur because some country codes and language codes conflict. One example is SL (Sierra Leone) versus sl (Slovenian). Another pitfall is uk, the language code for Ukrainian: Ukraine's country code is UA, so uk is easily misread as UK (United Kingdom). There are certainly more. See also: http://meta.wikimedia.org/wiki/Language_codes/Conflicts, although IMO this list is not comprehensive.
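A simple safeguard is to never report a bare two-letter code, but always say whether it is a country or a language. The Python sketch below shows this with small, hand-picked samples (not the full ISO 3166-1 and ISO 639-1 lists, which real use would load instead): it finds codes that spell the same letters in both lists and labels a code explicitly. The helper names are hypothetical.

```python
# Illustrative samples only; load the complete code lists for real use.
COUNTRY_CODES = {"SL": "Sierra Leone", "UA": "Ukraine", "JP": "Japan",
                 "AU": "Australia", "GB": "United Kingdom"}
LANGUAGE_CODES = {"sl": "Slovenian", "uk": "Ukrainian", "ja": "Japanese",
                  "eu": "Basque", "en": "English"}


def case_insensitive_clashes(countries, languages):
    """Return codes that are both a country and a language, ignoring case."""
    return sorted({c.lower() for c in countries} & {l.lower() for l in languages})


def label(code: str) -> str:
    """Spell out every meaning of a code so 'SL' and 'sl' are not confused."""
    parts = []
    if code.upper() in COUNTRY_CODES:
        parts.append(f"country {code.upper()} ({COUNTRY_CODES[code.upper()]})")
    if code.lower() in LANGUAGE_CODES:
        parts.append(f"language {code.lower()} ({LANGUAGE_CODES[code.lower()]})")
    return " / ".join(parts) or f"unknown code {code!r}"
```

With these samples only sl clashes; `label("sl")` gives both meanings, and `label("uk")` resolves to Ukrainian rather than to a country.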
Another possible cause of problems is that IP range assignments change continuously. This happens on a small scale (e.g. reassigning a block of 65,536 addresses, or much smaller), but also on a larger scale. As a result, you can't fully trust a so-called geo-IP database (such as MaxMind). I don't know how quickly such a database becomes outdated, but I have noticed major shifts of ranges of more than 16 million addresses within half a year (involving the AU/JP confusion). Structured lists do not exist, so the only option is to continuously check the data in such a database against the Regional Internet Registries. That is a complicated and very time-consuming process.
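For scale: 65,536 addresses is a /16 block, and "more than 16 million" corresponds to a /8 (2^24 = 16,777,216 addresses). One rough way to see how much has shifted is to compare two snapshots of a geo-IP mapping and count the addresses whose country changed. The Python sketch below assumes each snapshot is a hypothetical dict from CIDR network to country code, with no overlapping networks inside a single snapshot; the networks in the example are private and documentation ranges, not real allocations.

```python
import ipaddress
from typing import Dict


def moved_addresses(old: Dict[str, str], new: Dict[str, str]) -> int:
    """Count addresses mapped to a different country in `new` than in `old`.

    Assumes networks within one snapshot do not overlap each other.
    Addresses present in only one snapshot are ignored.
    """
    old_nets = [(ipaddress.ip_network(n), c) for n, c in old.items()]
    new_nets = [(ipaddress.ip_network(n), c) for n, c in new.items()]
    moved = 0
    for a, country_a in old_nets:
        for b, country_b in new_nets:
            # Two CIDR blocks are either disjoint or one contains the other,
            # so their overlap is exactly the smaller block.
            if a.version == b.version and a.overlaps(b) and country_a != country_b:
                moved += min(a.num_addresses, b.num_addresses)
    return moved


old = {"10.0.0.0/8": "AU", "192.0.2.0/24": "NL"}
new = {"10.0.0.0/9": "JP", "10.128.0.0/9": "AU", "192.0.2.0/24": "NL"}
print(moved_addresses(old, new))
```

Here half of the /8 moves from AU to JP, so the sketch reports 8,388,608 addresses. The nested loop is quadratic, which is fine for spot checks but not for a full database.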
So don't draw conclusions about small countries and/or languages from this data.
Rgds Ronald
wikimedia-l@lists.wikimedia.org