Hi Nemo,
Thanks for the comments and inputs!
- I agree with how you read the graph: the triangle below the diagonal is more interesting than the one above it, as it contains more information, except for the languages that are darker in the upper triangle.
- I was surprised by the clustering (Swedish, Dutch, Waray-Waray, Cebuano, Vietnamese, Indonesian, Minangkabau), but if it is a bot that created it, then it makes sense.
- I can try ordering the Wikipedias by page views; that might put the emphasis on real activity and not only on size (or bot-generated links/pages). Actually, I can change the y-axis to be ordered by page views instead of articles so we won't lose information.
- Another point I didn't mention is that there are small languages (not appearing in the heat map) with a disproportionate number of linked pages compared to their number of articles. This is due to (I think) bot-generated articles that don't have interlinks in the text, so they are not counted as articles.
Best,
Neta