Hi Nemo,

Thanks for the comments and input!
- I agree with how you read the graph: the triangle below the diagonal is more interesting than the one above it, as it contains more information, except for the languages that are dark in the upper triangle.
- I was surprised by the clustering (Swedish, Dutch, Waray-Waray, Cebuano, Vietnamese, Indonesian, Minangkabau), but if it was a bot that created those articles, then it makes sense.
- I can try ordering the Wikipedias by page views; that might put the emphasis on real activity and not only on size (or on bot-generated links/pages). Actually, I can change just the y-axis to be ordered by page views instead of by articles, so we won't lose information.
- Another point I didn't mention is that there are small languages (not appearing in the heat map) with a disproportionate number of linked pages compared to their number of articles. This is due (I think) to bot-generated articles that don't have interlinks in the text, so they are not counted as articles.
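
A rough sketch of the y-axis reordering, in case it helps the discussion. Everything here is illustrative: the function name `reorder_rows`, the three language codes and all the numbers are made up, and I am assuming the heat map is stored as a list of rows (one row per y-axis wiki). Only the rows are permuted, so the x-axis stays ordered by article count.

```python
def reorder_rows(matrix, row_labels, page_views):
    """Return (matrix, labels) with rows sorted by page views, descending.

    Columns are left untouched, so the x-axis keeps its original order.
    """
    order = sorted(
        range(len(row_labels)),
        key=lambda i: page_views[row_labels[i]],
        reverse=True,
    )
    return [matrix[i] for i in order], [row_labels[i] for i in order]


# Hypothetical data: labels ordered by article count, values are percentages.
labels = ["sv", "nl", "ja"]
overlap = [
    [100.0, 60.0, 10.0],
    [55.0, 100.0, 12.0],
    [8.0, 9.0, 100.0],
]
views = {"sv": 100, "nl": 200, "ja": 900}  # made-up page view counts

new_overlap, new_labels = reorder_rows(overlap, labels, views)
print(new_labels)
```

Since each row is moved as a whole, every cell keeps its value; only the vertical position changes.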

Best,
Neta


On Mon, Jan 19, 2015 at 12:53 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Neta Livneh, 18/01/2015 19:57:
I think this is a better version.

Thanks. I think the way to read this graph is that it's naturally darker below the diagonal line, and lighter above it.
        In fact, position (x, y) is the percentage of articles in wiki x which also exist in wiki y. If y > x we can't reach 100 %; for y >> x, we approach zero. So, the things worth noting are mostly the dark areas above the line and white areas below the line.
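
The bound can be made concrete with a minimal sketch. It assumes that "y > x" refers to position in the size ordering, i.e. wiki y has fewer articles than wiki x, and that each wiki is represented as a set of article identifiers. The function names and the toy sets are hypothetical, not taken from the actual analysis.

```python
def overlap_percent(articles_x, articles_y):
    """Percentage of wiki x's articles that also exist in wiki y."""
    if not articles_x:
        return 0.0
    return 100.0 * len(articles_x & articles_y) / len(articles_x)


def upper_bound_percent(articles_x, articles_y):
    """Largest value overlap_percent(x, y) can take, given only the sizes."""
    if not articles_x:
        return 0.0
    return 100.0 * min(len(articles_x), len(articles_y)) / len(articles_x)


# Toy example: the small wiki is fully contained in the big one.
big = set(range(1000))
small = set(range(100))

print(overlap_percent(big, small))      # 10.0: capped by the size ratio
print(overlap_percent(small, big))      # 100.0: every small-wiki article exists
print(upper_bound_percent(big, small))  # 10.0
```

Even with full containment, the cell for (big, small) cannot exceed the size ratio, which is why one triangle of the heat map is expected to be pale.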
        Well-known botpedias (ceb and war) clearly stand out. To a lesser extent, so do nl and sv. If you ordered the wikis by pageviews (as per the www.wikipedia.org top 10) the shading would look more natural (but we'd lose information, unless you redefined the colouring).
        A non-mystery is the strong correlation between sh and sr: that's basically the same language and they have a similar size.
        A weird thing is the status of "min": you'd expect it to have some stronger correlation to zh; I'd call that a gap to fill. The horizontal lines for ja, vi also stand out: we rarely see users from those wikis, they're more isolated. The vertical lines above (uz, vo) often come with surprises: probably some common bulk of bot-created articles. The dark spots in the vertical line above pms are an anthology of secessionist/regional/nostalgic languages; not a surprise given the interests of the core editors.


Nemo

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics