I suggest you contact someone who will attend the meeting, and discuss the issue with them.
Thank you, I think I've already found such a person.
Anyway, if using MySQL's memory tables consumes too many resources, perhaps consider alternatives? Have you looked at network analysis frameworks like JUNG (Java) or SNAP (C++)? Relational databases are not good at managing linked structures like trees and graphs anyway.
My view of MySQL's capabilities was different. The first consideration is that the task involves memory-intensive computation, i.e. it consumes a lot of data to produce a comparatively small amount of results. Memory operations account for the major part of the overall analysis cost, so it is reasonable to use an engine specifically designed to handle data efficiently; by efficiency here I mean mostly processing speed. Indeed, the idea was to mark isolated articles with templates to make authors aware of the issue. Practice has shown that the templates need to be set based on current data, which means it is not good if the bot runs for many hours. The other lesson from practice is that the templates need to be set nearly daily, otherwise authors lose interest in their creations.
Yes, it takes a lot of memory because the MEMORY engine stores varchar data inefficiently and spends a lot of memory on indexes, but on the other hand the processing takes just 1-2 hours for a wiki the size of ru or de. The estimates I made at the initial stage for an offline implementation suggested the results would be far less up to date, which is why I chose SQL.
The memory requirements shouldn't be that huge anyway: two IDs per edge = 8 bytes. The German-language Wikipedia, for instance, has about 13 million links in the main namespace, so 8*|E| would need only about 100MB even for a naive implementation. With a little more effort, it can be nearly halved to 4*|E|+4*|V|.
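For concreteness, here is a minimal sketch of what a 4*|E|+4*|V| layout could look like: a compressed adjacency array with one 4-byte target per edge plus one 4-byte offset per vertex. The class and method names are purely illustrative, not taken from any existing toolserver code.

/** Minimal sketch of a 4*|E| + 4*|V| edge store (compressed adjacency array). */
public class EdgeStore {
    private final int[] offset;  // offset[v] .. offset[v+1]-1 index into target[], 4*|V| bytes
    private final int[] target;  // concatenated outgoing link targets, 4*|E| bytes

    /** Builds the store from an edge list (srcs[i] -> dsts[i]); page IDs are 0..numVertices-1. */
    public EdgeStore(int numVertices, int[] srcs, int[] dsts) {
        offset = new int[numVertices + 1];
        target = new int[srcs.length];
        for (int s : srcs) offset[s + 1]++;                              // out-degree per vertex
        for (int v = 0; v < numVertices; v++) offset[v + 1] += offset[v]; // prefix sums -> offsets
        int[] next = offset.clone();                                     // temporary write cursors
        for (int i = 0; i < srcs.length; i++) target[next[srcs[i]]++] = dsts[i];
    }

    /** Iterates the outgoing links of page v. */
    public void forEachLink(int v, java.util.function.IntConsumer action) {
        for (int i = offset[v]; i < offset[v + 1]; i++) action.accept(target[i]);
    }
}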
My data for dewiki is different. The number of links between articles (excluding disambiguations) after redirect resolution is around 33 million. The source is here: http://toolserver.org/~mashiah/isolated/de.log. One can find lots of other interesting statistics there.
I have used the trivial edge store for analyzing the category structure before, and Neil Harris is currently working on a nice standalone implementation of this for Wikimedia Germany. This should allow recursive category lookup in microseconds.
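To illustrate why lookups over such an edge store can be that fast (this is not Neil Harris's implementation, just a sketch on top of the hypothetical EdgeStore class above): recursive category lookup reduces to a breadth-first traversal of the category membership edges, where each step is a couple of array reads with no SQL round trips.

import java.util.ArrayDeque;
import java.util.BitSet;

/** Illustrative recursive category lookup: collects every vertex reachable
 *  from a root category by following category -> subcategory/page edges. */
public class CategoryLookup {
    public static BitSet reachableFrom(EdgeStore edges, int numVertices, int root) {
        BitSet seen = new BitSet(numVertices);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        seen.set(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            edges.forEachLink(v, w -> {       // each step: two array reads, no disk or SQL
                if (!seen.get(w)) {
                    seen.set(w);
                    queue.add(w);
                }
            });
        }
        return seen;
    }
}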
I think the category tree analysis (which is also there) takes, at worst, minutes for a relatively large wiki (7 minutes for about 150 small Wikipedias). On output, the category-tree graph is split into strongly connected components. With an offline application, just downloading the data from the database could take longer than Golem's whole processing time.
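Golem does this decomposition in SQL; purely for illustration, a strongly-connected-components pass over the same hypothetical EdgeStore could look like the sketch below (recursive Tarjan for brevity; a run over a full wiki category graph would need an iterative variant to avoid stack overflow).

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustrative Tarjan decomposition of the category graph into strongly connected components. */
public class SccDecomposer {
    private final EdgeStore edges;
    private final int[] index, low;
    private final ArrayDeque<Integer> stack = new ArrayDeque<>();
    private final BitSet onStack;
    private final List<List<Integer>> components = new ArrayList<>();
    private int counter = 0;

    public SccDecomposer(EdgeStore edges, int numVertices) {
        this.edges = edges;
        this.index = new int[numVertices];
        this.low = new int[numVertices];
        this.onStack = new BitSet(numVertices);
        Arrays.fill(index, -1);
        for (int v = 0; v < numVertices; v++) {
            if (index[v] == -1) strongConnect(v);
        }
    }

    public List<List<Integer>> components() { return components; }

    private void strongConnect(int v) {
        index[v] = low[v] = counter++;
        stack.push(v);
        onStack.set(v);
        edges.forEachLink(v, w -> {
            if (index[w] == -1) {                      // tree edge: recurse
                strongConnect(w);
                low[v] = Math.min(low[v], low[w]);
            } else if (onStack.get(w)) {               // back edge inside the current component
                low[v] = Math.min(low[v], index[w]);
            }
        });
        if (low[v] == index[v]) {                      // v is the root of a component: pop it
            List<Integer> scc = new ArrayList<>();
            int w;
            do {
                w = stack.pop();
                onStack.clear(w);
                scc.add(w);
            } while (w != v);
            components.add(scc);
        }
    }
}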
mashiah