Hello Mashiah
Connectivity is a property of the graph as a whole; there is no way to
analyze it with only a subset of the nodes and edges. Using the original
tables in the language database, or MyISAM tables, makes the analysis
far too slow. The advantage of memory tables is not only that they are
located in memory (which is not always true, of course): the engine itself
is optimized for speed, and the format is designed to allow that.
If your project requires more resources than are available as your fair share on
the toolserver, then either the need for resources needs to be reduced, or the
project has to run elsewhere. If there are good reasons and sufficient funding,
setting aside a VM or even a full server for a special project can be
considered. How individual projects and chapters can participate more in the
governance (and funding) of the toolserver is one of the topics that will be
discussed at the upcoming chapters' conference in April in Berlin. I suggest you
contact someone who will attend the meeting, and discuss the issue with them.
Anyway, if using MySQL's memory tables consumes too many resources, perhaps
consider alternatives? Have you looked at network analysis frameworks like JUNG
(Java) or SNAP (C++)? Relational databases are not good at managing linked
structures like trees and graphs anyway.
The memory requirements shouldn't be that huge: two 4-byte IDs per edge = 8
bytes. The German language Wikipedia for instance has about 13 million links in
the main namespace; 8*|E| would need about 1GB even for a naive implementation.
With a little more effort, it can be nearly halved to 4*|E|+4*|V|.
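
To make the two estimates concrete, here is a rough sketch in Python. It assumes 4-byte node IDs, and it reads 4*|E|+4*|V| as an adjacency layout where edges are grouped by source node: one offset per node plus one target ID per edge. That reading, the function names, and the node count of one million used below are assumptions for illustration only. Taking the quoted 13 million links at face value, the naive store comes to roughly 104 MB; 1GB would correspond to about 130 million links.

```python
from array import array


def naive_bytes(num_edges, id_size=4):
    """Edge list: two IDs (source, target) per edge."""
    return 2 * id_size * num_edges


def compact_bytes(num_edges, num_nodes, id_size=4):
    """Grouped layout: one target ID per edge plus one offset per node."""
    return id_size * num_edges + id_size * num_nodes


def build_adjacency(num_nodes, edges):
    """Build (offsets, targets) from a list of (source, target) pairs.

    Node IDs are assumed to be integers in range(num_nodes).
    The targets of node n are targets[offsets[n]:offsets[n + 1]].
    """
    counts = [0] * (num_nodes + 1)
    for src, _ in edges:
        counts[src + 1] += 1
    # Prefix sum turns per-node counts into start offsets.
    for i in range(num_nodes):
        counts[i + 1] += counts[i]
    # "I" is an unsigned int, 4 bytes on most platforms.
    offsets = array("I", counts)
    targets = array("I", [0]) * len(edges)
    cursor = list(counts[:num_nodes])  # next free slot per source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets, targets, node):
    """Return the targets of all edges leaving `node`."""
    return targets[offsets[node]:offsets[node + 1]]


# Hypothetical figures: 13 million links as quoted, 1 million nodes assumed.
naive = naive_bytes(13_000_000)
compact = compact_bytes(13_000_000, 1_000_000)
```

The offsets array holds |V|+1 entries, so the real footprint is 4 bytes above the formula. The saving approaches one half when |V| is small compared to |E|.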
I have used the trivial edge store for analyzing the category structure before,
and Neil Harris is currently working on a nice standalone implementation of
this for Wikimedia Germany. This should allow recursive category lookup in
microseconds.
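
As an illustration of what a recursive category lookup on an in-memory edge store might look like, here is a minimal sketch in Python. It is not the implementation mentioned above; the function name, the dict-based structure, and the example IDs are all hypothetical. It does a breadth-first walk and keeps a visited set, since a category graph can contain cycles.

```python
from collections import deque


def subcategories(children, root, max_depth=None):
    """Return all categories reachable from `root`, in breadth-first order.

    `children` maps a category ID to a list of its direct subcategory IDs.
    `max_depth` optionally limits how many levels are followed.
    """
    seen = {root}  # guards against cycles and repeated visits
    queue = deque([(root, 0)])
    result = []
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                result.append(child)
                queue.append((child, depth + 1))
    return result


# Hypothetical category graph with a cycle (4 points back to 1).
example = {1: [2, 3], 2: [4], 3: [4], 4: [1]}
all_below_1 = subcategories(example, 1)
direct_below_1 = subcategories(example, 1, max_depth=1)
```

Each node and edge is visited at most once, so the cost of a lookup is proportional to the size of the subtree that is actually reached.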
In any case, something needs to change. You can't expect to be frequently using
1/8 of the toolserver's RAM. Even more so since this amount of memory can't be
used by MySQL for caching while you are not using it (because of the way the
InnoDB buffer pool works).
Regards,
Daniel
--
Daniel Kinzler
Software Developer
Wikimedia Deutschland
Phone +49 30 219 158 260
Imagine a world in which every person can freely share in the sum of all
knowledge. Help us make it happen!
http://spenden.wikimedia.de/
****New: Donate by phone! Call 01805-945473 (14 cents / minute) to make
a one-time donation of €4.99 or €9.99, or a recurring donation of €9.99
per week.*****
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.