Dear colleagues,<br>dear Daniel,<br><br>thanks for the detailed explanation!<br><br>It looks like the Connectivity project has definitely outgrown the Toolserver single-user status, and may also have outgrown the Toolserver structure in general.<br>
There is a contradiction between treating it as a single-user tool and its actual large-scale nature. (For example, it may be considered comparable to all interwiki bots combined, or even to the semantic wiki project.)<br>
<br>It looks like there's a REAL need to find a long-term solution: a complete (or at least major) refactoring of the bot, integrating support into MediaWiki, financing a dedicated server, or something similar.<br><br>
OK. But then I would also like to ask you to consider a short-term solution (say, for two to three months) that would allow the project to function until a long-term solution is implemented.<br><br>Perhaps Mashiah could also find a way to run Golem with limited functionality while still performing the analysis essential for the project?<br>
<br>Of course, I would be happy to discuss Toolserver support opportunities during the chapters' conference in Berlin!<br><br><br><pre>> Hello Vladimir<br><br>> The problem is that Golem uses a very large amount of memory, about 4GB. That's<br>
> 1/8 of the total capacity, and it's memory that can not be used for normal<br>> database operations if it's set aside for the memory tables golem uses (even if<br>> they are not in use). This by far exceeds the fair share of resources for each<br>
> toolserver user.<br><br>> It was only recently discovered that golem does use so much memory (because it<br>> does it on the database server, not the normal user server), but it is suspected<br>> that this is at least one of the causes that triggered system failures in the<br>
> past. We do not currently see the possibility of allowing individual users to<br>> use that much memory, especially not on the database server. Basically, as it is<br>> implemented now, Golem is unfit for the toolserver, because it consumes far too<br>
> many resources.<br><br>> Earlier today, I asked Mashiah to consider alternative ways to implement the<br>> network analysis. I think it would be possible to reduce the memory use by at<br>> least a factor of 8. Should this not be possible, golem would have to run on a<br>
> dedicated system.<br><br>> If there are good reasons and sufficient funding, setting aside a VM or even a<br>> full server for a special project can be considered. How individual projects and<br>> chapters can participate more in the governance (and funding) of the toolserver<br>
> is one of the topics that will be discussed at the upcoming chapters' conference<br>> in April in Berlin. I recommend you bring up the topic of golem there.<br><br><br>> Regards,<br>> Daniel<br><br>PS: Below I quote my reply to Mashiah.<br>
<br>><i> Hello Mashiah<br></i>><i> <br></i>>><i> > Connectivity is a property of a graph as a whole, there is no way to<br></i>>><i> > analyze it having just a part of all nodes and edges. Use of original<br>
</i>>><i> > tables in language database or use of MyISAM tables makes the analysis<br></i>>><i> > far too slow. Good thing with memory tables is not only in being<br></i>>><i> > located in memory (which is not always true of course), the engine is<br>
</i>>><i> > optimized for speed itself and the format is designed to allow that.<br></i>><i> <br></i>><i> If your project requires more resources than are available as your fair share on<br></i>><i> the toolserver, then either the need for resources needs to be reduced, or the<br>
</i>><i> project has to run elsewhere. If there are good reasons and sufficient funding,<br></i>><i> setting aside a VM or even a full server for a special project can be<br></i>><i> considered. How individual projects and chapters can participate more in the<br>
</i>><i> governance (and funding) of the toolserver is one of the topics that will be<br></i>><i> discussed at the upcoming chapters' conference in April in Berlin. I suggest you<br></i>><i> contact someone who will attend the meeting, and discuss the issue with them.<br>
</i>><i> <br></i>><i> Anyway, if using MySQL's memory tables consumes too many resources, perhaps<br></i>><i> consider alternatives? Have you looked at network analysis frameworks like JUNG<br></i>><i> (Java) or SNAP (C++)? Relational databases are not good at managing linked<br>
</i>><i> structures like trees and graphs anyway.<br></i>><i> <br></i>><i> The memory requirements shouldn't be that huge anyway: two IDs per edge = 8<br></i>><i> bytes. The German language Wikipedia for instance has about 13 million links in<br>
</i>><i> the main namespace, 8*|E| would need about 1GB even for a naive implementation.<br></i>><i> With a little more effort, it can be nearly halved to 4*|E|+4*|V|.<br></i>><i> <br></i>><i> I have used the trivial edge store for analyzing the category structure before,<br>
</i>><i> and Neil Harris is currently working on a nice standalone implementation of<br></i>><i> this for Wikimedia Germany. This should allow recursive category lookup in<br></i>><i> microseconds.<br></i>><i> <br>
</i>><i> <br></i>><i> In any case, something needs to change. You can't expect to be frequently using<br></i>><i> 1/8 of the toolserver's RAM. Even more so since this amount of memory can't be<br></i>><i> used by MySQL for caching while you are not using it (because of the way the<br>
</i>><i> InnoDB cache pool works).<br></i>><i> <br></i>><i> <br></i>><i> Regards,<br></i>><i> Daniel<br></i>><i> <br></i><br><br>Vladimir Medeyko wrote:<br>><i> Dear colleagues,<br></i>><i> <br></i>><i> I've heard that the Golem bot, which is the heart of the connectivity<br>
</i>><i> project, stopped functioning due to the recent toolserver reconfiguration.<br></i>><i> <br></i>><i> Is it possible to adjust the configuration specifically for Golem or to do<br></i>><i> something else to make it function again?<br>
</i>><i> <br></i>><i> It is especially a pity that the connectivity project has problems now,<br></i>><i> just two days after the project was presented at Konferencija Wikimedia<br></i>><i> Polska and received much interest from the audience.<br>
</i>><i> <br></i>><i> What could be done to fix the situation? Thanks!<br></i>><i> </i></pre><br><br clear="all"><br>-- <br> Vladimir Vladimirovich Medeyko<br> NP "Wikimedia RU"<br> Director<br> tel. +7-921-940-39-79<br>
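<br><br>The memory estimate in the quoted reply to Mashiah (8*|E| bytes for a plain edge list, 4*|E|+4*|V| bytes with a little more effort) can be illustrated with a minimal sketch. It assumes 4-byte integer IDs and a compressed adjacency layout: one offsets array indexed by source vertex plus one targets array. All function names are hypothetical, and this is not the code used by Golem or by any of the tools mentioned in the thread.<br>
<pre>
```python
from array import array


def edge_list_bytes(num_edges, id_bytes=4):
    """Plain edge store: two IDs (source, target) per edge, i.e. 8*|E| for 4-byte IDs."""
    return 2 * id_bytes * num_edges


def compact_bytes(num_edges, num_vertices, id_bytes=4):
    """Compressed layout: one target ID per edge plus one offset per vertex,
    i.e. 4*|E| + 4*|V| for 4-byte IDs (ignoring one extra sentinel offset)."""
    return id_bytes * num_edges + id_bytes * num_vertices


def build_compact(num_vertices, edges):
    """Build (offsets, targets) from (source, target) pairs.

    Assumes vertex IDs are integers in range(num_vertices).
    array("I") is an unsigned int array; its item size is 4 bytes on
    common platforms, but that is platform dependent.
    """
    # Count outgoing edges per source, shifted by one slot.
    counts = [0] * (num_vertices + 1)
    for src, _dst in edges:
        counts[src + 1] += 1
    # A running sum turns the counts into start offsets.
    for v in range(num_vertices):
        counts[v + 1] += counts[v]
    offsets = array("I", counts)
    targets = array("I", [0] * len(edges))
    cursor = list(counts[:num_vertices])  # next free slot for each source
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def out_links(offsets, targets, v):
    """Targets of all edges leaving vertex v."""
    return list(targets[offsets[v]:offsets[v + 1]])


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
    offsets, targets = build_compact(3, edges)
    print(out_links(offsets, targets, 0))
    print(edge_list_bytes(len(edges)), compact_bytes(len(edges), 3))
```
</pre>
The compact layout drops the source ID from every edge because the offsets array already records where each vertex's outgoing links start; the saving approaches one half when |V| is small relative to |E|.<br>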