Hello
Hm... I knew I forgot to mention something. Yes, the maximum allowed max_heap_table_size is now set to 128 MB.
This makes my tool non-functional.
4GB is *way* too much; we don't have even close to that much free memory available on the MySQL servers.
As far as I've seen over the last two years, it was fine. To be more precise, I am not sure it ever really used 4 GB, but at code spots where a significant amount of data has to be cached, the bot sometimes increases the allowed memory size up to this limit. Of course, I did not allow it to grow without bound. The functionality for changing the memory limit can be seen here: https://fisheye.toolserver.org/browse/golem/isolated/memory.sql?r=HEAD
From now on I'll add there something like:

SET @b = @@a;
SET @@a = 2 * @b;
# crazy code: verify the server actually accepted the new value
IF @@a != 2 * @b THEN CALL error(); END IF;
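If the servers enforce a hard cap, the check could raise a proper error instead of calling an undefined error() routine. A sketch only: it assumes MySQL 5.5+ (where SIGNAL is available), and the procedure name is hypothetical:

```sql
DELIMITER //
-- Hypothetical helper: try to raise the session heap-table limit and
-- fail loudly if the server silently capped the requested value.
CREATE PROCEDURE try_set_heap_limit(IN wanted BIGINT UNSIGNED)
BEGIN
  SET @@max_heap_table_size = wanted;
  IF @@max_heap_table_size <> wanted THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'max_heap_table_size was capped by the server';
  END IF;
END //
DELIMITER ;
```

MySQL clamps an over-limit SET to the allowed maximum with only a warning, which is why an explicit read-back check like this is needed at all.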
When we discussed this by email, I had the impression you had reduced its memory usage a lot -- if it's using 4GB *after* being reduced, I hate to think what it required previously...
Ok, here I need to say a few words about what Golem is and what it does. Golem is, in general, a tool for recognizing clusters of isolated articles and for generating suggestions on how to improve Wikipedia connectivity. It also performs some supplementary functions, such as analysis of links to disambiguation pages and detection of category-tree cycles. One can try it starting from this page: http://toolserver.org/~lvova/cgi-bin/go.sh?interface=en.
Golem's work is split into a number of successive stages, and each stage depends on data obtained in the previous ones. The very first phase is caching links from the language database into memory tables: page links, category links, template links and, of course, zero-namespace pages, category pages and template pages. Memory requirements for this stage depend only on the size of the wiki. For the German Wikipedia, Golem asks for memory tables of up to 4 GB in size; for the Russian Wikipedia 1 GB is enough, and smaller wikis fit within the default limits. For the English Wikipedia I do not run the analysis at all, as it is too heavy.
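For illustration, the first caching stage boils down to something like the following. This is a rough sketch, not Golem's actual code; the memory-table layout and the use of the standard MediaWiki pagelinks/page tables are my assumptions here:

```sql
-- Raise the session limit so the MEMORY table may grow large enough;
-- this is the step that now fails once the 128 MB cap is hit.
SET @@max_heap_table_size = 1024 * 1024 * 1024;  -- e.g. 1 GB for ruwiki

-- Cache zero-namespace page links as id pairs in RAM.
CREATE TABLE pl (
  pl_from INT UNSIGNED NOT NULL,
  pl_to   INT UNSIGNED NOT NULL,
  PRIMARY KEY (pl_from, pl_to)
) ENGINE=MEMORY;

INSERT IGNORE INTO pl (pl_from, pl_to)
SELECT l.pl_from, p.page_id
FROM pagelinks l
JOIN page p
  ON p.page_namespace = l.pl_namespace
 AND p.page_title     = l.pl_title
WHERE p.page_namespace = 0;
```

The point is that the whole link graph is resolved to numeric id pairs and held in a MEMORY table, so the table size, and hence max_heap_table_size, must scale with the wiki.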
After the first stage is completed, the other stages use much less memory, with the exception of the very last stage, called iwiki spy. The purpose of iwiki spy is to analyze interwiki links from isolated articles to other languages and to spot possible linking suggestions there, or to find articles to be translated in order to set links from them.
Iwiki spy requires a lot of memory even for relatively small wikis (war is the worst case I've seen) because lots of suggestions for their isolated articles come in from everywhere. That's why the limit for iwiki spy is always set to 4 GB.
One more thing to take into account is that there are two people on the Toolserver who regularly run Golem. One of them is lvova; she has the official copy, linked from Wikipedia templates and other pages. My copy is just for development.
As you remember, we've experienced memory issues during the last few weeks. All those issues correlate with situations when lvova and I ran the bot together (each copy requesting up to 4 GB for temporary data) and both worked on relatively small languages. In such a situation the two iwiki spies together requested too much memory.
For now I have simply disabled the iwiki spy stage in my copy of the bot, letting lvova's copy provide the isolated-article linking suggestions. After this change the bot successfully analyzed the whole list of available languages and neither hung nor used too much memory. Lvova's copy was running at the same time with iwiki spy on.
My intention was to re-enable iwiki spy for both copies after a rewrite (which is possible but may take some time).
The current status of the tool:

1. It works for languages on s2/s5, because the memory size limitation is not enabled there.
2. It does not work for relatively big languages on s3/s6, even for the first stage, links and pages caching. I tried fr as an example, and the bot hung trying to increase the allowed heap table size to values around 512 MB for category pages caching.
3. Golem's web page works well, with small exceptions caused by a number of SQL procedures not yet moved to the new server; an example can be seen here: http://toolserver.org/~mashiah/cgi-bin/go.sh?language=ru&interface=en&am...
4. I am really in a difficult situation with this latest configuration change, because from now on the connectivity project running on the Russian Wikipedia no longer has the data it has had for the last two years. The Ukrainian community has been actively using Golem's data for approximately a year. The Polish community, which just became familiar with Golem's functions during their national wiki conference, will not be able to use the functions just introduced to them. The work other people are doing on translating Golem's interface into German is probably no longer needed.
Sorry if something in this message looks too emotional; a great deal of effort went into implementing this, and I am really not sure the "way too much" statement is well enough proven to justify the restrictions just introduced.
mashiah