No subject


Wed Mar 24 19:33:50 UTC 2010


SET @b=@@a;
SET @@a=2*@b;

# crazy code
IF @@a!=2*@b
THEN
  CALL error();
END IF

> When we discussed this by email, I had
> the impression you had reduced its memory usage a lot -- if it's using
> 4GB *after* being reduced, I hate to think what it required
> previously...

OK, here I need to say a few words about what Golem is and how it
works.
Golem is, in general, a tool for recognizing clusters of isolated
articles and for generating suggestions on how to improve Wikipedia
connectivity. It also performs some supplementary functions, such as
analyzing links to disambiguation pages and detecting cycles in the
category tree. You can try it starting from this page:
http://toolserver.org/~lvova/cgi-bin/go.sh?interface=en.

Golem's work is split into a number of consecutive stages; each
stage depends on data obtained in the previous stages. The very first
phase caches links from the language database into memory tables:
page links, category links, template links and, of course, zero
namespace pages, category pages, template pages.
Memory requirements for this stage depend only on the size of the
wiki. For the German Wikipedia Golem requests that memory tables be
allowed to grow to 4 GB, for the Russian Wikipedia 1 GB is enough, and
for smaller wikis it fits within the default limits. For the English
Wikipedia I do not run the analysis, as it is too heavy.
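
To make the first-stage figures concrete, here is a minimal sketch of
how a per-wiki limit could be chosen. It is not Golem's actual code:
the function name, the use of language codes as keys, and reading
"GB" as 2^30 bytes are all assumptions. max_heap_table_size is the
MySQL variable that caps MEMORY tables, but the message does not say
exactly how Golem sets it.

```python
GIB = 1024 ** 3  # assumption: "GB" in the message is read as 2**30 bytes

# First-stage sizes quoted in the message; keying them by language
# code is an assumption made for this sketch.
FIRST_STAGE_LIMIT = {"de": 4 * GIB, "ru": 1 * GIB}


def heap_limit_statement(lang):
    """Return a SET statement for the first stage, or None to keep
    the server default (hypothetical helper, not part of Golem)."""
    limit = FIRST_STAGE_LIMIT.get(lang)
    if limit is None:
        return None  # smaller wikis fit within the default limits
    return f"SET SESSION max_heap_table_size = {limit}"
```

For example, heap_limit_statement("ru") yields a statement requesting
1073741824 bytes, while a wiki not listed in the table returns None.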

Once the first stage is completed, the other stages use much less
memory, with the exception of the very last stage, called iwiki spy.
The purpose of iwiki spy is to analyze interwiki links from isolated
articles to other languages and look there for possible linking
suggestions, or find articles to be translated so that links can be
set from them.

Iwiki spy requires a lot of memory for relatively small wikis (war was
the worst case I've seen) because lots of suggestions for their
isolated articles come in from everywhere. That's why the limit for
iwiki spy is always set to 4 GB.

One more thing to take into account is that there are two people on
ts who regularly run Golem. One of them is lvova; she has the
official copy, linked from Wikipedia templates and other pages. My copy
is just for development.

As you remember, we've experienced memory issues during the last few
weeks. All those issues correlate with situations where both lvova
and I ran the bot at the same time (each requesting up to 4 GB for
temporary data) and both worked on relatively small languages. In such
a situation the two iwiki spies together requested too much memory.
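
The arithmetic behind this can be sketched as follows. This is only an
illustration using the 4 GB per-copy limit stated above; the function
name is hypothetical, and "GB" is again read as 2^30 bytes.

```python
GIB = 1024 ** 3  # assumption: "GB" in the message is read as 2**30 bytes

# Per-copy limit for the iwiki spy stage, as stated in the message.
IWIKI_SPY_LIMIT = 4 * GIB


def worst_case_request(running_copies):
    """Upper bound on memory requested when several copies of the bot
    reach the iwiki spy stage at once (hypothetical helper)."""
    return running_copies * IWIKI_SPY_LIMIT
```

With two copies running, the upper bound is 8 GB, twice what a single
copy can request.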

I have now disabled the interwiki spy stage in my copy of the bot so
that lvova's copy can keep providing linking suggestions for isolated
articles. After this change the bot successfully analyzed the whole
list of available languages and neither hung nor used too much memory.
Lvova's copy was running at the same time, with iwiki spy on.

My intention was to start iwiki spy again for both copies after
rewriting it (which is possible but may take some time).

The current status of the tool is:
1. it does work for languages on s2/s5 because the limit on allowed
memory size is not enabled there
2. it does not work for relatively big languages on s3/s6, even for
the first stage, links and pages caching. I tried fr as an example
and the bot hung trying to increase the allowed heap table size up to
values around 512 MB for category pages caching.
3. Golem's web page works well, with small exceptions caused by a
number of SQL procedures not having been moved to the new server; an
example can be seen here:
http://toolserver.org/~mashiah/cgi-bin/go.sh?language=ru&interface=en&listby=suggest,title&title=%D0%91%D0%BE%D1%83%D0%BC%D0%B5%D0%BD,%20%D0%93%D0%B0%D1%80%D1%80%D0%B8%20%D0%94%D0%B6%D0%BE%D0%B7%D0%B5%D1%84
4. I am really in a difficult situation with the latest configuration
change because, from now on, I think the connectivity project running
in the Russian wiki no longer has the data it had for the last two
years. The Ukrainian community has actively used Golem's data for
approximately a year. The Polish community, which just became familiar
with Golem's functions during its national wiki conference, will not
be able to use the functions just introduced to them. The work other
people are doing on translating Golem's interface into German is
probably no longer required.

Sorry if anything in this message looks too emotional; a great deal
of effort went into implementing this, and I am really not sure the
"way too much" statement is well enough substantiated to justify the
restrictions just introduced.

mashiah



More information about the Toolserver-l mailing list