Francesco Cosoleto wrote:
I have always asked myself why we adopted this solution, because I have doubts about the amount of RAM required by the mediawiki-messages that the bot actually uses. I think a list of items not to discard would have been simpler, although I really do appreciate this more sophisticated solution.
It requires about 1-2 KB per site in the wikipedia family, and this family has a total of 255 sites. That looks to me like acceptable memory usage (I recently reduced the memory required by the wikipedia module by about 60 KB with r6751, if I remember rightly...). And with diskcache enabled, 50 MB or more of disk space are wasted (software should use temporary files only if really needed).
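As a rough check of those figures, the total for the whole family can be worked out in a few lines. This is only a back-of-the-envelope sketch: the numbers are the ones quoted above, the unit is assumed to be kilobytes, and the function name is made up for illustration.

```python
# Figures quoted in the message; the unit is assumed to be kilobytes.
SITES_IN_FAMILY = 255
PER_SITE_KB = (1, 2)  # rough lower and upper bound per site


def family_total_kb(sites, per_site_kb):
    """Return the (low, high) memory estimate in KB for the whole family."""
    low, high = per_site_kb
    return sites * low, sites * high


low_kb, high_kb = family_total_kb(SITES_IN_FAMILY, PER_SITE_KB)
print("about %d-%d KB in memory for all sites" % (low_kb, high_kb))
```

So keeping every message in memory would cost roughly 255-510 KB, well below the 50 MB or more taken by the disk cache.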
Simple test script:
grep --exclude-dir=.svn -rohP "\.mediawiki_message\s*\(\s*['\"][^)]+\)" ./ | sort | uniq | sed -e "s/^/ sum += len(site/" -e "s/$/)/" -e 1i"import wikipedia\nsum = 0\nfor lang in wikipedia.Site('en', 'wikipedia').languages():\n site = wikipedia.getSite(lang, 'wikipedia')" -e "\$a\ print sum" >mwmsg_length.py
(I had to disable the 'sp-contributions-older' line to run it, as it raises an exception on wikipedia:gv.)
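For anyone without GNU grep and sed at hand, the same two steps (collect the distinct mediawiki_message calls, then wrap them in a measuring loop) could be sketched in plain Python. This is a hedged illustration, not the script actually used: the helper names find_message_calls and build_script are hypothetical, the generator is written for Python 3, and the generated text keeps the Python 2 style "print sum" from the one-liner. Placing that print after the loop rather than inside it is an assumption.

```python
import os
import re

# Same idea as the grep pattern: ".mediawiki_message(" followed by a quoted key.
CALL_RE = re.compile(r"\.mediawiki_message\s*\(\s*['\"][^)\n]+\)")


def find_message_calls(root):
    """Return the sorted, de-duplicated call snippets found under root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Counterpart of --exclude-dir=.svn: do not descend into .svn
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    # Match line by line, as grep does
                    for line in handle:
                        found.update(CALL_RE.findall(line))
            except OSError:
                continue  # unreadable file: skip it
    return sorted(found)


def build_script(calls):
    """Return the text of a test script that sums the message lengths."""
    lines = [
        "import wikipedia",
        "sum = 0",
        "for lang in wikipedia.Site('en', 'wikipedia').languages():",
        "    site = wikipedia.getSite(lang, 'wikipedia')",
    ]
    lines += ["    sum += len(site%s)" % call for call in calls]
    lines.append("print sum")  # assumed to sit after the loop
    return "\n".join(lines) + "\n"
```

Writing build_script(find_message_calls('.')) to mwmsg_length.py from the root of the checkout should give a file comparable to the one produced by the shell command; a line such as the 'sp-contributions-older' one can then be removed by hand before running it.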