Fri, 04 Mar 2011 14:53:50 +0100, Krinkle <krinklemail(a)gmail.com> wrote:
On March 4 2011, Seb35 wrote:
Fri, 04 Mar 2011 03:45:53 +0100, MZMcBride <z(a)mzmcbride.com> wrote:
Seb35 wrote:
I'm from the French chapter, and we sometimes need a lot of CPU power
and/or a lot of memory for some projects. So far this has happened twice:
It's difficult to know what "a lot" of CPU power or memory is from your
post. Toolserver accounts have account limits
(<https://wiki.toolserver.org/view/Account_limits>), so if you're staying
within those limits, there's generally no problem. If you want to exceed
those limits, you should talk to the Toolserver roots first
(<https://wiki.toolserver.org/view/System_administrators>). There are
places like /mnt/user-store that can be used for large media storage as
well.

As always, the Toolserver resources that you use need to relate to
Wikimedia in some way, but it sounds like both of your projects do. :-)
MZMcBride
Ok, thank you, I hadn't found that page.

For the BnF project we in fact needed about one day of computation (most
of the time was spent on disk access), though we had expected it to be
more. We also optimized by using SAX instead of DOM to read the big XML
files, since DOM used too much memory.

For the video encoding to OGV (I'm not the one who did that), it took 4-5
hours for a single video, with some of that time spent swapping (and there
are 100 videos corresponding to the conferences).
Thank you for the response.
Seb35
Hi Seb35,
"One day" or "4-5 hours" still don't mean a lot in terms of
technical
requirements.
One day of computing with what equipment ? With 24 hours of runtime a
small
difference can make a big difference. What kind of server server/setup
did this run
on ?
How much is "too much memory" ?
We needed to transform and crop TIFF images, read the XML associated with
each book (containing the OCRized text of the digitized book), and create
a DjVu from the images with the text layer.

For that we rented a server. I cannot remember exactly which hardware we
chose, but it was probably a 4-core (or 8-core) with 4GB (or 8GB) of RAM
and 200-300GB of disk. The server bandwidth was useful for downloading the
files from the BnF's FTP: about 500 files per book (1 XML per page + a
multipage TIFF + some others) x 1416 books = 2-3 days of download on the
server, because of the many small files.
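The per-book pipeline described above (crop the scans, then build a
DjVu) can be sketched as a dry run that only assembles the shell commands
without executing them. This is a hypothetical illustration, not the
actual BnF program: the file names, the crop geometry, and the choice of
ImageMagick's `convert` plus djvulibre's `c44`/`djvm` are assumptions.

```python
# Dry-run sketch of a TIFF -> DjVu pipeline (hypothetical; the real
# BnF workflow may use different tools and options).

def build_page_commands(page_tiff, crop="2000x3000+100+100"):
    """Return the commands to crop one scanned page and encode it as DjVu."""
    base = page_tiff.rsplit(".", 1)[0]
    return [
        # ImageMagick crop: WIDTHxHEIGHT+XOFFSET+YOFFSET (illustrative values)
        ["convert", page_tiff, "-crop", crop, base + ".pnm"],
        # djvulibre's c44 encodes a single page as DjVu
        ["c44", base + ".pnm", base + ".djvu"],
    ]

def build_book_command(page_djvus, book_djvu="book.djvu"):
    """Bundle the per-page DjVu files into one multi-page document."""
    return ["djvm", "-c", book_djvu] + page_djvus

# Example: commands for one page, then the bundling step.
for cmd in build_page_commands("page_001.tiff"):
    print(" ".join(cmd))
print(" ".join(build_book_command(["page_001.djvu", "page_002.djvu"])))
```

In a real run each command list would be passed to subprocess.run(); the
OCR text layer would then be attached per page (e.g. with djvused), which
is omitted here.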
From what I remember, "too much memory" means that my laptop (2-core
2.8GHz, 3GB of RAM), on which I developed the (Python) program, had
difficulties loading the whole XML file with DOM. Then I tried SAX and the
work was done in a few seconds without much memory (I hadn't used SAX
before, but I ♥ SAX now :-)
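The DOM/SAX difference above comes down to streaming: DOM builds the whole
tree in memory, while a SAX handler sees one event at a time, so memory
stays flat regardless of file size. A minimal sketch with Python's
standard xml.sax module, assuming a simplified OCR schema with `<line>`
elements (the real BnF XML uses different element names):

```python
import io
import xml.sax

class OcrTextHandler(xml.sax.ContentHandler):
    """Streams through the XML and collects the text of <line> elements
    (hypothetical tag name for illustration)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buf = None  # None when we are not inside a <line>

    def startElement(self, name, attrs):
        if name == "line":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "line":
            self.lines.append("".join(self._buf))
            self._buf = None

def extract_lines(stream):
    """Parse an XML byte stream incrementally and return the OCR lines."""
    handler = OcrTextHandler()
    xml.sax.parse(stream, handler)
    return handler.lines

sample = io.BytesIO(b"<page><line>Hello</line><line>world</line></page>")
print(extract_lines(sample))  # ['Hello', 'world']
```

With DOM the whole document would be materialized before any line could be
read; here the handler only ever holds the current line's text, which is
why the same job that exhausted 3GB of RAM can finish in seconds.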
We wrote a technical report about that but haven't published it yet
(perhaps one day, I hope). You can see
<http://commons.wikimedia.org/wiki/Commons:Bibliothèque_nationale_de_France>
for an "outreach" document and
<https://fisheye.toolserver.org/browse/Seb35/BnF_import> for the Python
program.
Seb35