Fri, 04 Mar 2011 14:53:50 +0100, Krinkle krinklemail@gmail.com wrote:
On March 4 2011, Seb35 wrote:
Fri, 04 Mar 2011 03:45:53 +0100, MZMcBride z@mzmcbride.com wrote:
Seb35 wrote:
I'm from the French chapter and we sometimes need a lot of CPU power and/or a lot of memory for some projects. So far it has happened twice:
It's difficult to know what "a lot" of CPU power or memory is from your post. Toolserver accounts have account limits (https://wiki.toolserver.org/view/Account_limits), so if you're staying within those limits, there's generally no problem. If you want to exceed those limits, you should talk to the Toolserver roots first (https://wiki.toolserver.org/view/System_administrators). There are places like /mnt/user-store that can be used for large media storage as well.
As always, the Toolserver resources that you use need to relate to Wikimedia in some way, but it sounds like both of your projects do. :-)
MZMcBride
OK, thank you, I hadn't found that page.
For the BnF project we in fact needed about one day of computation (most of the time went to disk access), though we had expected more (we also optimized by using SAX instead of DOM to read the big XML files, since DOM used too much memory). For the video encoding to OGV (I didn't do that part myself), it took 4-5 hours for a single video, and part of that time was spent swapping (and there are 100 videos, corresponding to the conference talks).
Thank you for the response.
Seb35
Hi Seb35,
"One day" or "4-5 hours" still don't mean a lot in terms of technical requirements. One day of computing with what equipment ? With 24 hours of runtime a small difference can make a big difference. What kind of server server/setup did this run on ?
How much is "too much memory" ?
We needed to transform and crop TIFF images, read the XML associated with each book containing the OCRed text of the digitized book, and create a DjVu from the images plus a text layer.
For that we rented a server; I cannot remember exactly which hardware we chose, but it was probably a 4-core (or 8-core) machine with 4 GB (or 8 GB) of RAM and 200-300 GB of disk (plus server-grade bandwidth, useful for downloading the files from the BnF's FTP: about 500 files per book (1 XML per page + a multipage TIFF + some others) × 1416 books = 2-3 days of download on the server, because of the many small files).
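To give an idea of that download step, here is a minimal sketch using Python's ftplib; the host, login and directory names are placeholders, not the actual BnF ones, and the real transfer was of course scripted differently.

    # Minimal sketch: fetch one book's directory of many small files over FTP.
    # Host, credentials and paths are placeholders, not the real BnF server,
    # and we assume a flat directory containing only files.
    import ftplib
    import os

    def mirror_book(host, user, password, remote_dir, local_dir):
        os.makedirs(local_dir, exist_ok=True)
        ftp = ftplib.FTP(host)
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            with open(os.path.join(local_dir, name), 'wb') as out:
                # One RETR round-trip per file: with ~500 small files per book
                # x 1416 books, latency dominates, hence the 2-3 days.
                ftp.retrbinary('RETR ' + name, out.write)
        ftp.quit()

    mirror_book('ftp.example.org', 'user', 'secret', '/books/0001', 'book_0001')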
From what I remember, "too much memory" means that my laptop (2-core 2.8 GHz, 3 GB of RAM), on which I developed the (Python) program, had difficulty loading the whole XML file with DOM. Then I tried SAX and the work was done in a few seconds without much memory (I had never used SAX before, but I ♥ SAX now :-)
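For illustration, a minimal sketch of the SAX approach; this is not the actual BnF_import code, and the "String"/"CONTENT" names are only an ALTO-like example, not necessarily the real BnF schema.

    # Stream the OCR XML and collect the words without building a DOM tree,
    # so memory use stays flat even for very large files.
    import xml.sax

    class OcrTextHandler(xml.sax.ContentHandler):
        def __init__(self):
            xml.sax.ContentHandler.__init__(self)
            self.words = []

        def startElement(self, name, attrs):
            # Each element arrives as one small event; nothing is kept in
            # memory except the words we choose to accumulate.
            if name == 'String':
                self.words.append(attrs.get('CONTENT', ''))

    handler = OcrTextHandler()
    xml.sax.parse('page_0001.xml', handler)   # hypothetical file name
    print(' '.join(handler.words))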
We wrote a technical report about that, but haven't published it yet (perhaps one day, I hope). You can see http://commons.wikimedia.org/wiki/Commons:Bibliothèque_nationale_de_France for an "outreach" document and https://fisheye.toolserver.org/browse/Seb35/BnF_import for the Python program.
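For readers who don't want to dig through that code, here is a rough sketch of what one page of such a pipeline can look like; it assumes the djvulibre command-line tools (c44, djvm, djvused) and ImageMagick's convert, which are not named above, and the crop geometry and file names are made up.

    # Rough sketch, not the actual BnF_import program: crop a TIFF page,
    # encode it as DjVu, bundle the pages and attach the OCR text layer.
    import subprocess

    def build_page(tiff_page, cropped_pnm, djvu_page):
        # Crop/convert the scanned TIFF page to a PNM that c44 accepts
        # (the crop geometry here is only an example value).
        subprocess.check_call(['convert', tiff_page,
                               '-crop', '2400x3400+100+100', cropped_pnm])
        # c44 is djvulibre's IW44 "photo" encoder for continuous-tone pages.
        subprocess.check_call(['c44', cropped_pnm, djvu_page])

    build_page('page_0001.tif', 'page_0001.ppm', 'page_0001.djvu')
    # Bundle the per-page DjVu files into one document...
    subprocess.check_call(['djvm', '-c', 'book.djvu',
                           'page_0001.djvu', 'page_0002.djvu'])
    # ...and set the hidden text layer for page 1 from an s-expression file
    # produced from the OCR XML (djvused's set-txt command).
    subprocess.check_call(['djvused', 'book.djvu', '-e',
                           'select 1; set-txt page_0001.txt', '-s'])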
Seb35