Hi,
I've done some benchmarking. I created two zim files from my collection of 640,000 articles, one compressed with bzip2 and one with lzma, and burnt both to a DVD. The zim benchmark program (found at zimlib/zimDump/zimBench) gives interesting results. It reads articles both linearly and with random access; the linear results are not that interesting, but the random-access results are.
Reading the bzip2-compressed file gives me about 12 articles per second; with lzma it is about 38. So decompressing lzma is much faster.
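Roughly, the random-access part of the measurement boils down to something like the following sketch. This is not zimBench itself; the class and method names (zim::File, getCountArticles, getArticle, getData) follow the zimlib headers as I remember them, so treat the details as an assumption:

  #include <zim/file.h>
  #include <zim/article.h>
  #include <zim/blob.h>
  #include <sys/time.h>
  #include <cstdlib>
  #include <ctime>
  #include <iostream>

  // Read n articles at random positions and report articles per second.
  int main(int argc, char* argv[])
  {
      if (argc < 2)
          return 1;

      zim::File file(argv[1]);                  // e.g. the zim file on the DVD
      const unsigned n = 1000;
      unsigned count = file.getCountArticles();

      std::srand(std::time(0));
      timeval start, end;
      gettimeofday(&start, 0);
      for (unsigned i = 0; i < n; ++i)
      {
          zim::Article article = file.getArticle(std::rand() % count);
          zim::Blob data = article.getData();   // forces decompression of the cluster
          (void)data.size();                    // make sure the read is not optimised away
      }
      gettimeofday(&end, 0);
      double seconds = (end.tv_sec - start.tv_sec)
                     + (end.tv_usec - start.tv_usec) / 1e6;
      std::cout << n / seconds << " articles per second" << std::endl;
      return 0;
  }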
Creating the files took 2:09 with bzip2 and 3:25 with lzma.
The file size is almost identical (both about 1.5 GB).
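To summarise the numbers above:

                    bzip2     lzma
  random access     ~12/s     ~38/s    (articles per second, reading from DVD)
  creation time      2:09      3:25
  file size          1.5 GB    1.5 GB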
Zimlib manages two caches: one for directory entries and one for uncompressed data. Varying their sizes makes no big difference; it looks like the OS cache already does a good job. This may of course look different on other hardware, since I had a fast CPU and a slow device.
Tommi
Hi,
I really like your work on this, the analysis and the results.
Is it possible to run the same benchmark on the Ben NanoNote? 1.5 GB should fit on the memory card, and as far as I understood you, the ZIM software has already been ported to the NN?
I wonder how the caches affect the results on the NN and what the optimal settings would be. Since you say the caches don't have a real impact on "big" hardware, we could simply use the optimal settings for the NN as the defaults in zimlib.
If you have some data from your analysis (such as the actual result tables), please just put it into the wiki. When I have time I will create a nice Benchmark article where we explain why we have chosen the algorithms, compression and cache sizes as they are.
Have a good time,
Manuel
Hello,
I am having difficulties getting into NanoNote development; I have not yet found a good way to really develop software for it. But as soon as I get into it, I do want to do that benchmarking. I already have a 4 GB card in my NanoNote, so getting the files onto it is no problem. ZimReader and zimDump already work on my device.
My plan is to automate the test a little so that I get good, publishable results; currently it involves too much manual work. Good benchmarking is not as easy as it sounds. I get unreliable results, e.g. because of OS caching, and have to unmount the DVD between tests to get better numbers. Even then, longer-running tests tend to be faster because the OS caches more and more, whereas the real-world scenario is fetching just a few randomly accessed articles.
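For the automated version, one thing I will probably do is drop the OS page cache before each run instead of remounting the DVD. Roughly like this (Linux-specific, needs root; just a sketch):

  #include <fstream>
  #include <unistd.h>

  // Start a benchmark run with a cold cache: flush dirty pages, then ask
  // the kernel to drop the page cache, dentries and inodes.
  // Writing to /proc/sys/vm/drop_caches requires root and is Linux-only.
  void drop_page_cache()
  {
      sync();
      std::ofstream f("/proc/sys/vm/drop_caches");
      f << "3\n";   // 1 = page cache, 2 = dentries/inodes, 3 = both
  }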
By the way, what do you think: should we drop zlib and bzip2 compression completely? We would then no longer depend on the zlib and bzip2 libraries, and we have already dropped compatibility anyway.
Lzma is the fastest and compresses as well as bzip2. The disadvantage is that we then depend on a very new, not yet released lzma library.
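To make the dependency concrete: decompressing a buffer with liblzma looks roughly like this. This is a sketch only; zimlib's own cluster code may use a different container format or filter options.

  #include <lzma.h>        // liblzma (xz-utils)
  #include <vector>
  #include <cstdint>
  #include <stdexcept>

  // Decompress an xz/LZMA-compressed buffer with liblzma.
  std::vector<uint8_t> lzma_decompress(const std::vector<uint8_t>& in)
  {
      lzma_stream strm = LZMA_STREAM_INIT;
      if (lzma_stream_decoder(&strm, UINT64_MAX, 0) != LZMA_OK)
          throw std::runtime_error("cannot initialise lzma decoder");

      std::vector<uint8_t> out;
      uint8_t buf[4096];
      strm.next_in = in.data();
      strm.avail_in = in.size();
      lzma_ret ret;
      do
      {
          strm.next_out = buf;
          strm.avail_out = sizeof(buf);
          ret = lzma_code(&strm, LZMA_FINISH);
          if (ret != LZMA_OK && ret != LZMA_STREAM_END)
          {
              lzma_end(&strm);
              throw std::runtime_error("lzma decompression failed");
          }
          out.insert(out.end(), buf, buf + (sizeof(buf) - strm.avail_out));
      } while (ret != LZMA_STREAM_END);

      lzma_end(&strm);
      return out;
  }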
Tommi