Hi,
I created an openzim file from the German Wikipedia dump I got from Josch last year. The file contains all articles from the German Wikipedia without images, and its size is 1.3 GB (more precisely, 1302052315 bytes). Generation took only about 1:10 on our server with the new zimwriter. It could be improved further by parallelizing the compression phase, since this is CPU-bound and takes the most time, but I feel that it is not necessary. There are more important tasks to do.
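For illustration only, here is a minimal Python sketch of what parallelizing a CPU-bound compression phase could look like. It is not the zimwriter's code: the function name `compress_chunks`, the chunking into independent byte blocks, the worker count, and the use of zlib as a stand-in compressor are all assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(chunks, workers=4):
    """Compress independent byte chunks in parallel, preserving input order.

    Hypothetical helper: zlib stands in for whatever compressor the writer
    actually uses. zlib.compress releases the GIL while it runs, so threads
    can use several CPU cores here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() returns results in the order of the inputs.
        return list(pool.map(zlib.compress, chunks))


if __name__ == "__main__":
    # Made-up sample data standing in for groups of articles.
    articles = [(f"<p>Article {i}</p>" * 1000).encode("utf-8") for i in range(8)]
    compressed = compress_chunks(articles)
    print(sum(map(len, articles)), "->", sum(map(len, compressed)), "bytes")
```

This only works if each chunk can be compressed on its own; whether that holds for the real writer depends on how it groups articles before compression.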
You can download the file from http://www.openzim.org/download/dewiki.zim.
The zimreader (the tntnet-based web application) is almost working with that file. There are some bugs to fix, but this will be done soon.
Emmanuel: the file is updated. I fixed some bugs. The zimDump crashed when reading redirects and the writer failed to generate redirects correctly.
Josch: do you have an updated dump?
Tommi
This is great news.
On Tuesday, 7 April 2009 22:54:41, Tommi Mäkitalo wrote:
I created an openzim file from the German Wikipedia dump I got from Josch last year. The file contains all articles from the German Wikipedia without images, and its size is 1.3 GB (more precisely, 1302052315 bytes). Generation took only about 1:10 on our server with the new zimwriter. It could be improved further by parallelizing the compression phase, since this is CPU-bound and takes the most time, but I feel that it is not necessary. There are more important tasks to do.
How much is 1:10 now in absolute numbers? Can you please have a look at http://www.openzim.org/Status_and_next_steps and strike the things that have been done, or add other things that aren't yet on the list?
You can download the file from http://www.openzim.org/download/dewiki.zim.
Please allow me to rename that file soon to wikipedia-de.zim. I am trying to establish a naming scheme for files that takes into account that we will later host all kinds of different ZIM files.
The zimreader (the tntnet-based web application) is almost working with that file. There are some bugs to fix, but this will be done soon.
Is it just in the zimreader, or in libzim? What about the status of Kiwix?
Josch: do you have an updated dump?
I just had a look at the member list of this mailing list. Josch is currently not subscribed to this list. Tommi, can you handle that, please?
Once the current holidays in Israel are over I assume that we will have Asaf on this list as well. I told the news to one of his colleagues at Wikimedia Israel.
Greets,
Manuel
Hi,
On Wednesday, 8 April 2009 12:05:14, Manuel Schneider wrote:
This is great news.
On Tuesday, 7 April 2009 22:54:41, Tommi Mäkitalo wrote:
I created an openzim file from the German Wikipedia dump I got from Josch last year. The file contains all articles from the German Wikipedia without images, and its size is 1.3 GB (more precisely, 1302052315 bytes). Generation took only about 1:10 on our server with the new zimwriter. It could be improved further by parallelizing the compression phase, since this is CPU-bound and takes the most time, but I feel that it is not necessary. There are more important tasks to do.
How much is 1:10 now in absolute numbers?
It is one hour and 10 minutes. I feel that this is pretty fast.
Can you please have a look at http://www.openzim.org/Status_and_next_steps and strike the things that have been done, or add other things that aren't yet on the list?
I will update the status page. I have to reorganize some of the tasks.
You can download the file from http://www.openzim.org/download/dewiki.zim.
Please allow me to rename that file soon to wikipedia-de.zim. I am trying to establish a naming scheme for files that takes into account that we will later host all kinds of different ZIM files.
You can do that. Can you also give me/us some advice on how to offer ZIM files for download? I just created a download directory in MediaWiki. Is there a better way?
The zimreader (the tntnet-based web application) is almost working with that file. There are some bugs to fix, but this will be done soon.
Is it just in the zimreader, or in libzim? What about the status of Kiwix?
zimlib is mostly finished; what remains is mainly testing and bug fixing. In zimlib we have zimDump to read and test ZIM files from the command line. I'm currently working on the web application zimreader.
I think Emmanuel is porting Kiwix to the new zimlib. Emmanuel can surely tell you more.
Josch: do you have an updated dump?
I just had a look at the member list of this mailing list. Josch is currently not subscribed to this list. Tommi, can you handle that, please?
Once the current holidays in Israel are over I assume that we will have Asaf on this list as well. I told the news to one of his colleagues at Wikimedia Israel.
Greets,
Manuel
I will keep you up to date here and also update the wiki from time to time. It was helpful even for me to update the page about the zim file format. This should be up to date now.
Tommi
On Wednesday, 8 April 2009 14:26:25, Tommi Mäkitalo wrote:
How much is 1:10 now in absolute numbers?
It is one hour and 10 minutes. I feel that this is pretty fast.
Ah, I see. I interpreted 1:10 as one tenth of the previous time. As far as I know, it was around 25 hours before, right?
Can you please have a look at http://www.openzim.org/Status_and_next_steps and strike the things that have been done, or add other things that aren't yet on the list?
I will update the status page. I have to reorganize some of the tasks.
Thx.
Please allow me to rename that file soon to wikipedia-de.zim. I am trying to establish a naming scheme for files that takes into account that we will later host all kinds of different ZIM files.
You can do that. Can you also give me/us some advice on how to offer ZIM files for download? I just created a download directory in MediaWiki. Is there a better way?
I think www.openzim.org/download/ is a good URL, isn't it?
I thought about categorizing the ZIM files, like "knowledge", "literature", "news" etc. We will definitely do that in the wiki, but maybe we should have the same directory structure as well?
zimlib is mostly finished; what remains is mainly testing and bug fixing. In zimlib we have zimDump to read and test ZIM files from the command line. I'm currently working on the web application zimreader.
Ok, I was just wondering if the bugs in zimreader were more likely bugs in zimlib, as the zimreader doesn't really handle the ZIM format. But well, I understand that the API also changed.
I think Emmanuel is porting Kiwix to the new zimlib. Emmanuel can surely tell you more.
As I wrote in another mail: I saw in the svn that all the zeno-source files have been replaced by zim.
I think we should have all the kinds of reader applications we have on the DVD. So it would be nice to have Kiwix ready for the DVD by May.
I will keep you up to date here and also update the wiki from time to time. It was helpful even for me to update the page about the zim file format. This should be up to date now.
That's great. We need to keep our site up-to-date - documentation as well as subversion. Visitors want to see progress and what we're working on. And I think we could use some more developers and users :-)
Manuel
Manuel Schneider wrote:
The zimreader (the tntnet-based web application) is almost working with that file. There are some bugs to fix, but this will be done soon.
Is it just in the zimreader, or in libzim? What about the status of Kiwix?
I have downloaded the file and tested it:
* Contents are readable.
* I have many problems with encoding, but the HTML contents are not valid (there is only the body). I will test again as soon as a new dump with the templating system is available and see if the problems occur again.
* The main problem with Kiwix and such a big ZIM file is the built-in search engine indexer: it is too slow, but this part is not optimized yet.
Emmanuel