Offline-l April 2009

offline-l@lists.wikimedia.org

7 participants
15 discussions

[openZIM dev-l] windows status of zimlib
by Tommi Mäkitalo 15 Apr '09

Hi,

it looks like I have to clarify some things about the Windows compatibility of zimlib.

Zimlib is developed on Linux, but not for Linux. It is as platform-independent as possible. To compile zimlib you first need to compile cxxtools. Both use autoconf and automake as their build system, which is platform-independent. The only prerequisites are a (Bourne-compatible) shell and some simple tools like awk, sed and make. Cxxtools requires a C++ compiler. Zimlib depends on libz and libbz2. All of these are easily installed with the package manager of your operating system.

As you know, Windows does not fulfil these simple requirements. It has no shell, no sed and no awk. It does not even have a C++ compiler or a package manager. So it is not as easy as running "emerge libz" or "apt-get install zlib-dev" or whatever. This is the main problem in porting zimlib to Windows: you have to set up the build system on your own and manually install zlib, libbz2 and a C++ compiler. All of this is freely available.

OK, there are at least two things which won't work out of the box on Windows: libiconv, which is needed by cxxtools and used in zimreader, and opendir/readdir/closedir. The libiconv stuff is not needed for zimlib, so it can be excluded from the cxxtools build if you write your own build system. Opendir/readdir/closedir is abstracted away by cxxtools; we just need a Windows implementation. Fortunately, this opendir/readdir/closedir code was adopted from pt-framework, which already has a Windows implementation. It is just a matter of adopting it into cxxtools. Mutexes were used in libzeno but are no longer used in zimlib.

There may be some other problems which need to be resolved, but I don't think there are any real showstoppers. The work just needs to be done. Nothing really difficult.

The patches I got from Guillaume mainly copied the pt-framework directory code into cxxtools. The other change removed the mutex stuff from libzeno. This was really not much work. The main problem is setting up the build environment, and this was not done. A Makefile for Windows would be helpful, but I haven't received one from Guillaume. A document like "howto setup my windows environment to compile cxxtools and libzeno" would help as well.

Tommi
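A minimal sketch of the kind of directory abstraction mentioned above: one function hides opendir/readdir/closedir on POSIX systems and the Win32 FindFirstFileA/FindNextFileA/FindClose calls on Windows. The name listDirectory is hypothetical; this is not the cxxtools or pt-framework interface, and the Windows branch is only an assumption about how such a port could look.

```cpp
// Hypothetical portable directory listing; NOT the cxxtools API.
#include <stdexcept>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
#include <dirent.h>
#endif

// Returns the names of all entries in `path` (usually including "." and "..").
// Throws std::runtime_error if the directory cannot be opened.
std::vector<std::string> listDirectory(const std::string& path)
{
    std::vector<std::string> names;
#ifdef _WIN32
    // Windows: enumerate with a wildcard pattern instead of opendir/readdir.
    WIN32_FIND_DATAA data;
    HANDLE h = FindFirstFileA((path + "\\*").c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        throw std::runtime_error("cannot open directory " + path);
    do {
        names.push_back(data.cFileName);
    } while (FindNextFileA(h, &data));
    FindClose(h);
#else
    // POSIX: the classic opendir/readdir/closedir loop.
    DIR* dir = opendir(path.c_str());
    if (dir == nullptr)
        throw std::runtime_error("cannot open directory " + path);
    dirent* entry;
    while ((entry = readdir(dir)) != nullptr)
        names.push_back(entry->d_name);
    closedir(dir);
#endif
    return names;
}
```

Keeping the platform split inside one function means callers never see dirent or WIN32_FIND_DATAA, which is the point of letting cxxtools abstract the directory calls.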

[openZIM dev-l] new zimlib code checked in
by Tommi Mäkitalo 13 Apr '09

Hi,

I would like to inform you that I have reached a major milestone with the new zim format: I successfully created a zim file and read it with zimDump. The changes are:

* rewritten large parts
* updated the zim file format
* redesigned zimwriter

Let me say some words about these changes and why I made them.

* Rewritten large parts: The rewrite helped me to improve code quality. With my knowledge of today and my experience with the zeno file format, it was possible to clean up the library code.

* Updated the zim file format: Since we decided to drop compatibility, I rethought some parts of the zeno file format. The zeno file format did not support clustering of articles to get better compression. I made a minor change and added an offset and size to the directory entry of the article. The offset to the data blob was left in the article, but now multiple articles pointed to the same blob. In the new format I added another data structure: the chunk, which is a collection of blobs. We have a pointer list, similar to the directory pointer list, which points to the chunks. An article addresses its blob by chunk number and blob number. Redirect entries do not need these pointers at all, so I just skipped them. This saves some bytes for each redirect.

* Redesigned zimwriter: The source of articles is now abstracted from the generator. Also, the database is no longer used for temporary data. The writer builds the directory entries in memory and uses a temporary file to collect the compressed data. This will improve performance significantly. The caveat is that more RAM is used, but I estimate that we have enough even for very large zim files. The abstraction of the data source makes it easier to implement other sources, e.g. reading data from the file system or from Wikipedia dumps without using the database at all.

I hope this will motivate you to go on dumping data, so that we can soon start testing.

There is still quite some work to do for me. I need to get the zimreader working again. And the next big task is the full text index. My plan is to read the data from zim files directly and add the full text index to the zim files in a separate step, or optionally generate a separate zim file for the index, as was done with the German Wikipedia DVD.

Tommi
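To make the chunk/blob addressing concrete, here is a small in-memory model, assuming each chunk stores its blobs back to back with an offset table. The type and field names (Chunk, DirEntry, ZimModel, blobOffsets) are hypothetical and do not describe the actual zim file layout or zimlib classes; compression is left out entirely.

```cpp
// Hypothetical model of chunk/blob addressing; NOT the real zim format or zimlib API.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// A chunk holds several blobs back to back. blobOffsets[i] is where blob i
// starts; one extra trailing offset marks the end of the last blob.
struct Chunk {
    std::string data;                       // uncompressed chunk contents
    std::vector<std::uint32_t> blobOffsets; // number of blobs + 1 entries

    std::string blob(std::uint32_t blobNumber) const {
        if (blobOffsets.size() < 2 || blobNumber >= blobOffsets.size() - 1)
            throw std::out_of_range("no such blob");
        std::uint32_t begin = blobOffsets[blobNumber];
        std::uint32_t end = blobOffsets[blobNumber + 1];
        return data.substr(begin, end - begin);
    }
};

// An article entry addresses its blob by chunk number and blob number.
// A redirect only names its target entry; in a file it would omit the
// chunk/blob fields entirely, here they are simply left unused.
struct DirEntry {
    std::string title;
    bool isRedirect = false;
    std::uint32_t chunkNumber = 0;   // used when !isRedirect
    std::uint32_t blobNumber = 0;    // used when !isRedirect
    std::uint32_t redirectIndex = 0; // used when isRedirect
};

struct ZimModel {
    std::vector<Chunk> chunks;     // plays the role of the chunk pointer list
    std::vector<DirEntry> entries; // plays the role of the directory

    // Follows redirects (with a hop limit), then resolves (chunk, blob).
    std::string articleData(std::uint32_t index) const {
        for (int hops = 0; hops < 16; ++hops) {
            const DirEntry& e = entries.at(index);
            if (!e.isRedirect)
                return chunks.at(e.chunkNumber).blob(e.blobNumber);
            index = e.redirectIndex;
        }
        throw std::runtime_error("too many redirects");
    }
};
```

Since several entries can name the same chunk, a writer could presumably compress each chunk as one unit, which is where the clustering gain described in the message would come from.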

[openZIM dev-l] layout of pages in zim files
by Tommi Mäkitalo 12 Apr '09

Hi,

I'm thinking about the layout of pages in zim files. I have some ideas about what to do and I would like to share them with you. In particular, I would like to hear your expectations about the content of zim files.

Let me explain the problem. In the German Wikipedia DVD the layout of the page was partly hardcoded into the reader. The HTML frame, CSS, images and JavaScript files were compiled into the application. They were accessible through a special namespace '-', e.g. "/-/monobookde.css" loads the CSS file from the application. It is easy to move these into the zim file.

The HTML frame is more difficult, since it contains dynamic parts like "<title>...</title>" or the actual article text. So to move this page into the zim file, we need some placeholders which are parsed at runtime. My plan is to introduce a special syntax for these placeholders: the tag "<%something%>" may be replaced. This "something" needs to be defined. The syntax may be used in the layout page, which is already in the zim file header, as well as in arbitrary pages. We might also add a special MIME type in addition to zimMimeTextHtml, for which zimlib parses these tags. This "something" may be:

<%title%>: the title of the page
<%url%>: the URL of the page (e.g. /A/Linux)
<%namespace%>: the namespace
<%/A/Linux%>: insert another article here
<%content%>: placeholder for the article content in the layout page
... (maybe more in the future)

In zimlib we have a class "zim::Article". It has a method "getData()", which returns the article data of the page. I would add a new method "getPage()", which uses the layout page to return the complete page. The layout page should only be used when the MIME type is zimMimeTextHtml. This way the creator of the zim file can specify how to show pages without repeating the HTML header and footer on each page. A reader may ignore this layout if wanted.

Tommi
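A minimal sketch of how the proposed "<%...%>" placeholders could be expanded, assuming the values for title, url, namespace and content are already known and that another article can be fetched by its URL through a callback. expandLayout and its parameters are hypothetical names; the getPage() method proposed above does not exist yet, and this is not zimlib code.

```cpp
// Hypothetical "<%...%>" placeholder expansion for a layout page; NOT zimlib's API.
#include <functional>
#include <map>
#include <string>

// vars: values for simple placeholders such as "title", "url", "namespace", "content".
// lookupArticle: returns another article's data for placeholders starting with '/',
// e.g. "<%/A/Linux%>". Unknown placeholders are copied to the output unchanged.
std::string expandLayout(const std::string& layout,
                         const std::map<std::string, std::string>& vars,
                         const std::function<std::string(const std::string&)>& lookupArticle)
{
    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        std::string::size_type open = layout.find("<%", pos);
        if (open == std::string::npos)
            break;
        std::string::size_type close = layout.find("%>", open + 2);
        if (close == std::string::npos)
            break; // unterminated tag: the rest is copied verbatim below
        out.append(layout, pos, open - pos); // text before the tag
        std::string name = layout.substr(open + 2, close - open - 2);
        auto it = vars.find(name);
        if (it != vars.end())
            out += it->second;
        else if (!name.empty() && name[0] == '/' && lookupArticle)
            out += lookupArticle(name); // e.g. "/A/Linux"
        else
            out.append(layout, open, close + 2 - open); // keep unknown tag as-is
        pos = close + 2;
    }
    out.append(layout, pos, std::string::npos);
    return out;
}
```

Substituted text is not scanned again, so an article that happens to contain "<%" is inserted verbatim; keeping unknown tags unchanged also leaves room for the "maybe more in the future" placeholders.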

[openZIM dev-l] Support by Wikimedia Foundation? Phone conference at April 16, 20:00 CEST
by Manuel Schneider 11 Apr '09

Dear openZIM developer team!

As I wrote last week, I was at the Wikimedia Conference in Berlin.

== Wikimedia Israel / Hebrew Wikipedia on OLPC ==

One result was that I met Asaf Bartov from Israel, who is working on the Hebrew Wikipedia for the One Laptop Per Child project (OLPC, XO). I haven't heard from him since I came back, but I am pretty sure we will hear from him after the Easter holidays. I keep him up to date by forwarding the most relevant mails from this mailing list.

== Wikimedia Italia / Italian Wikipedia on DVD ==

I also talked to Frieda Brioschi from Wikimedia Italia about their Wikipedia DVD. They work with a company in Italy which created proprietary software with a GUI for the DVD and an "unknown" storage format for the data (which reminds me of the Directmedia approach). She said the software was "awkward". They plan to have a new DVD this year, and the company promised to write new, better software. Frieda stated that WMIT is not yet sure whether they will sign a contract with this company again. After we talked about openZIM, Frieda said that she will take it into account. I suggested that WMIT could make the DVD on its own using our software and maybe Kiwix, or point the company to our software so that they can integrate zimlib, or ask Emmanuel, who might be willing to make a DVD for WMIT.

== Wikimedia Polska / Polish Wikipedia on DVD ==

Wikimedia Polska had a DVD using HTML dumps and a Java applet as search engine. It didn't sell, and there are no current plans for a new DVD.

== Wikimedia Foundation Conference Call ==

Erik Möller (Deputy Director of the Wikimedia Foundation) contacted me by mail after the conference, stating that he had heard about openZIM and would like to talk to me about it. Quote: "Since it seems like prior offline reader efforts are merging into it, I would like to understand the goals and deliverables of the project better, and also discuss whether we could support it in any fashion."

We have now fixed a date with James Owen, his assistant, for a conference call on Thursday evening, April 16th, at 20:00 our local time (CEST). I would like to collect some statements and ideas on how the Wikimedia Foundation could help us. To do that collaboratively, I created http://openzim.org/Wikimedia_Foundation_Relationship and will now start filling in what I have in mind. I invite you to add what you feel is important to mention from your point of view.

Thanks for your attention,
Manuel

--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch

[openZIM dev-l] status - ZimReader now working
by Tommi Mäkitalo 11 Apr '09

Hi,

the ZimReader is fixed and I can browse through the zim file I created from Josch's dewiki dump. There were some smaller bugs in the reader as well as in the library, but the main bug was in cxxtools. So if you want to test the reader, please update cxxtools.

There are still some tasks to do in the reader. There is a lot of hardcoded stuff from the old Wikipedia DVD, like the text "DVD-ROM-Ausgabe 2007" ("DVD-ROM edition 2007") in the title area and a reference to Directmedia. I also updated the status and next steps page in our wiki.

Tommi
