Hello, everyone.
As Manuel mentioned some time ago, I'm from Wikimedia Israel, and we're
interested in trying the openZIM tools for a static version of the Hebrew
Wikipedia we are working on, for inclusion in the Israeli government's One
Laptop Per Child (OLPC) project.
So far, I've been able to compile and run all the tools, but I'm having some
trouble creating ZIM files: I've dumped a locally loaded MediaWiki
installation to HTML using these instructions <http://openzim.org/Wiki2html>,
but when I try to run the buildZimFileFromDirectory.pl script, I get
PostgreSQL errors about failing to connect as the "kiwix" user. I should
probably create a user, or a database, or both, but I've never used
PostgreSQL (I use MySQL), and despite trying several combinations of commands
to create a 'kiwix' user, with and without a password, I couldn't get the
DBI::connect call to succeed. I'm sure many of you are familiar with
PostgreSQL and can give me a quick pointer on how to proceed. I'm using an
up-to-date Debian unstable box, with a default packager-supplied installation
of PostgreSQL and the related Perl DBD module.
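For reference, something like the following usually gets past that error on a
default Debian PostgreSQL setup. This is only a sketch: whether the script
expects a database named "kiwix", and whether it connects with a password, are
assumptions; the actual values are in the script's DBI::connect call.

```shell
# Sketch: create a "kiwix" role and database on a stock Debian PostgreSQL.
# The database name and the password are assumptions; check the script's
# DBI::connect call for the values it really uses.
sudo -u postgres createuser --no-superuser --no-createrole --createdb kiwix
sudo -u postgres createdb --owner=kiwix kiwix
# If the script supplies a password, set one for the role:
sudo -u postgres psql -c "ALTER ROLE kiwix WITH PASSWORD 'secret';"
```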
Thanks in advance,
Asaf Bartov
--
Asaf Bartov <asaf(a)forum2.org>
Hi Tomasz,
thanks for your answer.
I was thinking of a wiki on a server which is not part of the cluster but is configured the same way as the production Wikimedia wikis. Software updates are the main issue: the installation must always stay suitable for Wikimedia content (MediaWiki version, extensions etc.).
Then we need the possibility to maintain our own skin and our own version of wiki2html on that server (I think both should become part of MediaWiki anyway).
The initial idea was to use it as a blank setup into which we can import the latest dump of whatever wiki we want to convert to ZIM. But if a database connection to the Wikimedia cluster were possible, that would speed up the whole process a lot.
But after thinking all this through, I am at the point where I see the big solution: put the zimwriter into MediaWiki / the Wikimedia dumps and we're set...
What do you think?
At the moment the DVD for LinuxTag in June is the priority, and the zimwriter is still work in progress, but we need a solution for current Wikipedia dumps.
Greets,
Manuel
-- Original message --
Subject: Re: [openZIM dev-l] Wikimedia-updated MediaWiki instance for dumping
From: Tomasz Finc <tfinc(a)wikimedia.org>
Date: 20.04.2009 22:29
Manuel Schneider wrote:
> Dear Brion,
>
> is it possible that we can have a MediaWiki instance which is being kept
> up-to-date by the regular synchronisation on the WM cluster, so we can import
> all types of dumps in there to make ZIM files from it?
>
> We will use a modified version of wiki2html and a special skin for the dumps,
> so we need an extra instance but it should be always up-to-date with the
> Wikimedia projects. This would be extremely helpful for us.
>
> Any ideas?
>
> Greets,
>
>
> Manuel
>
Filling in since Brion is out in meetings today.
I'm a little confused about what you are asking here.
Are you requesting a machine that is configured the same as any one of
our web server nodes, serving Wikipedia traffic and connecting to our
databases, but is not part of our general cluster? Or are you asking for
a blank node to populate with different types of data that you can in
turn generate ZIM files from?
Let us know and we can get something in place.
--tomasz
_______________________________________________
dev-l mailing list
dev-l(a)openzim.org
https://intern.openzim.org/mailman/listinfo/dev-l
Dear all,
we managed to get a booth at LinuxTag in Berlin this year (June 24th - 27th).
This is good news! But it also means work: now we have to get ready for the Wikipedia DVD, and the ISO must be ready by May 31st.
Please keep me up to date with the current status.
Tommi, did you hear about your talk submission?
Cheers,
Manuel
-- Original message --
Subject: [LT2009] Project "openZIM" accepted
From: projects(a)linuxtag.org
Date: 29.04.2009 21:03
Dear Manuel Schneider,
we are excited to inform you that your application for the project
openZIM
qualified for a sponsored booth at this year's LinuxTag. Please note that
this is just the confirmation of our general acceptance of your
submission. The size and location of the booth will be decided later,
depending mainly on the overall number of accepted project submissions
and on the quality of the exhibition concept you provide us via the
virtual conference center and in dialogue with the project committee.
Even though there is plenty of time until LinuxTag opens its doors on
June 24-27, 2009, we encourage you to start your preparations right now.
First you should ensure that your project's profile at the virtual
conference center (vcc) is complete:
https://vcc.linuxtag.org/cc.pl?rm=editproject;personality=project;id=15015
From now on you can order exhibitor passes for your booth staff. In
comparison to other events we have no hard limit on the number of
exhibitor passes, but we kindly ask you to order only a modest
amount. We review all orders and approve them manually. The exhibitor
passes merely enable their holders to enter the Berlin fairgrounds a
little earlier than the public. Please do not order passes for
friends or project members who do not help at your booth. For them
we will provide you with enough eTickets, which are free tickets for all
four days. To order exhibitor passes, please use the vcc:
https://vcc.linuxtag.org/cc.pl?rm=exhibitorpass;personality=project;id=15015
We organize all information in the public LinuxTag wiki. Please find
all related information here:
http://wiki.linuxtag.org/w/fp:Main_Page
Before sending questions to the Project Committee, we kindly ask you
to have a look at the FAQ and the A-Z pages.
We have subscribed you to our mailing lists for all participating free
software projects:
* projects-announce(a)linuxtag.org - a low-traffic, read-only list to which
all newsletters and information regarding the free projects'
participation at LinuxTag will be sent.
* projects-discussion(a)linuxtag.org - the official list for sharing
ideas and discussing everything related to LinuxTag.
In case you have any questions please do not hesitate to contact the
projects committee by mail: projects(a)linuxtag.org.
Your friendly VCC email bot on behalf of the
Project Committee LinuxTag 2009
Dear Brion,
is it possible for us to have a MediaWiki instance which is kept
up to date by the regular synchronisation on the WM cluster, so that we can
import all types of dumps into it to make ZIM files from them?
We will use a modified version of wiki2html and a special skin for the dumps,
so we need an extra instance, but it should always be up to date with the
Wikimedia projects. This would be extremely helpful for us.
Any ideas?
Greets,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Hi,
I have some ideas concerning libzim which I would like to share with you.
There is a class zim::Files, which represents a set of zim files in one
directory. I would like to drop that class if it is not used in Kiwix.
(Emmanuel: can you verify that?)
Let me explain why the class is there, why I would like to remove it, and how
to replace its functionality.
On the original German Wikipedia DVD, the content of Wikipedia was in one
zeno file, and the word index, which is also a zeno file, was in another.
Images were of very bad quality due to the limited capacity of the DVD. There
were a compact and a premium edition. The premium edition had images in much
better quality on three additional DVDs, each with a zeno file containing part
of the images. To use these better-quality images, the user had to copy all
zeno files into one directory on his hard drive and configure the reader to
use that directory. When an image was requested, the reader then searched for
it in all zeno files and fetched the one with the best quality.
I feel that this solution is not optimal. There is also a technical reason
why it is not that good, and since I have a better idea (at least I think the
idea is better ;-) ), I would like to drop that feature. The technical reason
is simple: there is no common API to read directories. Almost all operating
systems have opendir/readdir/closedir. Unfortunately, ReactOS (and other
systems which use Win32) do not have these functions, so zimlib has to use a
different API on these systems. This is not really a big problem, but still a
problem. It is actually the only OS-specific code in zimlib (or rather in the
underlying cxxtools, which provides a wrapper for these functions).
The functionality is not needed on the planned Wikipedia DVD for LinuxTag,
since we won't have a premium edition but will have all data in one single
zim file.
In the future I plan to create a utility which merges the content of two or
more zim files into one. The structure of zim files makes this quite an easy
operation, much easier than creating new zim files. Especially the changes I
made compared to zeno make it very easy and fast. Combining two zim files is
almost as cheap as copying them. So instead of copying all zim files into one
directory, users can just combine multiple zim files to create one single big
file.
The utility will have an option to control how duplicate articles are
handled. It may prefer the articles of one of the files, so it will be
possible to make update files which just provide the changed articles. The
resulting combined file will not be exactly the same as a freshly created
one, since it will have empty blob entries for removed article data. But the
user won't see any difference.
Tommi
Great!
I am also curious to hear from Kiwix. Will it be ready so that we can put
both readers onto the DVD?
Greets,
Manuel
Am Samstag, 11. April 2009 schrieb Tommi Mäkitalo:
> Hi,
>
> the ZimReader is fixed and I can browse through the zim file I created from
> Josch's dewiki-Dump. There were some smaller bugs in the reader as well as
> in the library, but the main bug was in cxxtools. So if you want to test
> the reader please update cxxtools.
>
> There are still some tasks to do in the reader. There is a lot of hardcoded
> stuff from the old Wikipedia DVD, like the text "DVD-ROM-Ausgabe 2007" in
> the title area and a reference to Directmedia.
>
> I also updated the status and next steps page in our wiki.
>
> Tommi
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Hi,
I created an openZIM file from the German Wikipedia dump I got from Josch
last year. The file contains all articles from the German Wikipedia without
images, and its size is 1.3 GB (or more precisely 1302052315 bytes).
Generation took only about 1:10 on our server with the new zimwriter. It
could be further improved by parallelizing the compression phase, since this
is CPU-bound and takes most of the time, but I feel that it is not necessary.
There are more important tasks to do.
You can download the file from http://www.openzim.org/download/dewiki.zim.
The zimreader (the tntnet-based web application) is almost working with that
file. There are some bugs to fix, but this will be done soon.
Emmanuel: the file has been updated; I fixed some bugs. zimDump crashed when
reading redirects, and the writer failed to generate redirects correctly.
Josch: do you have an updated dump?
Tommi
Hi,
I have released three "small" demo ZIM files:
* wikipedia_en_wp1_0.5_2000+_03_2006_rc1.zim, a ZIM version of the WP1
0.5 selection done 2 years ago.
* wikipedia_es_fa+ga_2000+_04_2008_alpha1.zim, with all featured and
good articles from the Wikipedia in Spanish.
* wikipedia_it_fa_1000+_-4_2008_alpha1.zim, with all featured articles
from the Wikipedia in Italian.
You can download them at http://tmp.kiwix.org/zim/
A few remarks:
* These files are alpha versions and a few things are wrong with them:
not GFDL-compliant, etc.
* You can read them with the tntreader or the latest alpha version of Kiwix
(http://tmp.kiwix.org/bin/kiwix-0.8-alpha2+xulrunner.tar.bz2).
* We do not have a reader for Windows yet.
Regards
Emmanuel
Dear all,
you may have noticed a short outage tonight.
The server lost its physical network link several times. We have now
exchanged the patch cable. I hope the issue is resolved.
Then we had the conference call with Erik Möller, Tomasz Finc and Brion Vibber
from the Wikimedia Foundation tonight.
The WMF is interested in openZIM as a standard storage format for offline
content and offers support, e.g. funding DVDs or similar.
The WMF has some plans for standardizing the distribution of content in
several ways. The ZIM format may play a major role there, but this will have
to evolve over the next one or two years. Then we will see how the plans of
the WMF work out, in which state ZIM is at that time, and what features it
has by then.
I will add the outcome later to the preparation page under
http://openzim.org/Wikimedia_Foundation_Relationship
I am happy to welcome Brion Vibber, Chief Technical Officer of the Wikimedia
Foundation, on our list.
Some unrelated news:
For our Windows instance I added RDP support as well. Feel free to use that
instead of VNC (which doesn't work as seamlessly). It works the same way:
just install "rdesktop" and tunnel port 3389 through ssh.
$ ssh -L 3389:localhost:3389 USERNAME(a)openzim.org
$ rdesktop localhost
Thanks for your support,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Hi,
I just checked in the suggested template engine, which is used for the layout
page. To use it, you need to define a layout page which contains the tag
<%content%>. The class zim::Article has, in addition to getData, a method
getPage, which returns the article embedded into that layout page.
I added an option -p to zimDump to test that feature:
"zimDump -o 12 -d myfile.zim" prints the data of article number 12, and
"zimDump -o 12 -p myfile.zim" prints the same data embedded into the layout.
Tommi