Hello, everyone.
As Manuel mentioned some time ago, I'm from Wikimedia Israel, and we're
interested in trying the openZIM tools for a static version of the Hebrew
Wikipedia we are working on, for inclusion in the Israeli government's One
Laptop Per Child (OLPC) project.
So far, I've been able to compile and run all the tools, but I'm having some
trouble creating ZIM files: I've dumped a locally loaded MediaWiki
installation to HTML using these instructions <http://openzim.org/Wiki2html>,
but when I try to run the buildZimFileFromDirectory.pl script, I get
PostgreSQL errors about failing to connect as the "kiwix" user. I should
probably create a user, or a database, or both, but I've never used
PostgreSQL (I use MySQL), and despite trying several combinations of commands
to create a 'kiwix' user, with and without a password, I couldn't get the
DBI::connect call to succeed. I'm sure many of you are familiar with
PostgreSQL and can give me a quick pointer on how to proceed. I'm using an
up-to-date Debian unstable box, with a default packager-supplied installation
of PostgreSQL and the related Perl DBD module.
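For reference, something like the following usually gets past that error on a
default Debian PostgreSQL setup. This is only a sketch: whether the script
expects a database named "kiwix", and whether it connects with a password, are
assumptions; the actual values are in the script's DBI::connect call.

```shell
# Sketch: create a "kiwix" role and database on a stock Debian PostgreSQL.
# The database name and the password are assumptions; check the script's
# DBI::connect call for the values it really uses.
sudo -u postgres createuser --no-superuser --no-createrole --createdb kiwix
sudo -u postgres createdb --owner=kiwix kiwix
# If the script supplies a password, set one for the role:
sudo -u postgres psql -c "ALTER ROLE kiwix WITH PASSWORD 'secret';"
```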
Thanks in advance,
Asaf Bartov
--
Asaf Bartov <asaf(a)forum2.org>
Hi Tomasz,
thanks for your answer.
I was thinking of a wiki on a server which is not part of the cluster but is configured the same way as the production Wikimedia wikis. Software updates are the main issue: the installation must always stay suitable for Wikimedia content (MediaWiki version, extensions etc.).
Then we need the possibility to maintain our own skin and our own version of wiki2html on that server (I think both should become part of MediaWiki anyway).
The initial idea was to use it as a blank setup into which we can import the latest dump of whatever wiki we want to convert to ZIM. But if a database connection to the Wikimedia cluster were possible, that would speed up the whole process a lot.
But after thinking all this through, I am at the point where I see the big solution: put the zimwriter into MediaWiki / the Wikimedia dumps and we're set...
What do you think?
At the moment the DVD for LinuxTag in June is the priority, and the zimwriter is still work in progress, but we need a solution for current Wikipedia dumps.
Greets,
Manuel
-- Original message --
Subject: Re: [openZIM dev-l] Wikimedia-updated MediaWiki instance for dumping
From: Tomasz Finc <tfinc(a)wikimedia.org>
Date: 20.04.2009 22:29
Manuel Schneider wrote:
> Dear Brion,
>
> is it possible that we can have a MediaWiki instance which is being kept
> up-to-date by the regular synchronisation on the WM cluster, so we can import
> all types of dumps in there to make ZIM files from it?
>
> We will use a modified version of wiki2html and a special skin for the dumps,
> so we need an extra instance but it should be always up-to-date with the
> Wikimedia projects. This would be extremely helpful for us.
>
> Any ideas?
>
> Greets,
>
>
> Manuel
>
Filling in since Brion is out in meetings today.
I'm a little confused about what you are asking here.
Are you requesting a machine that is configured the same as any one of
our web server nodes, serving Wikipedia traffic and connecting to our
databases, but is not part of our general cluster? Or are you asking for
a blank node to populate with different types of data that you can in
turn generate ZIM files from?
Let us know and we can get something in place.
--tomasz
_______________________________________________
dev-l mailing list
dev-l(a)openzim.org
https://intern.openzim.org/mailman/listinfo/dev-l
Dear all,
we managed to get a booth at LinuxTag in Berlin this year (June 24th - 27th).
This is good news! But it also means work: now we have to get ready for the Wikipedia DVD, and the ISO must be ready by May 31st.
Please keep me up to date with the current status.
Tommi, did you hear about your talk submission?
Cheers,
Manuel
-- Original message --
Subject: [LT2009] Project "openZIM" accepted
From: projects(a)linuxtag.org
Date: 29.04.2009 21:03
Dear Manuel Schneider,
we are excited to inform you that your application for the project
openZIM
qualified for a sponsored booth at this year's LinuxTag. Please note that
this is just the confirmation of our general acceptance of your
submission. The size and location of the booth will be decided later,
depending mainly on the overall number of accepted project submissions
and on the quality of the exhibition concept you provide us via the
virtual conference center and in dialogue with the project committee.
Even though there is plenty of time until LinuxTag opens its doors on
June 24-27, 2009, we encourage you to start your preparations right now.
First you should ensure that your project's profile at the virtual
conference center (vcc) is complete:
https://vcc.linuxtag.org/cc.pl?rm=editproject;personality=project;id=15015
From now on you can order exhibitor passes for your booth staff. In
comparison to other events we have no hard limit on the number of
exhibitor passes, but we kindly ask you to order only a modest
amount. We review all orders and approve them manually. The exhibitor
passes merely enable their holders to enter the Berlin fairgrounds a
little earlier than the public. Please do not order passes for
friends or project members who do not help at your booth. For them
we will provide you with enough eTickets, which are free tickets for all
four days. To order exhibitor passes, please use the vcc:
https://vcc.linuxtag.org/cc.pl?rm=exhibitorpass;personality=project;id=15015
We organize all information in the public LinuxTag wiki. Please find
all related information here:
http://wiki.linuxtag.org/w/fp:Main_Page
Before sending questions to the Project Committee, we kindly ask you
to have a look at the FAQ and the A-Z pages.
We have subscribed you to our mailing lists for all participating free
software projects:
* projects-announce(a)linuxtag.org - a low-traffic, read-only list to which
all newsletters and information regarding the free projects'
participation at LinuxTag will be sent.
* projects-discussion(a)linuxtag.org - the official list for sharing
ideas and discussing everything related to LinuxTag.
In case you have any questions please do not hesitate to contact the
projects committee by mail: projects(a)linuxtag.org.
Your friendly VCC email bot on behalf of the
Project Committee LinuxTag 2009
Dear Brion,
is it possible for us to have a MediaWiki instance which is kept
up to date by the regular synchronisation on the WM cluster, so that we can
import all types of dumps into it to make ZIM files from them?
We will use a modified version of wiki2html and a special skin for the dumps,
so we need an extra instance, but it should always be up to date with the
Wikimedia projects. This would be extremely helpful for us.
Any ideas?
Greets,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Hi,
I have some ideas concerning libzim which I would like to share with you.
There is a class zim::Files, which represents a set of zim files in one
directory. I would like to drop that class if it is not used in Kiwix.
(Emmanuel: can you verify that?)
Let me explain why the class is there, why I would like to remove it, and how
to replace its functionality.
On the original German Wikipedia DVD, the content of Wikipedia was in one
zeno file, and the word index, which is also a zeno file, was in another.
Images were of very bad quality due to the limited capacity of the DVD. There
were a compact and a premium edition. The premium edition had images in much
better quality on three additional DVDs, each with a zeno file containing part
of the images. To use these better-quality images, the user had to copy all
zeno files into one directory on his hard drive and configure the reader to
use that directory. When an image was requested, the reader then searched for
it in all zeno files and fetched the one with the best quality.
I feel that this solution is not optimal. There is also a technical reason
why it is not that good, and since I have a better idea (at least I think the
idea is better ;-) ), I would like to drop that feature. The technical reason
is simple: there is no common API to read directories. Almost all operating
systems have opendir/readdir/closedir. Unfortunately, ReactOS (and other
systems which use Win32) do not have these functions, so zimlib has to use a
different API on these systems. This is not really a big problem, but still a
problem. It is actually the only OS-specific code in zimlib (or rather in the
underlying cxxtools, which provides a wrapper for these functions).
The functionality is not needed on the planned Wikipedia DVD for LinuxTag,
since we won't have a premium edition but will have all data in one single
zim file.
In the future I plan to create a utility which merges the content of two or
more zim files into one. The structure of zim files makes this quite an easy
operation, much easier than creating new zim files. Especially the changes I
made compared to zeno make it very easy and fast. Combining two zim files is
almost as cheap as copying them. So instead of copying all zim files into one
directory, users can just combine multiple zim files to create one single big
file.
The utility will have an option to control how duplicate articles are
handled. It may prefer the articles of one of the files, so it will be
possible to make update files which just provide the changed articles. The
resulting combined file will not be exactly the same as a freshly created
one, since it will have empty blob entries for removed article data. But the
user won't see any difference.
Tommi
Great!
I am also curious to hear from Kiwix. Will it be ready so that we can put
both readers onto the DVD?
Greets,
Manuel
Am Samstag, 11. April 2009 schrieb Tommi Mäkitalo:
> Hi,
>
> the ZimReader is fixed and I can browse through the zim file I created from
> Josch's dewiki-Dump. There were some smaller bugs in the reader as well as
> in the library, but the main bug was in cxxtools. So if you want to test
> the reader please update cxxtools.
>
> There are still some tasks to do in the reader. There is a lot of hardcoded
> stuff from the old Wikipedia DVD, like the text "DVD-ROM-Ausgabe 2007" in
> the title area and a reference to Directmedia.
>
> I also updated the status and next steps page in our wiki.
>
> Tommi
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Hi,
I created an openZIM file from the German Wikipedia dump I got from Josch
last year. The file contains all articles from the German Wikipedia without
images, and its size is 1.3 GB (or more precisely 1302052315 bytes).
Generation took only about 1:10 on our server with the new zimwriter. It
could be further improved by parallelizing the compression phase, since this
is CPU-bound and takes most of the time, but I feel that it is not necessary.
There are more important tasks to do.
You can download the file from http://www.openzim.org/download/dewiki.zim.
The zimreader (the tntnet-based web application) is almost working with that
file. There are some bugs to fix, but this will be done soon.
Emmanuel: the file has been updated; I fixed some bugs. zimDump crashed when
reading redirects, and the writer failed to generate redirects correctly.
Josch: do you have an updated dump?
Tommi
Hi,
I have released three "small" demo ZIM files:
* wikipedia_en_wp1_0.5_2000+_03_2006_rc1.zim, a ZIM version of the WP1
0.5 selection done 2 years ago.
* wikipedia_es_fa+ga_2000+_04_2008_alpha1.zim, with all featured and
good articles from the Wikipedia in Spanish.
* wikipedia_it_fa_1000+_-4_2008_alpha1.zim, with all featured articles
from the Wikipedia in Italian.
You can download them at http://tmp.kiwix.org/zim/
A few remarks:
* These files are alpha versions and a few things are wrong with them:
not GFDL-compliant, etc.
* You can read them with the tntreader or the latest alpha version of Kiwix
(http://tmp.kiwix.org/bin/kiwix-0.8-alpha2+xulrunner.tar.bz2).
* We do not have a reader for Windows yet.
Regards
Emmanuel
Dear all,
you may have noticed a short outage tonight.
The server lost its physical network link several times. We have now
exchanged the patch cable. I hope the issue is resolved.
Then we had the conference call with Erik Möller, Tomasz Finc and Brion Vibber
from the Wikimedia Foundation tonight.
The WMF is interested in openZIM as a standard storage format for offline
content and offers support, e.g. funding DVDs or similar.
The WMF has some plans for standardizing the distribution of content in
several ways. The ZIM format may play a major role there, but this will have
to evolve over the next one or two years. Then we will see how the plans of
the WMF work out, in which state ZIM is at that time, and what features it
has by then.
I will add the outcome later to the preparation page under
http://openzim.org/Wikimedia_Foundation_Relationship
I am happy to welcome Brion Vibber, Chief Technical Officer of the Wikimedia
Foundation, on our list.
Some unrelated news:
For our Windows instance I added RDP support as well. Feel free to use that
instead of VNC (which doesn't work as seamlessly). It works the same way:
just install "rdesktop" and tunnel port 3389 through ssh.
$ ssh -L 3389:localhost:3389 USERNAME(a)openzim.org
$ rdesktop localhost
Thanks for your support,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Hi,
I just checked in the suggested template engine, which is used for the layout
page. To use it, you need to define a layout page which contains the tag
<%content%>. The class zim::Article has, in addition to getData, a method
getPage, which returns the article embedded into that layout page.
I added an option -p to zimDump to test that feature:
"zimDump -o 12 -d myfile.zim" prints the data of article number 12, and
"zimDump -o 12 -p myfile.zim" prints the same data embedded into the layout.
Tommi