Dear offline wikihackers, if you are interested in general discussions about
offline reading and editing, please join the new offline-l mailing list. It
is intended for offline use of MediaWiki collections, but for those working
on other wiki backends, I expect that much of the work and tools will be
wiki-neutral.
https://lists.wikimedia.org/mailman/listinfo/offline-l
In particular, I expect the recent discussion about how best to get Spanish
Wikipedia to the students in offline schools in Latin America will move
there (so we can stop cc:ing so many people and lists :) - so please
subscribe if you want to follow that thread.
(Alejandro: please help spread this message -- discussions in Spanish are
also welcome on the new list.)
Regards,
Sam.
---------- Forwarded message ----------
From: Erik Moeller <erik(a)wikimedia.org>
Date: Fri, Oct 23, 2009 at 9:17 PM
Subject: [Wiki-offline-reader-l] New list: offline-l
To: wiki-offline-reader-l(a)lists.wikimedia.org
After some discussion with WMF staff and people working on wiki
offline tools, we wanted to make sure we have an official
WMF-designated forum to discuss both offline reader and editor
technologies for Wikimedia Foundation projects and other MediaWikis.
We felt the list name "wiki-offline-reader" didn't appropriately reflect
the need for offline editing and sharing tools, and we have therefore
created a new list called offline-l:
https://lists.wikimedia.org/mailman/listinfo/offline-l
We'll advertise this new list in the next couple of weeks in various
places. If you're interested in offline discussions, I strongly
encourage you to subscribe there.
Because wiki-offline-reader-l hasn't been very active, I haven't moved
the archives over, but this list won't be deleted for now (I added a
deprecation notice to the list info page).
Thanks and all best,
Erik
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
Hey there,
Manuel and I have been working on a short text about openZIM as a project, to
spread the news and attract more attention.
You can find the text in the openZIM wiki [1]. If you have time, please
review it and comment. We are also not quite certain about the heading yet,
so if you have any ideas, let's hear them :)
The idea is then to push it out to as many places as we can, so if you
have ideas about where else to publish it, great!
Regards,
/mirko
[1] http://www.openzim.org/2009-10-22_Blog_Post_on_Project_Paroli
Thank you Tommi.
I came to the same conclusions.
I have also updated the documentation a little:
https://openzim.org/index.php?title=ZIM_File_Format&diff=568&oldid=504
On Tue 20/10/09 14:52, "Tommi Mäkitalo" tommi(a)tntnet.org wrote:
> On Tuesday, 20 October 2009 13:57:08, emmanuel(a)engelhart.org wrote:
> > Hi,
> >
> > I am currently helping someone who wants to build his own ZIM parser.
> > I know this is not necessary... but he wants to do it, and I find it
> > interesting to have someone trying to do that.
> >
> > I have noticed that nothing is written on the wiki about the compression:
> > * How do we know whether a cluster is compressed or not?
> > * What are the possible values (for the different compression methods)?
> >
> > Regards
> > Emmanuel
>
> Hi,
>
> unfortunately there are several areas which are not specified completely.
> But luckily this compression flag is, at least partly ;-)
>
> The first byte in a cluster specifies the compression. The possible values are:
>   0  default (no compression)
>   1  none (also no compression; I don't know why vlado specified this in
>      zeno, but I took it over to zim)
>   2  zip (zlib)
>   3  bzip2 (currently used in the writer)
>   4  lzma (not implemented in reader or writer due to lack of a
>      compression library)
>
> You can find the flag here: http://openzim.org/ZIM_File_Format#Clusters.
> The actual values are not documented but can be found in the header
> zim/zim.h as an enum.
>
> It is really not necessary to implement a parser, but it does not hurt. It
> helps in discussing ideas for improvements. And it helps push ZIM as a
> standard if we encourage people to work with it.
>
> Tommi
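For anyone building a parser along these lines, here is a minimal C++ sketch of reading that first cluster byte and mapping it to the values Tommi lists above. The enum and function names are only illustrative (they are not the zimlib API; the authoritative definition is the enum in zim/zim.h), and it assumes you already know the cluster's offset within the file.

#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Compression values as described above; illustrative names, not zim/zim.h.
enum class ClusterCompression : std::uint8_t {
    Default = 0,  // no compression
    None    = 1,  // also no compression (kept over from zeno)
    Zip     = 2,  // zlib
    Bzip2   = 3,  // currently used by the writer
    Lzma    = 4   // not yet implemented in reader/writer
};

// Read the compression flag of the cluster starting at clusterOffset.
ClusterCompression readClusterCompression(const std::string& zimPath,
                                          std::streamoff clusterOffset)
{
    std::ifstream in(zimPath, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + zimPath);

    in.seekg(clusterOffset);
    char flag = 0;
    in.get(flag);                      // first byte of the cluster
    if (!in)
        throw std::runtime_error("cannot read cluster flag");

    if (static_cast<unsigned char>(flag) > 4)
        throw std::runtime_error("unknown compression value");
    return static_cast<ClusterCompression>(flag);
}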
Hi,
I am currently helping someone who wants to build his own ZIM parser.
I know this is not necessary... but he wants to do it, and I find it interesting to have someone trying to do that.
I have noticed that nothing is written on the wiki about the compression:
* How do we know whether a cluster is compressed or not?
* What are the possible values (for the different compression methods)?
Regards
Emmanuel
Hello again.
Two Kiwix issues we see with the Hebrew ZIM (available at
http://benyehuda.org/zim/hewiki_sep2009.zim) --
1. There are no captions for images when hovering over them with the mouse.
2. There is no licensing information for images -- this seems to us to
violate the GFDL license, and is therefore a problem in distributing ZIM
files.
(I'll open a Bugzilla issue for each if they are acknowledged as issues.)
We note that all this seems to be working fine in the Arabic ZIM you guys
prepared some time ago. Perhaps we're missing something? Perhaps you'd be
willing to try to create a Hebrew Wikipedia ZIM for us, just to see if it's
something in our process or something in the handling of Hebrew?
Many thanks in advance,
Asaf Bartov
Wikimedia Israel
--
Asaf Bartov <asaf(a)forum2.org>
Hello, everyone.
While trying to add images to our Hebrew Wikipedia ZIM file, we ran into
memory-consumption problems with the Kiwix mirroring scripts.
Despite using a quad-core Linux machine with 4GB of physical memory, both
*listAllImages.pl* and *mirrorMediawikiPages.pl* ran for a very long time and
eventually crashed the Perl process with an *out of memory* error; indeed,
while running, they consumed ever more memory until they had exhausted the
entire physical memory.
Emmanuel said this shouldn't happen, but it does, and we'd like your advice
on whether there's any point in attempting this again with a larger machine
(perhaps an Amazon EC2 machine with more physical memory and faster CPUs) or
whether we should try to find the bug in the scripts.
Thanks in advance,
Asaf Bartov
Wikimedia Israel
--
Asaf Bartov <asaf(a)forum2.org>
Hey,
I ported OpenZIM and its dependencies to OpenWrt (not committed yet),
mainly to get it running on the embedded device "Ben NanoNote" from
qi-hardware.
Unfortunately OpenZIM seems to consume more memory than the Ben can
provide (it has 32MB of RAM).
Even with a really minimalistic system (lightweight kernel config +
busybox-based rootfs, having just a shell) and accessing the zimreader
via a web browser running on another machine, OpenZIM gets killed
by the kernel's out-of-memory killer after a few clicks.
Is there a way of configuring or adjusting the zimreader to reduce memory
consumption, so that it runs smoothly and stably on a device such as the
Ben NanoNote?
Thanks a lot in advance - great work!
Have a nice weekend,
mirko
--
This email address is used for mailinglist purposes only.
Non-mailinglist emails will be dropped automatically.
If you want to get in contact with me personally, please mail to:
mirko.vogt <at> nanl <dot> de
Hey,
is there a reliable way of getting a signal from the ZimReader, or rather
its webserver, indicating whether it started successfully?
The thing is, for now I start the ZimReader, wait a few seconds, and then
start my web browser, in the hope that the ZimReader has come up in time.
My idea would be something like an option to start the ZimReader in
daemon mode, so that it forks (and thereby goes into the background) once
it has come up successfully.
That way, e.g. this would be possible:
< ZimReader /foo/bar.zim && webbrowser http://127.0.0.1:8080 >
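A rough C++ sketch of that pattern follows (purely an illustration, not actual ZimReader code): bind and listen on the socket first, and only fork into the background once that has succeeded, so the parent's exit status tells the shell whether the server is ready and the && can safely start the browser.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    // Only after bind() and listen() succeed do we know the server is reachable.
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        perror("bind/listen");
        return 1;            // shell sees a failure, the browser is not started
    }

    // Fork into the background; the parent exits 0, which is what the
    // shell's && waits for before launching the browser.
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }
    if (pid > 0) return 0;   // parent: startup succeeded

    setsid();                // child: detach from the terminal and keep serving
    // ... the accept() loop serving the ZIM content would go here ...
    for (;;) pause();
}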
Thanks in advance,
mirko
--
This email address is used for mailinglist purposes only.
Non-mailinglist emails will be dropped automatically.
If you want to get in contact with me personally, please mail to:
mirko.vogt <at> nanl <dot> de
Hi Asaf,
On Mon 28/09/09 14:52, Asaf Bartov asaf.bartov(a)gmail.com wrote:
> In trying to retrieve the images for the Hebrew Wikipedia ZIM I'm
> making, I tried running Emmanuel's script mirrorMediawikiPages.pl. My
> command line was this:
>
> ./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org
> --destinationHost=localhost --useIncompletePagesAsInput --sourcePath=w
Strange... I do not have this issue on my side. Please check out a recent version from the SVN to be sure.
Theoretically this should also not happen, because at that point you have, in the worst case, only two copies of the article list in memory.
> After working for more than 20 hours, and still in the stage of
> populating the @pages with incomplete pages, it aborted with "out of
> memory". The machine has 4GB physical memory, and the last time I
> checked -- several hours before it aborted -- the script was consuming
> 3.6GB.
>
> Is there a way to do this in several large chunks, without specifying
> each individual page? How do you do it?
I do the same as you and have never had this issue.
The script's memory usage should never increase.
On my machine, with 8GB of memory, it uses almost 5%.
But if you only want to get the pictures, there is an alternative way:
* use ./listAllPages to get the list of all pages
* use ./listDependences.pl to get the missing images (maybe you also need to use grep to filter out a few templates)
* at the end, use ./mirrorMediawikiPages.pl with your list of images on STDIN
Regards
Emmanuel
Hello again,
In trying to retrieve the images for the Hebrew Wikipedia ZIM I'm making, I
tried running Emmanuel's script *mirrorMediawikiPages.pl*. My command line
was this:
*./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org --destinationHost=localhost --useIncompletePagesAsInput --sourcePath=w*
After working for more than 20 hours, and while still in the stage of populating
the @pages array with incomplete pages, it aborted with "out of memory". The
machine has 4GB physical memory, and the last time I checked -- several
hours before it aborted -- the script was consuming 3.6GB.
Is there a way to do this in several large chunks, without specifying each
individual page? How do you do it?
Thanks in advance,
Asaf Bartov
Wikimedia Israel
--
Asaf Bartov <asaf(a)forum2.org>