Hi, everyone.
Can someone explain what procedure you use to add (some) images to the dump
before packaging a ZIM file?
I am preparing a fresh Hebrew Wikipedia ZIM file, and would like to test the
integration of images as well as recent improvements to Kiwix.
So far, I have found Wikix <http://meta.wikimedia.org/wiki/Wikix>, but no
specific instructions on including images in Emmanuel's ZIM-building script,
so I'm guessing it is enough to download the images and integrate them into
the local Wikipedia server?
My questions are:
1. How do you know which images are referenced by the local Wikipedia? I
see Wikix extracts this information into a bunch of shell script files, but
maybe there's another/better way? What do you use?
2. Given a list of images, what is the best way to retrieve them without
pounding the Wikimedia servers? Is there an accepted way? Should I
coordinate it with anyone? The shell scripts generated by Wikix don't seem
to make any provision for delays or anything, and I'm afraid running them
would get me banned. Again, what do you use?
3. What if we want only the thumbnail/low-res version incorporated in the
articles themselves, and not the full resolution version from commons etc.?
4. Once you have a local tree of the image files (in directories 0, 1,
2,..., f), what else do you need to do to get Emmanuel's buildZim....pl
script to include them in the ZIM file?
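To make the directory layout in question 4 and the throttling concern in
question 2 concrete, here is a minimal Python sketch. It assumes the 0..f tree
follows MediaWiki's default hashed upload layout (first one and two hex digits
of the MD5 of the underscore-normalized file name). The base URL, the
one-second delay, the User-Agent string, and the function names hashed_path,
image_url and fetch_politely are all illustrative assumptions, not anything
Wikix or the buildZim script provides.

```python
import hashlib
import time
import urllib.parse
import urllib.request

# Assumption: base URL of the upload server; not given in the original message.
BASE_URL = "https://upload.wikimedia.org/wikipedia/commons"


def hashed_path(filename):
    """Return 'x/xy/Name' for a file name, following MediaWiki's default
    hashed upload layout. The name is assumed to be already capitalized
    the way the wiki stores it."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "{}/{}/{}".format(digest[0], digest[:2], name)


def image_url(filename, base_url=BASE_URL):
    """Build the download URL for one image, percent-encoding the file name."""
    return base_url + "/" + urllib.parse.quote(hashed_path(filename))


def fetch_politely(filenames, delay_seconds=1.0, fetch=None, sleep=time.sleep):
    """Fetch images one at a time, pausing between consecutive requests.

    `fetch` and `sleep` are injectable so the loop can be dry-run offline.
    """
    if fetch is None:
        def fetch(url):
            # Hypothetical User-Agent value; it only serves to identify the client.
            headers = {"User-Agent": "zim-image-fetch-sketch/0.1 (you@example.org)"}
            request = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(request) as response:
                return response.read()
    results = {}
    for position, name in enumerate(filenames):
        if position > 0:
            sleep(delay_seconds)  # throttle: wait before every request but the first
        results[name] = fetch(image_url(name))
    return results
```

Whether a one-second pause is polite enough is exactly what question 2 asks,
so the delay here is only a placeholder.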
Many thanks in advance,
Asaf Bartov
Wikimedia Israel
--
Asaf Bartov <asaf(a)forum2.org>
http://bugs.openzim.org/show_bug.cgi?id=4
Manuel Schneider <manuel.schneider(a)wikimedia.ch> changed:
           What    |Removed                        |Added
----------------------------------------------------------------------------
                 CC|                               |dev-l(a)openzim.org
         AssignedTo|manuel.schneider@wikimedia.ch  |tommi(a)tntnet.org
--
Configure bugmail: http://bugs.openzim.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
Hi
Today I filed a small new bug report:
http://bugs.openzim.org/show_bug.cgi?id=8
I think we neglect our Bugzilla because we are not notified when new bugs are opened.
I propose that:
* every new entry (bug or feature request) generates a notification email to this mailing list,
* Tommi is the default owner for every ticket.
Regards
Emmanuel
OK for me.
On Mon 31/08/09 14:41, "Manuel Schneider" manuel.schneider(a)wikimedia.ch wrote:
> I haven't heard any objections, so November 22nd is fixed?
>
> Emmanuel, Tomasz?
>
> /Manuel
>
> On Wednesday, 26 August 2009 22:04:22, Tommi Mäkitalo wrote:
> > Hi all,
> >
> > I would prefer November 20th-22nd. Can we fix that?
> >
> > Tommi
> >
> > On Wednesday, 26 August 2009 20:04:38, Manuel Schneider wrote:
> > > Dear all,
> > >
> > > referring to http://www.doodle.com/xaf4tzpuwk2xf59h I am now fixing the
> > > date of the Developers Meeting: November 13th - 15th.
> > >
> > > Quoting Tomasz "if you have any horrible objections you'd better say it
> > > now" :-)
> > >
> > > We would be happy to have the openmoko / Qi-Hardware guys at the meeting.
> > > Tomasz is trying to get someone from WikiPock (Wikireader on Symbian) to
> > > the meeting as well.
> > > Tomasz from the Wikimedia Foundation will definitely be there.
> > >
> > > The meeting will start on Friday evening (around 18 - 19 hr) and end on
> > > Sunday afternoon (around 16 hr).
> > >
> > > Please register yourself as soon as you have fixed the date in your
> > > schedule on https://openzim.org/Developer_Meetings/2009-2
> > >
> > > We will try to get a big apartment where we have internet, accommodation
> > > and a kitchen, so we can have a decent meeting with full service at the
> > > place.
> > >
> > > Greets from Buenos Aires,
> > >
> > > Manuel
>
> --
> Regards
> Manuel Schneider
>
> Wikimedia CH - Verein zur Förderung Freien Wissens
> Wikimedia CH - Association for the advancement of free knowledge
> www.wikimedia.ch
Dear all,
referring to http://www.doodle.com/xaf4tzpuwk2xf59h I am now fixing the date of
the Developers Meeting: November 13th - 15th.
Quoting Tomasz "if you have any horrible objections you'd better say it
now" :-)
We would be happy to have the openmoko / Qi-Hardware guys at the meeting. Tomasz
is trying to get someone from WikiPock (Wikireader on Symbian) to the meeting as
well.
Tomasz from the Wikimedia Foundation will definitely be there.
The meeting will start on Friday evening (around 18 - 19 hr) and end on Sunday
afternoon (around 16 hr).
Please register yourself as soon as you have fixed the date in your schedule
on https://openzim.org/Developer_Meetings/2009-2
We will try to get a big apartment where we have internet, accommodation and
a kitchen, so we can have a decent meeting with full service at the place.
Greets from Buenos Aires,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Dear developers, publishers and users,
we have fixed the time for our second developers meeting of the openZIM
project, a meeting for all users, developers and publishers of software or
works using the ZIM file format.
== Time ==
It will take place over the weekend from November 20th to 22nd 2009.
We will start on Friday evening and wrap up on Sunday afternoon.
== Topics ==
=== Technical topics: ===
* fixing fulltext search (should we stick with the current system? should we
add different search engines?)
* category handling
* additional ZIM headers (a proposal for adding additional headers to the
format to be able to store all HTML meta data has been made)
* portability of libzim (efforts have been made to port libzim to Windows,
Linux on ARM...)
=== Wikimedia related topics: ===
* requirements for Wikimedia ZIM creation
* MediaWiki interfaces
* other reader applications on different platforms
=== Communication ===
* future exhibitions
* marketing
== Location ==
The meeting will take place in south-western Germany, near Basel.
The openZIM team tries to cover all costs for accommodation and catering
during the meeting. To achieve this we have to know very early how many
people will arrive, so we can choose the best location possible.
We plan to book an apartment with Internet access and a kitchen, so we can work
and sleep at the same place and cater for ourselves.
== Details and Registration ==
Please register (and add your comments) at
http://openzim.org/Developer_Meetings/2009-2
There you will also find details on how to reach the place, etc.
REGISTRATION CLOSES AT THE END OF SEPTEMBER!
Kind regards,
Manuel
Dear list members,
as you know Wikimania will start in two days in Buenos Aires, Argentina.
This year I will be there again, participating in at least three different
sessions:
* Thursday, 27th, talk "openZIM - serving the Wikipedia Offline Projects"
http://wikimania2009.wikimedia.org/wiki/Proceedings:96
* Friday, 28th, panel "Panel on Wikimedia chapters"
http://wikimania2009.wikimedia.org/wiki/Proceedings:140
* a chapters meeting is being planned:
http://wikimania2009.wikimedia.org/wiki/Chapters_meeting
As far as I know, unfortunately no one else from Switzerland is going there. But
I will provide reports to the board, as I did after the Chapters Meeting in
April this year.
If you have any suggestions or opinions I should bring to the panels, don't
hesitate to contact me.
I will leave today (Monday 24th) and return on Sunday, 31st.
Greets,
Manuel
Hi Marc
On Tue 11/08/09 16:26, "Marc Bantle" openmoko(a)rcie.de wrote:
> I observed that Kiwix is producing an "ad-hoc" type
> index. This may be useful for desktops as they have
> the power to generate an index file on the fly. On
> small footprint devices this will not reasonably be
> possible, due to lacking memory and cpu resources.
Yes
> Even on a dual core desktop with 3.5 GB of memory
> Kiwix failed to produce "ad-hoc" index of the
> openzim-edition of the German Wikipedia running
> out of memory after many hours.
Yes, this is a Kiwix-specific issue with really big ZIM files (ones with a lot of text).
> Question 1:
> From the change log I see that Kiwix is using
> a prominent search engine (Xapian) instead of the
> mechanism ZimReader/Writer are using. Is there an
> easy way to reuse an index produced by Kiwix on
> a different machine?
Yes, although it is not trivial. The Xapian database is stored in your ~/.www.kiwix.org directory (in a [md5sum].index directory).
You can copy it to any other profile, account or computer, and Kiwix will then be able to search through the ZIM file without running the indexing process. To reduce the size of the directory, you can also use "xapian-compact".
We know there is a list of improvements to make to the usability of the current index management.
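A rough Python sketch of the copy step described above, assuming the
[md5sum].index directories sit directly inside the profile directory. The
function name copy_indexes and any destination path you pass in are
hypothetical; this is not part of Kiwix.

```python
import shutil
from pathlib import Path


def copy_indexes(src_profile, dst_profile):
    """Copy every '*.index' directory from one profile directory to another.

    Returns the names of the directories that were copied. An index that
    already exists at the destination is left untouched.
    """
    src = Path(src_profile)
    dst = Path(dst_profile)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for index_dir in sorted(src.glob("*.index")):
        if not index_dir.is_dir():
            continue  # only Xapian database directories are of interest
        target = dst / index_dir.name
        if target.exists():
            continue  # do not overwrite an index that is already there
        shutil.copytree(index_dir, target)
        copied.append(index_dir.name)
    return copied
```

For example, copy_indexes(Path.home() / ".www.kiwix.org", some_other_profile)
would carry the existing indexes over, where some_other_profile is wherever the
target account keeps its own .www.kiwix.org directory.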
> Question 2:
> Are there plans to enable Kiwix to read reusable
> indexes of the format released for ZimReader/
> Writer?
I have nothing against making Kiwix compatible with different search engine backends, but this is not a priority for me yet.
I think I will do it in the medium term, as soon as I have time for it or if a user really needs it.
> Question 3:
> Are there plans to enable Kiwix to produce such a
> reusable index?
I am not sure I understand the question. Are you talking about the ZIM indexes?
In that case, see my answer to Question 2.
> Question 4:
> Wouldn't it be desirable to deliver reusable indexes
> together with zim-article-databases for all those
> people with less capable devices (mids, netbooks,
> phones) on the Kiwix site?
I do not believe that having only one type of search engine is good at all: usages are diverse, and for this reason we have different search engines. I think the ZIM format should not force the user to make a choice. I also think we have to be able to distribute content without shipping the data twice (with indexes). And finally, I do not think that having compatible indexes should be a priority, because 99% of users don't care about that (they simply use only one client).
> Question 5:
> The zim databases supplied on the Kiwix site [1]
> seem to use the articles title field as article id field,
> which - I'm sure - solves some problems for Kiwix,
> but results in a list of article ids as result of a search
> on zimreader instead of a list of article titles. Since
> both Kiwix and ZimReader are part of the openzim
> standardization effort, this confuses me a bit. Which
> format is supposed to be the standard?
Tommi has already answered that...
IMO it is the indexer's job to find the title: if there is an HTML page with a title, it has to use it.
More generally, IMO forcing ZIM creators to use url=title is a bad idea; the format should never force anyone to adopt a special way of representing or storing information. Everything that is possible with "normal" HTTP/HTML should also be possible with content in a ZIM file.
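As an illustration of the point about the indexer finding the title itself,
here is a minimal Python sketch using only the standard library. The names
TitleParser and display_title are hypothetical, and falling back to the URL
when a page has no <title> is an assumption, not how the ZIM indexer is known
to behave.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def display_title(html, url):
    """Return the page's own title, or the article URL if it has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so titles split across lines read cleanly.
    title = " ".join("".join(parser.title_parts).split())
    return title or url
```

With something like this in the indexer, search results can show titles even
when the article URL is an opaque id, and url=title never has to be enforced by
the format.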
Emmanuel
Hey there,
Mirko from Qi-Hardware again.
Sorry for the long period of silence, but I had to get certain things in
motion before getting back to you.
The first units (sample units) are making their way to Europe now, to Berlin
to be precise, and we are still very much interested in OpenZim and
applications using it on the Ben NanoNote[1].
I would propose to get one of the units to a member of your developer team
so that you can have a closer look and see what could be done.
As our software effort is mainly focused on kernel development, we have not
reached the level of Xorg yet. Do you know of an ncurses viewer for the
OpenZim format that we could deploy as a reference point for other
developers to get started? Or any other low-resource viewer using the
framebuffer (fb) only?
The biggest limitation will be the 32MB RAM, which excludes any web browsers
if I understand our engineers correctly.
Looking forward to your reply,
/mirko
[1] http://www.qi-hardware.com/products/ben-nanonote/
Hi all,
as I wrote in an earlier post, I have compiled
ZimReader for the Openmoko platform. To make more
databases available for the device I also had a look at
Kiwix and the ZIM files supplied on its site [1].
Some questions arose from that, and it appears that this
list is the right place to discuss them. Please correct me
if I'm wrong.
I observed that Kiwix is producing an "ad-hoc" type
index. This may be useful for desktops as they have
the power to generate an index file on the fly. On
small footprint devices this will not reasonably be
possible, due to lacking memory and cpu resources.
Even on a dual core desktop with 3.5 GB of memory
Kiwix failed to produce "ad-hoc" index of the
openzim-edition of the German Wikipedia running
out of memory after many hours.
Question 1:
From the change log I see that Kiwix is using a
prominent search engine (Xapian) instead of the
mechanism ZimReader/Writer are using. Is there an
easy way to reuse an index produced by Kiwix on
a different machine?
Question 2:
Are there plans to enable Kiwix to read reusable
indexes of the format released for ZimReader/
Writer?
Question 3:
Are there plans to enable Kiwix to produce such a
reusable index?
Question 4:
Wouldn't it be desirable to deliver reusable indexes
together with zim-article-databases for all those
people with less capable devices (mids, netbooks,
phones) on the Kiwix site?
Question 5:
The zim databases supplied on the Kiwix site [1]
seem to use the articles title field as article id field,
which - I'm sure - solves some problems for Kiwix,
but results in a list of article ids as result of a search
on zimreader instead of a list of article titles. Since
both Kiwix and ZimReader are part of the openzim
standardization effort, this confuses me a bit. Which
format is supposed to be the standard?
Question 6:
I succeeded in producing a ZIM-Format index of
the openzim-edition of the German Wikipedia using
ZimWriter on the above 3.5 GB machine. Unlike
the index supplied on DVD, the generated index is
1.5 GB in size (instead of 1.1 GB). Any ideas why
that is?
Cheers,
Marc
[1] http://tmp.kiwix.org/zim