http://bugs.openzim.org/show_bug.cgi?id=14
Summary: zim-check
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: zimwriter
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
We need to have a way to check the quality of a ZIM file.
At least the following should be checked:
* (WARNING) has a welcome page
* (ERROR) broken local HTML links
* (WARNING) redundant content
* ...
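The broken-local-link check could be sketched roughly as below (a minimal sketch only: the `ArticleIndex` set and the naive `href` scan are illustrative stand-ins, not the zimlib API; a real zim-check would query the ZIM directory and use a proper HTML parser):

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical article index; a real zim-check would query the ZIM directory.
using ArticleIndex = std::set<std::string>;

// Collect href targets from an HTML snippet with a naive scan
// (a real checker would use a proper HTML parser).
std::vector<std::string> extractLocalLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string marker = "href=\"";
    for (std::size_t pos = html.find(marker); pos != std::string::npos;
         pos = html.find(marker, pos + 1)) {
        std::size_t start = pos + marker.size();
        std::size_t end = html.find('"', start);
        if (end == std::string::npos) break;
        std::string target = html.substr(start, end - start);
        // Only check local links; skip external URLs.
        if (target.rfind("http", 0) != 0)
            links.push_back(target);
        pos = end;
    }
    return links;
}

// Return the subset of local links that point to no known article.
std::vector<std::string> brokenLinks(const std::string& html,
                                     const ArticleIndex& index) {
    std::vector<std::string> broken;
    for (const auto& link : extractLocalLinks(html))
        if (index.find(link) == index.end())
            broken.push_back(link);
    return broken;
}
```

Running this per article and reporting any non-empty result would cover the ERROR case above; the welcome-page and redundant-content checks would be separate passes.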
--
Configure bugmail: http://bugs.openzim.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugs.openzim.org/show_bug.cgi?id=11
Summary: Dynamic mime-types
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
Currently, the ZIM format can only support a limited and predefined set of
document mime-types.
You can get the list of supported mime-types here:
http://www.openzim.org/ZIM_File_Format#Mime_types
Mime-types are represented by a number in a ZIM file, and the mapping is done
statically by the zimlib.
This means that you cannot store documents with custom mime-types.
This is a problem for me because I have people who use Kiwix and deal with
other mime-types, for example archives or binaries.
I think making this mime-type table dynamic is a necessary improvement.
I see two solutions:
* The ZIM file creator specifies it manually during the creation process.
* It happens automatically (also during the creation process).
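Either way, the writer-side mechanism could look roughly like this (a sketch under assumptions: `MimeTable` and its methods are illustrative names, not the zimlib API). New mime-type strings get the next free number at creation time, and the resulting string table would be stored in the file so readers can map numbers back:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative dynamic mime-type table: numbers are assigned on first use
// during file creation, instead of being fixed statically in the library.
class MimeTable {
public:
    // Return the number for a mime-type, registering it if it is new.
    uint16_t intern(const std::string& mimeType) {
        auto it = byName_.find(mimeType);
        if (it != byName_.end())
            return it->second;
        uint16_t id = static_cast<uint16_t>(byId_.size());
        byName_[mimeType] = id;
        byId_.push_back(mimeType);
        return id;
    }

    // Map a stored number back to its mime-type string (reader side).
    const std::string& lookup(uint16_t id) const { return byId_.at(id); }

private:
    std::map<std::string, uint16_t> byName_;  // string -> number
    std::vector<std::string> byId_;           // number -> string
};
```

The "automatic" solution would simply call `intern` for every document's mime-type as it is added; the "manual" solution would pre-populate the table from a user-supplied list.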
Jimbo - thanks for the spur to clean up the existing work.
All - Let's start by cleaning up the mailing lists and setting a few
short-term goals :-) It's a good sign that we have both charity and love
converging to make something happen.
* For all-platform all-purpose wikireaders, let's use
offline-l(a)lists.wikimedia, as we discussed a month ago in the aftermath of
Wikimania (Erik, were you going to set this up? I think we agreed to
deprecate wiki-offline-reader-l and replace it with offline-l.)
* For wikireaders such as WikiBrowse and Infoslicer on the XO, please
continue to use wikireader(a)lists.laptop
I would like to see WikiBrowse become the 'sugarized' version of a reader
that combines the best of that and the openZim work. A standalone DVD or
USB drive that comes with its own search tools would be another version of
the same. As far as merging codebases goes, I don't think the WikiBrowse
developers are invested in the name.
I think we have a good first cut at selecting articles, weeding out stubs,
and including thumbnail images. Maybe someone working on openZim can
suggest how to merge the search processes, and that file format seems
unambiguously better.
Kul - perhaps part of the work you've been helping along for standalone
usb-key snapshots would be useful here.
Please continue to update this page with your thoughts and progress!
http://meta.wikimedia.org/wiki/Offline_readers
SJ
2009/10/23 Iris Fernández <irisfernandez(a)gmail.com>
> On Fri, Oct 23, 2009 at 1:37 PM, Jimmy Wales <jwales(a)wikia-inc.com> wrote:
> >
> > My dream is quite simple: a DVD that can be shipped to millions of people
> with an all-free-software solution for reading Wikipedia in Spanish. It
> should have a decent search solution, doesn't have to be perfect, but it
> should be full-text. It should be reasonably fast, but super-perfect is not
> a consideration.
> >
>
> Hello! I am an educator, not a programmer. I can help selecting
> articles or developing categories related to school issues.
>
Iris - you know the main page of WikiBrowse that you see when the reader
first loads? You could help with a new version of that page. Madeleine
(copied here) worked on the first one, but your thoughts on improving it
would be welcome.
Hi,
I just created a branch for the changes in the ZIM format that we discussed
at our developer meeting. The branch is located in our repository under
branches/zim2.
I have already checked in the first changes. qunicode is removed from
zimlib. Work on distinguishing between URL and title has also started.
I also decided to change the zint compression. The compression is now
similar to UTF-8 and saves a little space compared to the old algorithm
(about 2%). The zint compression is currently used in the full text index.
We also discussed using it in the directory entries to save some bytes.
I'm not sure if it is worth the trouble, since it makes alternative
implementations more difficult. If alternative implementations do not care
about the full text index or categories, they don't need to implement the
zint compression. On the other hand, the code for it is straightforward
and should be easy to port.
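To illustrate the idea of such a variable-length integer code (this is an LEB128-style stand-in only; the actual zint layout is defined by the zim2 branch and may differ in detail): small values take one byte, and the high bit of each byte marks whether more bytes follow, much as UTF-8 marks continuation bytes.

```cpp
#include <cstdint>
#include <vector>

// LEB128-style variable-length integer encoding: 7 payload bits per byte,
// high bit set means "more bytes follow". Small numbers stay small on disk.
std::vector<uint8_t> encodeVarint(uint64_t value) {
    std::vector<uint8_t> out;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80;  // continuation bit
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Decode a value produced by encodeVarint.
uint64_t decodeVarint(const std::vector<uint8_t>& in) {
    uint64_t value = 0;
    unsigned shift = 0;
    for (uint8_t byte : in) {
        value |= static_cast<uint64_t>(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80))
            break;  // last byte of this integer
    }
    return value;
}
```

This also shows why porting such a scheme is little trouble: both directions fit in a few lines of straightforward bit manipulation.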
The branch is not working yet. It may take some time until it does, since I
have to implement the changes in the reader as well as the writer before I
can even start testing (other than verifying that it compiles). It was
easier with zeno when I started, since I already had working zeno files.
Tommi
On Fri 20/11/09 11:11, "Andy Rabagliati" andyr(a)wizzy.com wrote:
> On Mon, 16 Nov 2009, Emmanuel Engelhart wrote:
>
> > > These indexes http://ai.cs.utsa.edu/wikipedia0.7/ seem to have
> > > been built using categories.
> >
> > This dump is one I have built (maybe extracted from the ZIM)... but a
> > little bit modified. This is a pretty interesting URL; it would be
> > great to know exactly how the developers behind it did it... maybe
> > you would be able to do the same.
>
> This is his explanation :-
>
>
> This collection of articles is called "beta2" because they were
> extracted from wikipedia_en_wp1_0.7_30000+_05_2009_beta2.zim at
> tmp.kiwix.org. My distribution has different file names for all the
> articles based on their titles and has a title search capability that
> only depends on Javascript. Below is some explanation of the steps I
> performed, not necessarily done in this order.
>
> 1. I extracted the articles from the zim file by downloading,
> compiling and running zimDump from openzim.org. Compiling zimDump is
> nontrivial because it involves downloading and compiling other
> packages in versions that will work together.
>
> 2. I created three lists with Perl scripts and manual cleaning
> afterwards.
>
> A. The first list was a list of all articles: zim file name, UTF8
> title, and ASCII title.
>
> B. The second list was a list of all zim files (articles, images and
> other files, Javascript and CSS): zim file name and target file name
> in my distribution.
>
> C. The third list was a list for redirecting one zim file name to
> another. The zim dump creates a lot of empty files in the A
> subdirectory (A contains all the articles). It turns out that each
> of them needs to be redirected to another article. The redirects
> can be determined by downloading and running the zimReader program
> for Linux, which can be found at openzim.org.
>
> There appear to be a few duplicate articles (none were deleted), which
> I list below (in ASCII) for anyone who is interested:
>
> Abu Rayhan Biruni
> 'Alawi
> Battle of Mohacs
> Beer-Lambert law
> Charismatic movement
> Elian Gonzalez affair
> Ismail Enver
> Ismet Inonu
> Istiklal Marsi
> Izmir Province
> Wikipedia:0.7/0.7geo/Leopold
> Macapa or Macapai
> Maceia or Maceio
> Nicole Vaidisova
> PRIDE Fighting Championships
> War in Afghanistan (2001-present)
>
> 3. I used a Perl script to copy all the files from the zim dump to a
> staging area, modifying the links along the way. There are many, many
> dead image links (26314 in my count); I changed those links to empty
> strings. There are also some dead article links, most of them
> correspond to dead image links, but a few of them should have been
> redirected; they got added to my third list above. Here are all the
> dead article links and any appropriate redirect for anyone who is
> interested.
>
> A/5ISM ignore
> A/35A A/D6N
> A/53Z A/CWO
> A/5J03 ignore
> A/5J55 ignore
> A/9XO A/HQO
> A/APD A/A35
> A/D07 A/9PW
> A/F5G A/163K
> A/PRL ignore
> A/TKV A/S4X
> A/TR4 ignore
> A/VBB ignore
> A/ZM2 ignore
> A/2QE6 ignore
> A/T3B A/NQR
> A/11B0 ignore
> A/1NN2 ignore
> A/5ISU A/4O
> A/5JAZ ignore
> A/5IV3 ignore
> A/5IXU ignore
> A/102Z ignore
> A/1QTM ignore
> A/5IOB ignore
> A/5J51 ignore
> A/5IP6 ignore
> A/5JBO ignore
> A/Y91 ignore
>
> 4. In addition to changing links, I made a few other changes. Each
> article now has a search box for title search. I took some existing
> GPLed Javascript (JSE search engine) and made extensive modifications
> for this application. It only searches the titles; there is no
> keyword index, and there is no text search. The motto of the code is
> "Linear Search FTW". It is surprisingly snappy, though in hindsight,
> searching 30000 titles is not a lot for a computer to do. The results
> page is functional, but otherwise not too exciting.
>
> I changed the titles of the index pages to something less geeky, e.g.
> "Topical Index: Wikipedia" for the topic index page on Wikipedia. I
> also fixed a number of incorrect links from the topical index page to
> the alphabetical index page.
>
> Enjoy,
>
> Tom Bylander
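The link rewriting in step 3 of Tom's email, driven by the redirect list above, could be sketched like this (names are illustrative only; his actual tooling was a Perl script):

```cpp
#include <map>
#include <string>

// Rewrite one article link using a redirect list like the one Tom posted:
// follow the redirect target, blank out links marked "ignore" (dead), and
// leave every other link unchanged.
std::string rewriteLink(const std::string& link,
                        const std::map<std::string, std::string>& redirects) {
    auto it = redirects.find(link);
    if (it == redirects.end())
        return link;           // normal link, keep as-is
    if (it->second == "ignore")
        return std::string();  // dead link becomes an empty string
    return it->second;         // redirected link
}
```

Applied to every `href` while copying files to the staging area, this reproduces the behaviour he describes: redirects followed, dead links emptied, everything else untouched.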
Thank you very much Andy for having forwarded this email, this is really interesting.
That confirms a few things:
* wikipedia_en_wp1_0.7_30000+_05_2009_beta2 is only a beta; it should be improved and I will do that soon.
* I need to code a Perl script to check many things in an HTML directory (https://sourceforge.net/tracker/?func=detail&aid=2901059&group_id=175508&at…) before building the ZIM.
* I think we need such a tool (zim-check?) coded in C++ to be able to do the same with ZIM files (I see that as pretty necessary if we want to set up a ZIM library: nobody wants us to spread bad-quality ZIM files). http://bugs.openzim.org/show_bug.cgi?id=14
As soon as I have finished with that, I will publish a new version of the ZIM file and contact Tom.
Regards
Emmanuel
Folks,
Coming here by way of the Wikipedia:Version 1.0 Editorial Team pages,
I have a few questions.
Disclaimer :- I am in South Africa, where bandwidth is expensive, so
a lot of my 'fiddling' is done on my server in the USA, where I do not
have b/w limitations.
My interest is to be able to serve wikipedia content off a server in
a classroom - usually a Thin Client setup, but not always, and they
rarely have internet access.
Weak, small clients, networked to a strong server, running linux.
I downloaded a torrent - en.wikipedia.okawix - which appeared to be
a tarball with these contents :-
-rw-r--r-- 1 andyr andyr 571428918 Jul 6 23:24 article.index
-rw-r--r-- 1 andyr andyr 108812163 Jul 6 23:24 article.map
-rw-r--r-- 1 andyr andyr 5601891132 Jul 6 23:23 en.wikipedia.zeno
-rw-r--r-- 1 andyr andyr 11 Jun 23 18:58 entry.url
-rw-r--r-- 1 andyr andyr 307 Jul 6 23:23 licence.xml
-rw-r--r-- 1 andyr andyr 596303890 Jul 6 23:25 word.index
-rw-r--r-- 1 andyr andyr 108108941 Jul 6 23:25 word.map
I see a zeno file - a precursor to the zim format ?
The ZimReader static binary from http://openzim.org/Releases dated
2009-06-07 (same date as the files above ?) will not read the file.
Am I doing this wrong ?
Secondly, I downloaded wikipedia_en_wp1_0.7_30000+_05_2009_beta2.zim
and pointed the ZimReader at this - hoping to browse to server:8080
and read the file. But I have no index, and no search - do I need to get
the index files separately ?
And the css files (?) are in German - do I need to compile my own
reader from svn to get an English reader ?
Can I build a deb (for Ubuntu ?)
Cheers, Andy!
On Thu 19/11/09 21:39, "Andy Rabagliati" andyr(a)wizzy.com wrote:
> On Thu, 19 Nov 2009, Tommi Mäkitalo wrote:
>
> > Hi,
> >
> > you try to build svn head cxxtools with the debian configuration of
> > cxxtools 1.4.8. Looks like it does not work. It really looks like the
> > call to aclocal is wrong. aclocal needs "-I m4" to work. This was not
> > the case in cxxtools-1.4.8.
>
> Thank you. I patched debian/rules.
>
> I will let you know if I have a deb, and of any other problems.
I think this is a very important point:
Having a good solution is at least as important as making it available and easy to use.
I'm currently working on the Kiwix deb package and I have had to integrate the zimlib code in my source because there is no package.
I hope we will be able to find a deb maintainer able to set up and keep
up-to-date deb packages of the zimlib, zimtools and tntzimreader.
In the worst case, I will do it, but I'm a newbie at deb packaging and the
whole thing may take a few months.
Regards
Emmanuel
On Thu 19/11/09 14:54, "Andy Rabagliati" andyr(a)wizzy.com wrote:
> I would like to check out how well zimreader will serve a zim file via
> http. I am compiling zimreader and writer from svn.
>
> Ubuntu karmic has libcxxtools-dev version 1.4.8-3 and
> libtntdb-dev version 1.0.1-3 and
> libtntnet-dev version 1.6.3-4
>
> If I use those, zimwriter throws up errors
It works with the svn version of cxxtools.
> Another question - is there a *small* zim file anywhere I can use ??
http://tmp.kiwix.org/tmp/kiwix_wp_pt_stub.zim
Hope it is small enough.
Emmanuel
I would like to check out how well zimreader will serve a zim file via http.
I am compiling zimreader and writer from svn.
Ubuntu karmic has libcxxtools-dev version 1.4.8-3 and
libtntdb-dev version 1.0.1-3 and
libtntnet-dev version 1.6.3-4
If I use those, zimwriter throws up errors
filesource.cpp: In member function virtual bool zim::writer::FileArticle::isRedirect() const:
filesource.cpp:55: error: CXXTOOLS_ERROR_MSG was not declared in this scope
Zimreader compiles OK
Another question - is there a *small* zim file anywhere I can use ??
Cheers, Andy!
Hi all,
it is Wednesday already and the Developers Meeting is just two days away.
Hotel rooms are booked at the same place as last time. As there are so
many participants, we didn't have a chance to realise the idea we had
last time of renting an apartment with space to sleep, accommodation and
internet access. There were no apartments that were big enough and
provided everything we need.
So we will be at the same hotel (Hotel Löwen, Gündenhausen 16, 79650
Schopfheim, +49 7622 688499-0) and at the same venue (Tranter HES,
Hohe-Flum-Strasse 31, 79650 Schopfheim, 300m from Hotel Löwen).
Many of you have already given me your itineraries, so I can arrange
pick-up from the airport or advise how to get to the hotel.
If you need help please contact me now.
Anyway it would be good if you could tell me when you arrive and when
you will depart - or just add it yourself in the wiki:
http://openzim.org/Developer_Meetings/2009-2#Participants
We meet on Friday at Tranter HES; those I pick up somewhere I will bring
there. After 20:00 we will go over to the hotel, so if you arrive later
than 20:00, go directly to the hotel.
The hotel rooms are booked in your name but already paid for by me; these
costs will be covered by the openZIM budget. Breakfast at the hotel is
included.
We will have dinner at Hotel Löwen on Friday and there will be a buffet
available all day at the venue; both are covered by openZIM.
To help with our tight budget, Tomasz managed to get another dinner
covered by the Wikimedia Foundation (except alcoholic beverages); we
will have that on Saturday night.
Emergency phone numbers:
Manuel: +49 160 90803992
Annette: +49 170 7740589
Have a good trip!
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch