http://bugs.openzim.org/show_bug.cgi?id=14
Summary: zim-check
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: zimwriter
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
We need to have a way to check the quality of a ZIM file.
At least the following should be checked:
* (WARNING) has a welcome page
* (ERROR) broken local HTML links
* (WARNING) redundant content
* ...
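The broken-local-link check could be sketched roughly as below (a minimal sketch only: the `ArticleIndex` set and the naive `href` scan are illustrative stand-ins, not the zimlib API; a real zim-check would query the ZIM directory and use a proper HTML parser):

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical article index; a real zim-check would query the ZIM directory.
using ArticleIndex = std::set<std::string>;

// Collect href targets from an HTML snippet with a naive scan
// (a real checker would use a proper HTML parser).
std::vector<std::string> extractLocalLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string marker = "href=\"";
    for (std::size_t pos = html.find(marker); pos != std::string::npos;
         pos = html.find(marker, pos + 1)) {
        std::size_t start = pos + marker.size();
        std::size_t end = html.find('"', start);
        if (end == std::string::npos) break;
        std::string target = html.substr(start, end - start);
        // Only check local links; skip external URLs.
        if (target.rfind("http", 0) != 0)
            links.push_back(target);
        pos = end;
    }
    return links;
}

// Return the subset of local links that point to no known article.
std::vector<std::string> brokenLinks(const std::string& html,
                                     const ArticleIndex& index) {
    std::vector<std::string> broken;
    for (const auto& link : extractLocalLinks(html))
        if (index.find(link) == index.end())
            broken.push_back(link);
    return broken;
}
```

Running this per article and reporting any non-empty result would cover the ERROR case above; the welcome-page and redundant-content checks would be separate passes.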
--
Configure bugmail: http://bugs.openzim.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugs.openzim.org/show_bug.cgi?id=11
Summary: Dynamic mime-types
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
Currently, the ZIM format can only support a limited and predefined set of
document mime-types.
You can get the list of supported mime-types here:
http://www.openzim.org/ZIM_File_Format#Mime_types
Mime-types are represented by a number in a ZIM file, and the mapping is done
statically by the zimlib.
This means that you cannot store documents with custom mime-types.
This is a problem for me because I have people who use Kiwix and deal with
other mime-types, for example archives or binaries.
I think making this mime-type table dynamic is a necessary improvement.
I see two solutions:
* The ZIM file creator specifies it manually during the creation process.
* It happens automatically (also during the creation process).
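Either way, the writer-side mechanism could look roughly like this (a sketch under assumptions: `MimeTable` and its methods are illustrative names, not the zimlib API). New mime-type strings get the next free number at creation time, and the resulting string table would be stored in the file so readers can map numbers back:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative dynamic mime-type table: numbers are assigned on first use
// during file creation, instead of being fixed statically in the library.
class MimeTable {
public:
    // Return the number for a mime-type, registering it if it is new.
    uint16_t intern(const std::string& mimeType) {
        auto it = byName_.find(mimeType);
        if (it != byName_.end())
            return it->second;
        uint16_t id = static_cast<uint16_t>(byId_.size());
        byName_[mimeType] = id;
        byId_.push_back(mimeType);
        return id;
    }

    // Map a stored number back to its mime-type string (reader side).
    const std::string& lookup(uint16_t id) const { return byId_.at(id); }

private:
    std::map<std::string, uint16_t> byName_;  // string -> number
    std::vector<std::string> byId_;           // number -> string
};
```

The "automatic" solution would simply call `intern` for every document's mime-type as it is added; the "manual" solution would pre-populate the table from a user-supplied list.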
Jimbo - thanks for the spur to clean up the existing work.
All - Let's start by cleaning up the mailing lists and setting a few
short-term goals :-) It's a good sign that we have both charity and love
converging to make something happen.
* For all-platform all-purpose wikireaders, let's use
offline-l(a)lists.wikimedia, as we discussed a month ago in the aftermath of
Wikimania (Erik, were you going to set this up? I think we agreed to
deprecate wiki-offline-reader-l and replace it with offline-l.)
* For wikireaders such as WikiBrowse and Infoslicer on the XO, please
continue to use wikireader(a)lists.laptop
I would like to see WikiBrowse become the 'sugarized' version of a reader
that combines the best of that and the openZim work. A standalone DVD or
USB drive that comes with its own search tools would be another version of
the same. As far as merging codebases goes, I don't think the WikiBrowse
developers are invested in the name.
I think we have a good first cut at selecting articles, weeding out stubs,
and including thumbnail images. Maybe someone working on openZim can
suggest how to merge the search processes, and that file format seems
unambiguously better.
Kul - perhaps part of the work you've been helping along for standalone
usb-key snapshots would be useful here.
Please continue to update this page with your thoughts and progress!
http://meta.wikimedia.org/wiki/Offline_readers
SJ
2009/10/23 Iris Fernández <irisfernandez(a)gmail.com>
> On Fri, Oct 23, 2009 at 1:37 PM, Jimmy Wales <jwales(a)wikia-inc.com> wrote:
> >
> > My dream is quite simple: a DVD that can be shipped to millions of people
> with an all-free-software solution for reading Wikipedia in Spanish. It
> should have a decent search solution, doesn't have to be perfect, but it
> should be full-text. It should be reasonably fast, but super-perfect is not
> a consideration.
> >
>
> Hello! I am an educator, not a programmer. I can help selecting
> articles or developing categories related to school issues.
>
Iris - you know the main page of WikiBrowse that you see when the reader
first loads? You could help with a new version of that page. Madeleine
(copied here) worked on the first one, but your thoughts on improving it
would be welcome.
Hi,
I just created a branch for the changes in the ZIM format that we discussed
at our developer meeting. The branch is located in our repository under
branches/zim2.
I have already checked in the first changes. qunicode is removed from
zimlib. Work on distinguishing between URL and title has also started.
I also decided to change the zint compression. The compression is now
similar to UTF-8 and saves a little space compared to the old algorithm
(about 2%). The zint compression is currently used in the full text index.
We also discussed using it in the directory entries to save some bytes.
I'm not sure if it is worth the trouble, since it makes alternative
implementations more difficult. If alternative implementations do not care
about the full text index or categories, they don't need to implement the
zint compression. On the other hand, the code for it is straightforward
and should be easy to port.
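To illustrate the idea of such a variable-length integer code (this is an LEB128-style stand-in only; the actual zint layout is defined by the zim2 branch and may differ in detail): small values take one byte, and the high bit of each byte marks whether more bytes follow, much as UTF-8 marks continuation bytes.

```cpp
#include <cstdint>
#include <vector>

// LEB128-style variable-length integer encoding: 7 payload bits per byte,
// high bit set means "more bytes follow". Small numbers stay small on disk.
std::vector<uint8_t> encodeVarint(uint64_t value) {
    std::vector<uint8_t> out;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80;  // continuation bit
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Decode a value produced by encodeVarint.
uint64_t decodeVarint(const std::vector<uint8_t>& in) {
    uint64_t value = 0;
    unsigned shift = 0;
    for (uint8_t byte : in) {
        value |= static_cast<uint64_t>(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80))
            break;  // last byte of this integer
    }
    return value;
}
```

This also shows why porting such a scheme is little trouble: both directions fit in a few lines of straightforward bit manipulation.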
The branch is not working yet. It may take some time until it does, since I
have to implement the changes in the reader as well as the writer before I
can even start testing (other than verifying that it compiles). It was
easier with zeno when I started, since I already had working zeno files.
Tommi
On Fri 20/11/09 11:11, "Andy Rabagliati" andyr(a)wizzy.com wrote:
> On Mon, 16 Nov 2009, Emmanuel Engelhart wrote:
>
> > > These indexes http://ai.cs.utsa.edu/wikipedia0.7/ seem to have
> > > been built using categories.
> >
> > This dump is one I have built (maybe extracted from the ZIM)... but a
> > little bit modified. This is a pretty interesting URL; it would be
> > great to know exactly how the developers behind it did it... maybe
> > you would be able to do the same.
>
> This is his explanation :-
>
>
> This collection of articles is called "beta2" because they were
> extracted from wikipedia_en_wp1_0.7_30000+_05_2009_beta2.zim at
> tmp.kiwix.org. My distribution has different file names for all the
> articles based on their titles and has a title search capability that
> only depends on Javascript. Below is some explanation of the steps I
> performed, not necessarily done in this order.
>
> 1. I extracted the articles from the zim file by downloading,
> compiling and running zimDump from openzim.org. Compiling zimDump is
> nontrivial because it involves downloading and compiling other
> packages in versions that will work together.
>
> 2. I created three lists with Perl scripts and manual cleaning
> afterwards.
>
> A. The first list was a list of all articles: zim file name, UTF8
> title, and ASCII title.
>
> B. The second list was a list of all zim files (articles, images and
> other files, Javascript and CSS): zim file name and target file name
> in my distribution.
>
> C. The third list was a list for redirecting one zim file name to
> another. The zim dump creates a lot of empty files in the A
> subdirectory (A contains all the articles). It turns out that each
> of them needs to be redirected to another article. The redirects
> can be determined by downloading and running the zimReader program
> for Linux, which can be found at openzim.org.
>
> There appear to be a few duplicate articles (none were deleted), which
> I list below (in ASCII) for anyone who is interested:
>
> Abu Rayhan Biruni
> 'Alawi
> Battle of Mohacs
> Beer-Lambert law
> Charismatic movement
> Elian Gonzalez affair
> Ismail Enver
> Ismet Inonu
> Istiklal Marsi
> Izmir Province
> Wikipedia:0.7/0.7geo/Leopold
> Macapa or Macapai
> Maceia or Maceio
> Nicole Vaidisova
> PRIDE Fighting Championships
> War in Afghanistan (2001-present)
>
> 3. I used a Perl script to copy all the files from the zim dump to a
> staging area, modifying the links along the way. There are many, many
> dead image links (26314 in my count); I changed those links to empty
> strings. There are also some dead article links, most of them
> correspond to dead image links, but a few of them should have been
> redirected; they got added to my third list above. Here are all the
> dead article links and any appropriate redirect for anyone who is
> interested.
>
> A/5ISM ignore
> A/35A A/D6N
> A/53Z A/CWO
> A/5J03 ignore
> A/5J55 ignore
> A/9XO A/HQO
> A/APD A/A35
> A/D07 A/9PW
> A/F5G A/163K
> A/PRL ignore
> A/TKV A/S4X
> A/TR4 ignore
> A/VBB ignore
> A/ZM2 ignore
> A/2QE6 ignore
> A/T3B A/NQR
> A/11B0 ignore
> A/1NN2 ignore
> A/5ISU A/4O
> A/5JAZ ignore
> A/5IV3 ignore
> A/5IXU ignore
> A/102Z ignore
> A/1QTM ignore
> A/5IOB ignore
> A/5J51 ignore
> A/5IP6 ignore
> A/5JBO ignore
> A/Y91 ignore
>
> 4. In addition to changing links, I made a few other changes. Each
> article now has a search box for title search. I took some existing
> GPLed Javascript (JSE search engine) and made extensive modifications
> for this application. It only searches the titles; there is no
> keyword index, and there is no text search. The motto of the code is
> "Linear Search FTW". It is surprisingly snappy, though in hindsight,
> searching 30000 titles is not a lot for a computer to do. The results
> page is functional, but otherwise not too exciting.
>
> I changed the titles of the index pages to something less geeky, e.g.
> "Topical Index: Wikipedia" for the topic index page on Wikipedia. I
> also fixed a number of incorrect links from the topical index page to
> the alphabetical index page.
>
> Enjoy,
>
> Tom Bylander
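The link rewriting in step 3 of Tom's email, driven by the redirect list above, could be sketched like this (names are illustrative only; his actual tooling was a Perl script):

```cpp
#include <map>
#include <string>

// Rewrite one article link using a redirect list like the one Tom posted:
// follow the redirect target, blank out links marked "ignore" (dead), and
// leave every other link unchanged.
std::string rewriteLink(const std::string& link,
                        const std::map<std::string, std::string>& redirects) {
    auto it = redirects.find(link);
    if (it == redirects.end())
        return link;           // normal link, keep as-is
    if (it->second == "ignore")
        return std::string();  // dead link becomes an empty string
    return it->second;         // redirected link
}
```

Applied to every `href` while copying files to the staging area, this reproduces the behaviour he describes: redirects followed, dead links emptied, everything else untouched.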
Thank you very much Andy for having forwarded this email, this is really interesting.
That confirms a few things:
* wikipedia_en_wp1_0.7_30000+_05_2009_beta2 is only a beta; it should be improved and I will do that soon.
* I need to code a Perl script to check many things in an HTML directory (https://sourceforge.net/tracker/?func=detail&aid=2901059&group_id=175508&at…) before building the ZIM.
* I think we need such a tool (zim-check?) coded in C++ to be able to do the same with ZIM files (I see that as pretty necessary if we want to set up a ZIM library: nobody wants us to spread bad-quality ZIM files). http://bugs.openzim.org/show_bug.cgi?id=14
As soon as I have finished with that, I will publish a new version of the ZIM file and contact Tom.
Regards
Emmanuel
Folks,
Coming here by way of the Wikipedia:Version 1.0 Editorial Team pages,
I have a few questions.
Disclaimer :- I am in South Africa, where bandwidth is expensive, so
a lot of my 'fiddling' is done on my server in the USA, where I do not
have b/w limitations.
My interest is to be able to serve wikipedia content off a server in
a classroom - usually a Thin Client setup, but not always, and they
rarely have internet access.
Weak, small clients, networked to a strong server, running linux.
I downloaded a torrent - en.wikipedia.okawix - which appeared to be
a tarball with these contents :-
-rw-r--r-- 1 andyr andyr 571428918 Jul 6 23:24 article.index
-rw-r--r-- 1 andyr andyr 108812163 Jul 6 23:24 article.map
-rw-r--r-- 1 andyr andyr 5601891132 Jul 6 23:23 en.wikipedia.zeno
-rw-r--r-- 1 andyr andyr 11 Jun 23 18:58 entry.url
-rw-r--r-- 1 andyr andyr 307 Jul 6 23:23 licence.xml
-rw-r--r-- 1 andyr andyr 596303890 Jul 6 23:25 word.index
-rw-r--r-- 1 andyr andyr 108108941 Jul 6 23:25 word.map
I see a zeno file - a precursor to the zim format ?
The ZimReader static binary from http://openzim.org/Releases dated
2009-06-07 (same date as the files above ?) will not read the file.
Am I doing this wrong ?
Secondly, I downloaded wikipedia_en_wp1_0.7_30000+_05_2009_beta2.zim
and pointed the ZimReader at this - hoping to browse to server:8080
and read the file. But I have no index, and no search - do I need to get
the index files separately ?
And the css files (?) are in German - do I need to compile my own
reader from svn to get an English reader ?
Can I build a deb (for Ubuntu ?)
Cheers, Andy!
On Thu 19/11/09 21:39, "Andy Rabagliati" andyr(a)wizzy.com wrote:
> On Thu, 19 Nov 2009, Tommi Mäkitalo wrote:
>
> > Hi,
> >
> > you try to build svn head cxxtools with the debian configuration of
> > cxxtools 1.4.8. Looks like it does not work. It really looks like the
> > call to aclocal is wrong. aclocal needs "-I m4" to work. This was not
> > the case in cxxtools-1.4.8.
>
> Thank you. I patched debian/rules.
>
> I will let you know if I have a deb, and of any other problems.
I think this is a very important point:
Having a good solution is at least as important as making it available and easy to use.
I'm currently working on the Kiwix deb package and I have had to integrate the zimlib code in my source because there is no package.
I hope we will be able to find a deb maintainer able to set up and keep
up-to-date deb packages of the zimlib, zimtools and tntzimreader.
In the worst case, I will do it, but I'm a newbie at deb packaging and the
whole thing may take a few months.
Regards
Emmanuel
On Thu 19/11/09 14:54, "Andy Rabagliati" andyr(a)wizzy.com wrote:
> I would like to check out how well zimreader will serve a zim file via
> http. I am compiling zimreader and writer from svn.
>
> Ubuntu karmic has libcxxtools-dev version 1.4.8-3 and
> libtntdb-dev version 1.0.1-3 and
> libtntnet-dev version 1.6.3-4
>
> If I use those, zimwriter throws up errors
It works with the svn version of cxxtools.
> Another question - is there a *small* zim file anywhere I can use ??
http://tmp.kiwix.org/tmp/kiwix_wp_pt_stub.zim
Hope it is small enough.
Emmanuel
I would like to check out how well zimreader will serve a zim file via http.
I am compiling zimreader and writer from svn.
Ubuntu karmic has libcxxtools-dev version 1.4.8-3 and
libtntdb-dev version 1.0.1-3 and
libtntnet-dev version 1.6.3-4
If I use those, zimwriter throws up errors
filesource.cpp: In member function virtual bool zim::writer::FileArticle::isRedirect() const:
filesource.cpp:55: error: CXXTOOLS_ERROR_MSG was not declared in this scope
Zimreader compiles OK
Another question - is there a *small* zim file anywhere I can use ??
Cheers, Andy!
Hi all,
it is Wednesday already and the Developers Meeting is just two days away.
Hotel rooms are booked at the same place as last time. As there are so
many participants, we didn't have a chance to realise the idea we had
last time of renting an apartment with space to sleep, accommodation and
internet access. There were no apartments that were big enough and
provided everything we need.
So we will be at the same hotel (Hotel Löwen, Gündenhausen 16, 79650
Schopfheim, +49 7622 688499-0) and at the same venue (Tranter HES,
Hohe-Flum-Strasse 31, 79650 Schopfheim, 300m from Hotel Löwen).
Many of you have already given me your itineraries, so I can arrange
pick-up from the airport or advise how to get to the hotel.
If you need help please contact me now.
Anyway it would be good if you could tell me when you arrive and when
you will depart - or just add it yourself in the wiki:
http://openzim.org/Developer_Meetings/2009-2#Participants
We meet on Friday at Tranter HES; those I pick up somewhere I will bring
there. After 20:00 we will go over to the hotel, so if you arrive later
than 20:00, go directly to the hotel.
The hotel rooms are booked in your name but already paid for by me; these
costs will be covered by the openZIM budget. Breakfast at the hotel is
included.
We will have dinner at Hotel Löwen on Friday and there will be a buffet
available all day at the venue; both are covered by openZIM.
To help with our tight budget, Tomasz managed to get another dinner
covered by the Wikimedia Foundation (except alcoholic beverages); we
will have that on Saturday night.
Emergency phone numbers:
Manuel: +49 160 90803992
Annette: +49 170 7740589
Have a good trip!
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch