Offline-l January 2010

offline-l@lists.wikimedia.org

5 participants
12 discussions

by bugzilla-daemon＠openzim.org

http://bugs.openzim.org/show_bug.cgi?id=14
Summary: zim-check
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: zimwriter
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0

We need a way to check the quality of a ZIM file. At least the following should be checked:
* (WARNING) has a welcome page
* (ERROR) broken local HTML links
* (WARNING) redundant content
* ...
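zim-check is only a wish list at this point. As a rough illustration of the "(ERROR) broken local HTML links" check, here is a minimal C++ sketch. It assumes the set of article URLs is already available as a std::set<std::string> (a real tool would read it through the zimlib), only looks at double-quoted href attributes with a regular expression rather than a real HTML parser, and treats anything containing "://" as external. The name findBrokenLocalLinks is hypothetical.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper for a zim-check style "broken local HTML links" test.
// `knownUrls` stands in for the article URLs stored in the ZIM file.
std::vector<std::string> findBrokenLocalLinks(const std::string& html,
                                              const std::set<std::string>& knownUrls)
{
    std::vector<std::string> broken;
    // Naive pattern: only double-quoted href attributes are considered.
    static const std::regex hrefPattern("href=\"([^\"]*)\"");
    for (auto it = std::sregex_iterator(html.begin(), html.end(), hrefPattern);
         it != std::sregex_iterator(); ++it)
    {
        std::string target = (*it)[1].str();
        // Skip empty links, in-page anchors and external links.
        if (target.empty() || target[0] == '#' ||
            target.find("://") != std::string::npos)
            continue;
        // Drop a trailing fragment such as "Page#Section".
        target = target.substr(0, target.find('#'));
        if (knownUrls.count(target) == 0)
            broken.push_back(target);
    }
    return broken;
}
```

Each returned URL would be reported as an ERROR; the WARNING checks (welcome page, redundant content) would need their own passes.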


[openZIM dev-l] [Bug 15] New: zim::File getUuid() should return an HEX encoded MD5 hash

by bugzilla-daemon＠openzim.org

http://bugs.openzim.org/show_bug.cgi?id=15
Summary: zim::File getUuid() should return an HEX encoded MD5 hash
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0

It currently returns the binary value, which is not easy to handle (you cannot use standard string manipulation functions and cannot easily save it in a file). It would be better to return the same value in HEX format as a NULL-terminated string. This would (I'm sure) save every client using the zimlib from recoding this conversion.
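A minimal C++ sketch of the requested conversion, assuming the UUID is available as 16 raw bytes. The pointer-and-length interface and the name toHex are hypothetical, not zimlib API; calling c_str() on the result gives the NULL-terminated string the report asks for.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns raw binary bytes (16 for an MD5-sized UUID)
// into a lowercase hexadecimal string, two characters per byte.
std::string toHex(const unsigned char* data, std::size_t size)
{
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(size * 2);
    for (std::size_t i = 0; i < size; ++i)
    {
        hex += digits[data[i] >> 4];    // high nibble
        hex += digits[data[i] & 0x0f];  // low nibble
    }
    return hex;
}
```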


[openZIM dev-l] [Bug 11] New: Dynamic mime-types

by bugzilla-daemon＠openzim.org

http://bugs.openzim.org/show_bug.cgi?id=11
Summary: Dynamic mime-types
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0

Currently, the ZIM format only supports a limited, predefined set of document mime-types. The list of supported mime-types is here: http://www.openzim.org/ZIM_File_Format#Mime_types

Mime-types are represented by a number in a ZIM file, and the mapping is done statically by the zimlib. This means that you cannot store a document with a custom mime-type. This is a problem for me because some Kiwix users deal with other mime-types, for example archives or binaries.

I think making this mime-type table dynamic is a necessary improvement. I see two solutions:
* The ZIM file creator specifies it manually during the creation process.
* It is done automatically (also during the creation process).
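One way the second, automatic option could work, sketched in C++: the writer gives each distinct mime-type string the next free number the first time it sees it, and the ordered list of strings is stored in the file so a reader can map numbers back. The class MimeTypeTable, the 16-bit number type and the first-seen numbering are assumptions for illustration, not the actual zimlib design.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-file mime-type table built during ZIM creation.
class MimeTypeTable
{
public:
    // Returns the number for `mimeType`, appending it to the table if unseen.
    std::uint16_t getOrAdd(const std::string& mimeType)
    {
        auto found = index_.find(mimeType);
        if (found != index_.end())
            return found->second;
        auto id = static_cast<std::uint16_t>(names_.size());
        names_.push_back(mimeType);
        index_.emplace(mimeType, id);
        return id;
    }

    // Reverse lookup, as a reader would do with the stored list.
    const std::string& name(std::uint16_t id) const { return names_.at(id); }

    // Ordered list that would be written into the file.
    const std::vector<std::string>& names() const { return names_; }

private:
    std::vector<std::string> names_;
    std::map<std::string, std::uint16_t> index_;
};
```

The manual option would simply pre-fill the same table from the creator's configuration before any article is added.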


[openZIM dev-l] Changes in zim tools

by Tommi Mäkitalo

Hi,

I implemented some of the changes Emmanuel suggested. The tools are now all lowercase (zimdump, zimsearch, zimreader, zimwriter and zimbench).

The mime type is now in a separate table for zimwriter. The mimetype table has a column indicating whether the mimetype should be compressed.

I also renamed the table "zimarticles" to "zimarticle". Table names should be singular, like "article". The inconsistent naming had annoyed me for quite some time.

There is also a new source for article data in zimwriter: I combined the full-text search with zimwriter. For example:

  zimwriter -S film -I wikipedia-de.zim -X wikipedia-de-x.zim film

creates a ZIM file "film.zim" with all articles from wikipedia-de.zim containing the word "film". This is especially useful for me to create smaller subsets of the big wikipedia-de.zim.

Tommi


Re: [openZIM dev-l] [ZIMREADER] remarks and questions

by emmanuel＠engelhart.org

On Thu 14/01/10 21:18, "Tommi Mäkitalo" tommi(a)tntnet.org wrote:
> I have to think about the skin. The zimreader needs a user interface and
> this is realized as html files. This is what is called "skin". I currently
> don't know exactly how to change that. I need some dynamic html in the
> application.

OK, I don't know your opinion on the whole topic in detail... but my feeling is that being able to choose a skin on the command line would be an improvement.

> The content-type is perfectly ok in zimreader. Your zim file is wrong. And
> I also know why. The mimetype column in article is now a text instead of an
> integer. You have to put the actual mime type into that column. Remember
> that we do not use a fixed enum for mime types any more but a dynamically
> built list inside the zim file.
>
> This is also the explanation why your images are slower. They are indeed
> clustered and compressed, since the mime type does not match the hard-coded
> mime type list found in zimwriter/src/zimcreator.cpp (image/jpeg, image/png,
> image/tiff, image/gif, application/zip). It is normally not a good idea to
> have it hard coded, but I currently have no better idea.

Thanks, it works now and is faster.

About the list, I think a "clean" solution would be an additional table "mimetypes" with two fields, "name (varchar)" and "compressed (bool)". A default strategy (all mimetypes compressed, none compressed, or a default configuration like the current one) would guarantee that zimwriter can work without entries in this table.

Regards
Emmanuel
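The proposed lookup can be sketched in C++ as follows: an explicit entry in the hypothetical "mimetypes" table decides, and otherwise the fallback is the hard-coded list quoted above, whose types (already-compressed formats) are stored uncompressed. The std::map stands in for the database table, and shouldCompress is an illustrative name, not zimwriter code.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical decision function: the "compressed" column of a mimetypes
// table wins; without an entry, fall back to the current default, which
// leaves the listed (already compressed) formats uncompressed.
bool shouldCompress(const std::string& mimeType,
                    const std::map<std::string, bool>& mimetypesTable)
{
    auto entry = mimetypesTable.find(mimeType);
    if (entry != mimetypesTable.end())
        return entry->second;

    static const std::set<std::string> storedUncompressed = {
        "image/jpeg", "image/png", "image/tiff", "image/gif", "application/zip"};
    return storedUncompressed.count(mimeType) == 0;
}
```

With an empty table this reproduces the current hard-coded behaviour, which is the guarantee the proposal asks for.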


[openZIM dev-l] [ZIMREADER] remarks and questions

by emmanuel＠engelhart.org

Hi,

About the name first. It's called "ZimReader", which is not consistent with the other ZIM binaries "zimDump" and "zimWriter"... so IMO it should be "zimReader". But in fact, uppercase characters in a binary name seem pretty unusual, so wouldn't "zimwriter", "zimdump" and "zimreader" be better?

About the default skin: is it possible that there is currently no skin by default anymore? Before, I had to tweak the skin to remove the German Wikipedia UI, and this no longer seems necessary! If that's true, great!

My feeling is that zimreader, with the new format, is no longer able to determine the HTTP content-type correctly. The HTTP header for an HTML page looks like this:

(...)
Keep-Alive: timeout=15000, max=999
Connection: Keep-Alive
Last-Modified: Tue, 12 Jan 2010 21:01:19 GMT
Content-Type: 0
Set-Cookie: tntnet.=8649805af3b641b7d9f7dbf3193b288d; Path=/;Version=1
(...)

The Content-Type should be something else... shouldn't it?

Regards
Emmanuel


[openZIM dev-l] Compile error on zimreader

by Andy Rabagliati

Folks,

I hope it is OK if I post here.

I have zimlib, zimreader and zimwriter from svn, revision 295. I am compiling on Ubuntu Karmic. I had to install quite a few dev libraries; these are the tnt ones:

aptitude show libtntnet8 libtntdb-dev tntnet-runtime libtntdb1 | egrep '^(Package|State|Version)'
Package: libtntnet8
State: installed
Version: 1.6.3-5
Package: libtntdb-dev
State: installed
Version: 1.0.1-4
Package: tntnet-runtime
State: installed
Version: 1.6.3-5
Package: libtntdb1
State: installed
Version: 1.0.1-4

I start with "make clean ; ./autogen.sh; ./configure; make". zimlib compiles fine. zimreader has been coming up with this error the last few times I tried to compile:

-------------------------------
make[2]: Entering directory `/srv/svn/zim/zimreader/src'
depbase=`echo zimcomp.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
g++ -DHAVE_CONFIG_H -I. -I../include -g -O2 -MT zimcomp.o -MD -MP -MF $depbase.Tpo -c -o zimcomp.o zimcomp.cpp &&\
mv -f $depbase.Tpo $depbase.Po
zimcomp.ecpp: In member function ‘virtual unsigned int<unnamed>::_component_zimcomp::operator()(tnt::HttpRequest&, tnt::HttpReply&, tnt::QueryParams&)’:
zimcomp.ecpp:17: error: no match for ‘operator!=’ in ‘zim::Article::getUrl() const() != pathInfo’
/usr/local/include/zim/qunicode.h:87: note: candidates are: bool zim::QUnicodeString::operator!=(const zim::QUnicodeString&) const
zimcomp.ecpp:20: error: no matching function for call to ‘zim::File::getArticle(char&, std::string&)’
/usr/local/include/zim/file.h:51: note: candidates are: zim::Article zim::File::getArticle(zim::size_type) const
/usr/local/include/zim/file.h:52: note: zim::Article zim::File::getArticle(char, const zim::QUnicodeString&, bool)
zimcomp.ecpp:49: error: no matching function for call to ‘tnt::HttpReply::redirect(zim::QUnicodeString)’
/usr/include/tnt/httpreply.h:77: note: candidates are: unsigned int tnt::HttpReply::redirect(const std::string&)
zimcomp.ecpp:58: error: no match for ‘operator=’ in ‘title = zim::Article::getTitle() const()’
/usr/include/c++/4.4/bits/basic_string.h:505: note: candidates are: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::operator=(const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.h:513: note: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::operator=(const _CharT*) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.h:524: note: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::operator=(_CharT) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
make[2]: *** [zimcomp.o] Error 1
-------------------------------

What am I doing wrong?

Cheers,
Andy!


[openZIM dev-l] [QUESTION] New index by title

by emmanuel＠engelhart.org

Hi,

during the dev meeting we decided to have a title-sorted index (at least as an option) in addition to the URL-sorted index. I can't see it in the roadmap. Is something wrong with it?

I have started to play with the trunk again. So far it works perfectly, and I was also able to generate new ZIM files. I will introduce the new format in the Kiwix trunk and start looking at Windows portability.

Emmanuel


Re: [openZIM dev-l] [QUESTION] New index by title

by emmanuel＠engelhart.org

OK, thank you Manuel. This was a misunderstanding on my part. I managed to get a different "title" and "url" per article in my test ZIM file, so on my side zimwriter now handles the DB "url" and "title" fields correctly. :)

But I was not able to check the title index:
* Is there a way, for example with zimDump, to get all article titles starting with "open"?
* zimreader has a suggest feature (it proposes article URLs starting with the first characters you have typed). Where is the zimreader code that does that? Is it easy to adapt it to do the same with titles?

Emmanuel

On Wed 06/01/10 11:56, "Manuel Schneider" manuel.schneider(a)wikimedia.ch wrote:
> Hi,
>
> see the first item on the Roadmap:
> http://openzim.org/Roadmap
> add Pointer to UrlPointerList (IndexPointerList will be named
> "TitlePointerList")
>
> and
>
> add UrlPointerList (article list ordered by URL)
>
> both already done.
>
> Regards,
>
> Manuel
>
> --
> Regards
> Manuel Schneider
>
> Wikimedia CH - Verein zur Förderung Freien Wissens
> Wikimedia CH - Association for the advancement of free knowledge
> www.wikimedia.ch
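To illustrate the two orderings Manuel describes, here is a simplified C++ sketch that derives a title-ordered list from entries already ordered by URL. It assumes the TitlePointerList can be modelled as indices into the URL-ordered list; Entry and buildTitleIndex are hypothetical names, and a real ZIM file stores pointers and more fields per entry.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Simplified stand-in for a directory entry.
struct Entry
{
    char ns;            // namespace, e.g. 'A' for articles
    std::string url;
    std::string title;
};

// Given entries in (namespace, url) order, i.e. the UrlPointerList order,
// return indices into that list sorted by (namespace, title).
std::vector<std::size_t> buildTitleIndex(const std::vector<Entry>& byUrl)
{
    std::vector<std::size_t> idx(byUrl.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        if (byUrl[a].ns != byUrl[b].ns)
            return byUrl[a].ns < byUrl[b].ns;
        return byUrl[a].title < byUrl[b].title;
    });
    return idx;
}
```

Storing indices rather than copies keeps one set of entries while allowing a binary search in either order.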


Re: [openZIM dev-l] [QUESTION] New index by title

by emmanuel＠engelhart.org

On Wed 06/01/10 14:27, "Tommi Mäkitalo" tommi(a)tntnet.org wrote:
> There is a flag -t in zimDump, which switches to search by title. I just
> realized that this is missing on the help page. I will add that.
>
> To find an article by title use:
> zimDump -t -i -f "MyTitle" myzimfile.zim

:) I wanted to try the "-t" parameter, but I don't understand why the second command does not return the article with idx=4:

$ zimDump -i -f 8P articles.zim
url: 9
title: The Beatles
idx: 4
namespace: A
redirect: 0
mime-type: 0
article size: 177163

$ zimDump -t -i -f "The Beatles" articles.zim
url: 10
title: 10
idx: 9
namespace: I
redirect: 0
mime-type: 3
article size: 24152

The ZIM file can be found here: http://tmp.kiwix.org/tmp/articles.zim

> To use it in your application use iterators:
>
> zim::File::const_iterator it = zimFile.find('A', "MyUrl");
> or
> zim::File::const_iterator it = zimFile.findByTitle('A', "MyTitle");
>
> This will return an iterator to the first article whose url (or title) is
> equal to or greater than the passed string. You have to compare the iterator
> against zimFile.end() to check whether it really points to a valid article.
> If you look for a range of articles starting with a specific url (or title),
> you have to do that manually, e.g.:
> zim::File::const_iterator itFirst = zimFile.find('A', "MyUrl");
> zim::File::const_iterator itLast = zimFile.find('A', "MyUrm");
>
> or by comparing yourself:
> for (zim::File::const_iterator it = zimFile.find('A', "MyUrl");
>      it != zimFile.end() && it->getUrl().compare(0, 5, "MyUrl") == 0;
>      ++it)
> {
> }

Perfect, will test it ASAP.

Emmanuel
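The "first entry equal or greater" semantics of find()/findByTitle() match std::lower_bound on a sorted sequence, so the prefix scan Tommi describes can be tried out without a ZIM file. This sketch uses a plain sorted std::vector<std::string> as a stand-in for one namespace of the URL or title index; withPrefix is a hypothetical helper, and its comparison is byte-wise and therefore case-sensitive.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns every entry of `sorted` that starts with `prefix`, scanning from
// the first entry that is equal to or greater than the prefix.
std::vector<std::string> withPrefix(const std::vector<std::string>& sorted,
                                    const std::string& prefix)
{
    std::vector<std::string> result;
    for (auto it = std::lower_bound(sorted.begin(), sorted.end(), prefix);
         it != sorted.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it)
    {
        result.push_back(*it);
    }
    return result;
}
```

Applied to a sorted title list, this is essentially what a title-based "suggest" feature needs: the entries returned for the characters typed so far.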

