http://bugs.openzim.org/show_bug.cgi?id=14
Summary: zim-check
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: zimwriter
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
We need a way to check the quality of a ZIM file.
At least the following should be checked:
* (WARNING) has a welcome page
* (ERROR) broken local HTML links
* (WARNING) redundant content
* ...
--
Configure bugmail: http://bugs.openzim.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugs.openzim.org/show_bug.cgi?id=15
Summary: zim::File getUuid() should return a hex-encoded MD5 hash
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
It currently returns the binary value, which is not easy to handle (you cannot
use standard string manipulation functions on it and cannot easily save it in a
file).
It would be better to return the same value in hex format as a null-terminated
string. This would (I'm sure) save each client using the zimlib from
re-implementing this conversion.
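To illustrate the conversion requested here, a minimal sketch follows. The function name toHex and its signature are assumptions made for this example and are not part of zimlib; it only shows how a client could turn the raw bytes into a printable string today.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of zimlib: converts raw bytes (for example
// a binary 16-byte MD5 value) into a lowercase hexadecimal string.
std::string toHex(const unsigned char* data, std::size_t size)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(size * 2);
    for (std::size_t i = 0; i < size; ++i)
    {
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0f]);  // low nibble
    }
    return out;
}
```

The result is an ordinary std::string, so c_str() gives the null-terminated form and the usual string functions and file output work on it.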
http://bugs.openzim.org/show_bug.cgi?id=11
Summary: Dynamic mime-types
Product: openZIM
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
Currently, the ZIM format can only support a limited and predefined set of
document mime-types.
You can get the list of supported mime-types here:
http://www.openzim.org/ZIM_File_Format#Mime_types
Mime-types are represented by a number in a ZIM file, and the mapping is done
statically by the zimlib.
This means that you cannot store documents with a custom mime-type.
This is a problem for me because I have people who use Kiwix and deal with
other mime-types, for example: archives or binaries.
I think making this mime-type table dynamic is a necessary improvement.
I see two solutions:
* The ZIM file creator specifies it manually during the creation process.
* It is built automatically (also during the creation process).
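The second (automatic) option could look roughly like the sketch below. The class MimeTypeTable and its methods are hypothetical names chosen for this illustration; they are not taken from zimlib or zimwriter, and the on-disk layout of such a table is left open.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not zimlib code: a mime-type table that is built
// while the file is written instead of being fixed at compile time.
class MimeTypeTable
{
    std::vector<std::string> names_;              // number -> mime-type
    std::map<std::string, std::size_t> indexes_;  // mime-type -> number

public:
    // Returns the number for a mime-type, registering it on first use.
    std::size_t indexOf(const std::string& mimeType)
    {
        std::map<std::string, std::size_t>::const_iterator it =
            indexes_.find(mimeType);
        if (it != indexes_.end())
            return it->second;
        names_.push_back(mimeType);
        indexes_[mimeType] = names_.size() - 1;
        return names_.size() - 1;
    }

    // Reverse lookup as a reader would need it; throws std::out_of_range
    // if the number is unknown.
    const std::string& nameOf(std::size_t index) const
    {
        return names_.at(index);
    }

    std::size_t size() const { return names_.size(); }
};
```

Articles would still store only a number, but the mapping would travel with the file, so custom types such as archives or binaries need no change in the library.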
Hi,
I implemented some of the changes Emmanuel suggested.
The tools are now all lowercase (zimdump, zimsearch, zimreader, zimwriter and
zimbench).
The mime type is now in a separate table for the zimwriter. The mimetype table
has a column that says whether the mimetype should be compressed.
I also renamed the table "zimarticles" to "zimarticle". Table names
should be singular, like "article". It has annoyed me for quite some time that
the naming was not consistent.
There is also a new source for article data in zimwriter. I combined the full
text search with the zimwriter. Using e.g.:
zimwriter -S film -I wikipedia-de.zim -X wikipedia-de-x.zim film
creates a zim file "film.zim" with all articles from wikipedia-de.zim containing
the word "film". This is especially useful for me to create smaller subsets
from the big wikipedia-de.zim.
Tommi
On Thu 14/01/10 21:18, "Tommi Mäkitalo" tommi(a)tntnet.org wrote:
> I have to think about the skin. The zimreader needs a user interface and
> this is realized as html files. This is what is called "skin". I currently
> don't know exactly how to change that. I need some dynamic html in the
> application.
OK, I do not know your opinion on the whole topic in detail, but my feeling
is that being able to choose a skin on the command line would be an
improvement.
> The content-type is perfectly ok in zimreader. Your zim file is wrong. And
> I also know why. The mimetype column in article is now a text instead of an
> integer. You have to put the actual mime type into that column. Remember
> that we do not use a fixed enum for mime types any more but a dynamically built
> list inside the zim file.
>
> This is also the explanation why your images are slower. They are indeed
> clustered and compressed, since the mime type does not match the hard-coded
> mime type list found in zimwriter/src/zimcreator.cpp (image/jpeg, image/png,
> image/tiff, image/gif, application/zip). It is normally not a good idea to
> have it hard-coded but I currently have no better idea.
Thx, it works now and is faster.
About the list, I think a "clean" solution would be to have an additional table "mimetypes"
with two fields, "name (varchar)" and "compressed (bool)". A default strategy
(all mimetypes compressed, none compressed, or a default configuration like the current one)
would guarantee that the zimwriter can work without entries in this table.
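As a rough in-memory sketch of that idea: the class CompressionPolicy below is a hypothetical name for this example, not existing zimwriter code. It holds the (name, compressed) pairs such a table would contain and falls back to a default when a mime type has no entry.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of the proposed "mimetypes" table held in memory.
// Each entry maps a mime-type name to its "compressed" flag.
class CompressionPolicy
{
    std::map<std::string, bool> entries_;
    bool defaultCompressed_;  // used when a mime type has no entry

public:
    explicit CompressionPolicy(bool defaultCompressed)
        : defaultCompressed_(defaultCompressed)
    { }

    // Adds or overrides one row of the table.
    void set(const std::string& mimeType, bool compressed)
    {
        entries_[mimeType] = compressed;
    }

    // Looks up the flag, falling back to the default strategy.
    bool shouldCompress(const std::string& mimeType) const
    {
        std::map<std::string, bool>::const_iterator it =
            entries_.find(mimeType);
        return it == entries_.end() ? defaultCompressed_ : it->second;
    }
};
```

With an empty table every lookup returns the default, which is the property that lets the writer run without any rows.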
Regards
Emmanuel
Hi,
About the name first. It's called "ZimReader", which is not consistent with the other ZIM binaries "zimDump" and "zimWriter", so IMO it should be "zimReader". But in fact, having uppercase characters in a binary name seems pretty unusual, so wouldn't "zimwriter", "zimdump" and "zimreader" be better?
About the default skin: is it possible that there is currently no skin anymore by default? Before, I had to tweak the skin to remove the German Wikipedia UI, and this no longer seems necessary! If that's true, great!
My feeling is that the zimreader, with the new format, is no longer able to determine the HTTP content-type correctly. The HTTP header for an HTML page looks like the following:
(...)
Keep-Alive: timeout=15000, max=999
Connection: Keep-Alive
Last-Modified: Tue, 12 Jan 2010 21:01:19 GMT
Content-Type: 0
Set-Cookie: tntnet.=8649805af3b641b7d9f7dbf3193b288d; Path=/;Version=1
(...)
The Content-Type should be different, shouldn't it?
Regards
Emmanuel
Folks,
I hope it is OK if I post here.
I have zimlib, zimreader, zimwriter from svn, revision 295.
I am compiling on Ubuntu Karmic.
I had to install quite a few dev libraries. Are these the tnt ones?
aptitude show libtntnet8 libtntdb-dev tntnet-runtime libtntdb1 | egrep '^(Package|State|Version)'
Package: libtntnet8
State: installed
Version: 1.6.3-5
Package: libtntdb-dev
State: installed
Version: 1.0.1-4
Package: tntnet-runtime
State: installed
Version: 1.6.3-5
Package: libtntdb1
State: installed
Version: 1.0.1-4
I start with "make clean ; ./autogen.sh; ./configure; make"
zimlib compiles fine.
zimreader has been coming up with this error the last
few times I tried to compile :-
-------------------------------
make[2]: Entering directory `/srv/svn/zim/zimreader/src'
depbase=`echo zimcomp.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
g++ -DHAVE_CONFIG_H -I. -I../include -g -O2 -MT zimcomp.o -MD -MP -MF $depbase.Tpo -c -o zimcomp.o zimcomp.cpp &&\
mv -f $depbase.Tpo $depbase.Po
zimcomp.ecpp: In member function ‘virtual unsigned int<unnamed>::_component_zimcomp::operator()(tnt::HttpRequest&, tnt::HttpReply&, tnt::QueryParams&)’:
zimcomp.ecpp:17: error: no match for ‘operator!=’ in ‘zim::Article::getUrl() const() != pathInfo’
/usr/local/include/zim/qunicode.h:87: note: candidates are: bool zim::QUnicodeString::operator!=(const zim::QUnicodeString&) const
zimcomp.ecpp:20: error: no matching function for call to ‘zim::File::getArticle(char&, std::string&)’
/usr/local/include/zim/file.h:51: note: candidates are: zim::Article zim::File::getArticle(zim::size_type) const
/usr/local/include/zim/file.h:52: note: zim::Article zim::File::getArticle(char, const zim::QUnicodeString&, bool)
zimcomp.ecpp:49: error: no matching function for call to ‘tnt::HttpReply::redirect(zim::QUnicodeString)’
/usr/include/tnt/httpreply.h:77: note: candidates are: unsigned int tnt::HttpReply::redirect(const std::string&)
zimcomp.ecpp:58: error: no match for ‘operator=’ in ‘title = zim::Article::getTitle() const()’
/usr/include/c++/4.4/bits/basic_string.h:505: note: candidates are: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::operator=(const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.h:513: note: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::operator=(const _CharT*) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.h:524: note: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::operator=(_CharT) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
make[2]: *** [zimcomp.o] Error 1
-------------------------------
What am I doing wrong ?
Cheers, Andy!
Hi,
during the dev meeting we decided to have, in addition to the URL-sorted index, a title-sorted index (at least optionally).
I can't see it in the roadmap. Is something wrong with it?
I have started to play with the trunk again. So far it works perfectly and I was also able to generate new ZIM files.
I will introduce the new format in Kiwix trunk and start to look at Windows portability.
Emmanuel
OK, thank you Manuel. This was a misunderstanding on my part.
I have managed to have a different "title" and "url" per article in my test ZIM file, so the zimwriter now handles the DB "url" and "title" fields correctly for me. :)
But I was not able to check the title index:
* Do we have a way, with zimDump for example, to get all article titles starting with "open"?
* The zimreader has a suggest feature (able to propose article URLs which start with the first characters you have typed). Where is the zimreader code which does that? Is it easy to adapt it to do the same with titles?
Emmanuel
On Wed 06/01/10 11:56, "Manuel Schneider" manuel.schneider(a)wikimedia.ch wrote:
> Hi,
>
> see the first item on the Roadmap:
> http://openzim.org/Roadmap
> add Pointer to UrlPointerList (IndexPointerList will be named
> "TitlePointerList")
>
> and
>
> add UrlPointerList (article list ordered by URL)
>
> both already done.
>
> Regards,
>
> Manuel
> --
> Regards
> Manuel Schneider
>
> Wikimedia CH - Verein zur Förderung Freien Wissens
> Wikimedia CH - Association for the advancement of free knowledge
> www.wikimedia.ch
>
>
On Wed 06/01/10 14:27, "Tommi Mäkitalo" tommi(a)tntnet.org wrote:
> There is a flag -t in zimDump, which switches to search by title. I just
> realized that this is missing on the help page. I will add that.
>
> To find an article by title use:
> zimDump -t -i -f "MyTitle" myzimfile.zim
:)
I wanted to try the "-t" parameter, but I don't understand why the second command does not return the article with idx=4:
$ zimDump -i -f 8P articles.zim
url: 9
title: The Beatles
idx: 4
namespace: A
redirect: 0
mime-type: 0
article size: 177163
$ zimDump -t -i -f "The Beatles" articles.zim
url: 10
title: 10
idx: 9
namespace: I
redirect: 0
mime-type: 3
article size: 24152
The ZIM file can be found here:
http://tmp.kiwix.org/tmp/articles.zim
> To use it in your application use iterators:
>
> zim::File::const_iterator it = zimFile.find('A', "MyUrl");
> or
> zim::File::const_iterator it = zimFile.findByTitle('A', "MyTitle");
>
> This will return an iterator to the first article whose url (or title) is
> equal to or greater than the passed string. You have to compare the iterator
> against zimFile.end() to check whether it really points to a valid article.
> If you look for a range of articles starting with a specific url (or title),
> you have to do that manually, e.g.:
> zim::File::const_iterator itFirst = zimFile.find('A', "MyUrl");
> zim::File::const_iterator itLast = zimFile.find('A', "MyUrm");
>
> or by comparing yourself:
> for (zim::File::const_iterator it = zimFile.find('A', "MyUrl");
>      it != zimFile.end() && it->getUrl().compare(0, 5, "MyUrl") == 0;
>      ++it)
> {
> }
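The same prefix-range technique can be tried without a ZIM file. The sketch below is only an illustration under assumptions: a sorted std::vector of strings stands in for the URL index, std::lower_bound stands in for zimFile.find(), and the function name withPrefix is made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for the sorted URL (or title) index. Assumes `sorted` is in
// ascending order, as the index in the ZIM file is described to be.
std::vector<std::string> withPrefix(const std::vector<std::string>& sorted,
                                    const std::string& prefix)
{
    std::vector<std::string> result;
    // First entry equal to or greater than the prefix.
    std::vector<std::string>::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), prefix);
    // Walk forward while the entry still starts with the prefix.
    for (; it != sorted.end()
           && it->compare(0, prefix.size(), prefix) == 0; ++it)
    {
        result.push_back(*it);
    }
    return result;
}
```

Because the entries are sorted, all matches are adjacent, so the loop can stop at the first entry that no longer starts with the prefix.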
Perfect, will test it ASAP.
Emmanuel