Hi,
On Wed 24/03/10 08:13, "Manuel Schneider" manuel.schneider@wikimedia.ch wrote:
I have to admit that I currently have no idea where there are problems with RTL - I always thought of ZIM as just a container - it doesn't process the data you put into it. So if you put in RTL text you will get RTL text out again.
I thought the same thing ;)
What we would need, though, is a Hebrew/Arabic/Yiddish... stemmer for the full-text search index creation and of course support in the reader application. But that is Kiwix (which as far as I know supports RTL because it was used by MoulinWiki) or the web browser when using zimreader.
I have ZIM files in Arabic, Hebrew and Farsi and as far as I know there are no specific issues related to these languages in Kiwix or in the zimlib. Asaf, if you have such bugs, the best thing is to open a ticket in the appropriate bug tracker: http://bugzilla.openzim.org/Main_Page or http://bugs.kiwix.org
About the stemming, Xapian (used by Kiwix) has no solution for these languages... it would be a pretty good idea to have a grant for that (for at least one language).
Emmanuel
PS: Asaf, what about my Invitation to meet Reg from Moulinwiki in Tel-Aviv?
Manuel
On 23.03.2010 23:55, Asaf Bartov wrote:
I may have mentioned this before, but I'll state it again:
Wikimedia Israel has set aside funds (on the order of $3000) to support development of OpenZIM. We hope to encourage work on the specific issues that still hamper successful deployment of right-to-left Wikipedia (Hebrew, Arabic, Farsi, Yiddish), and I was hoping to discuss the specifics with Manuel and whoever else will be attending the upcoming conference in Berlin.
Asaf
Hi,
I want to inform you that Wikimedia CH has again decided to sponsor openZIM and has approved our full budget of 9'000 EUR for 2010.
For more information and updates on spending, see http://openzim.org/Budget_2010
Side note: Erik Möller, Deputy Director of the Wikimedia Foundation, has offered to match Wikimedia CH's spending on the openZIM budget through the Chapters Grants program.
I want to thank Wikimedia CH and Wikimedia Foundation for the generous support of the openZIM project!
Happy Easter!
/Manuel
On Sun, 04 Apr 2010, Manuel Schneider wrote:
Side note: Erik Möller, Deputy Director of Wikimedia Foundation, has offered to match the spendings of Wikimedia CH on the openZIM budget through the Chapters Grants program.
I want to thank Wikimedia CH and Wikimedia Foundation for the generous support of the openZIM project!
Well done.
Cheers, Andy!
Yohooo! Emmanuel
Great news! So let's go on doing something useful with that money ;-)
I'm back from my vacation and have already checked in the first part of what I've done. I was offline but had my netbook with me.
I successfully replaced std::ifstream with my own implementation. So for the Windows porters there should be only a single system call left to #ifdef with some Win32-specific call: in zimlib/src/fstream.cpp there is a call to lseek64.
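For the porters, a minimal sketch of how that one seek call might be wrapped. The helper name seek64 is hypothetical and is not taken from zimlib; the sketch assumes _lseeki64 from io.h on Windows and a 64-bit off_t elsewhere (on 32-bit systems, for example by compiling with -D_FILE_OFFSET_BITS=64).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>      // SEEK_SET
#include <fcntl.h>     // open flags
#ifdef _WIN32
#  include <io.h>      // _lseeki64 and the low-level I/O calls on Windows
#else
#  include <unistd.h>  // lseek, read, close on POSIX systems
#endif

// Hypothetical helper (not part of zimlib): move the file position of `fd`
// to the absolute 64-bit `offset`. Returns the new position, or -1 on error.
inline std::int64_t seek64(int fd, std::int64_t offset)
{
    assert(fd >= 0);
#ifdef _WIN32
    // Win32 C runtime variant that takes a 64-bit offset.
    return _lseeki64(fd, offset, SEEK_SET);
#else
    // Assumption: off_t is 64 bits wide here, so plain lseek is sufficient.
    return static_cast<std::int64_t>(
        lseek(fd, static_cast<off_t>(offset), SEEK_SET));
#endif
}
```

With a wrapper like this, the rest of the stream code could call seek64 everywhere and the #ifdef would stay in one place.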
Tommi
Subject was "Re: [openZIM dev-l] Suggested Budget for 2010"
Your new implementation, which should help to avoid the 2 GB limit under Windows, works well on GNU/Linux. Unfortunately, this is not the case on Windows with MSVC.
A few things need to be changed to get it to compile:
* remove the #include <unistd.h>, which is a POSIX-specific header... and does not seem to be necessary under GNU/Linux.
* include io.h in fstream.[h|cpp]
After that it compiles... But unfortunately it does not run *at all*. I always get read errors. The errors are not always at the same place; it depends on the file I try to open.
For example with the following file, the process dies pretty early by reading the header: http://tmp.kiwix.org/zim/0.9/wikipedia_en_wp1_0.7_30000+_05_2009_beta3.zim
After reading the first 72 bytes of the header, the stream has the fail flag set... In fact, if I only read 16 bytes it's OK (until the next read error), but more than 17 and I get an error. The byte at the 16th position has the value 26, which is maybe interpreted as an end-of-file character.
But with another file... the error may occur later.
So, it seems to me that the problem depends on the content and is independent of the file size. Maybe the file is read in text mode and not in binary mode... but I have no evidence of that.
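One way to test the text-mode suspicion is a small experiment with the standard streams. This is only a sketch: writeBytes and readCount are hypothetical helpers, not zimlib code. The idea is to write a buffer containing the byte 26 and count how many bytes come back in each open mode. On GNU/Linux both modes should return everything; on Windows, a text-mode read would be expected to stop early if 26 is treated as end of file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write `data` to `path` with no newline translation.
inline void writeBytes(const std::string& path, const std::vector<char>& data)
{
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}

// Hypothetical helper: try to read `n` bytes from `path` opened with `mode`
// and return how many bytes were actually delivered.
inline std::size_t readCount(const std::string& path, std::size_t n,
                             std::ios::openmode mode)
{
    assert(n > 0);
    std::ifstream in(path.c_str(), mode);
    std::vector<char> buf(n);
    in.read(buf.data(), static_cast<std::streamsize>(n));
    return static_cast<std::size_t>(in.gcount());
}
```

Calling readCount once with std::ios::in and once with std::ios::in | std::ios::binary on the same file, and comparing the two counts, would show whether the open mode is the cause.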
Does anyone have an idea?
Emmanuel
Additional information: I have checked the value returned by in.flags() (the flags associated with the stream), and it is not the same on GNU/Linux and Windows.
Emmanuel
Hi Emmanuel,
26 was a good hint. It is interpreted as EOF on Windows when the file is opened in text mode. Also, CR-LF is translated into a single LF, which is not correct for reading ZIM files either. I added the O_BINARY flag to the call to open.
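A minimal sketch of what such a call might look like; this is an illustration, not the actual zimlib change. The helper name openBinaryReadOnly is hypothetical. It assumes that O_BINARY is only defined by Windows C runtimes, so it is defined as 0 elsewhere to let the same call compile on both platforms.

```cpp
#include <cassert>
#include <fcntl.h>     // open, O_RDONLY (and O_BINARY where it exists)
#ifdef _WIN32
#  include <io.h>      // open, read, close on Windows
#else
#  include <unistd.h>  // read, write, close on POSIX systems
#endif

// Assumption: platforms without text-mode translation do not define O_BINARY,
// so a value of 0 makes the flag a no-op there.
#ifndef O_BINARY
#  define O_BINARY 0
#endif

// Hypothetical helper: open `path` read-only with no EOF or CR-LF translation.
// Returns a file descriptor, or -1 on error.
inline int openBinaryReadOnly(const char* path)
{
    assert(path != nullptr);
    return open(path, O_RDONLY | O_BINARY);
}
```

Bytes read through the returned descriptor should then arrive unchanged, including the value 26 and any CR-LF pairs.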
Also I removed <unistd.h> from uuid.cpp.
What is io.h needed for? Which error message do you get? Linux doesn't need it, and I can't even find io.h on my Linux boxes.
Tommi
Tommi Mäkitalo wrote:
26 was a good hint. It is interpreted as EOF on Windows when the file is opened in text mode. Also, CR-LF is translated into a single LF, which is not correct for reading ZIM files either. I added the O_BINARY flag to the call to open.
Yes, this was a/the problem. A pity that I did not think of that sooner... this is not the first time I have spent time on this text-mode read issue on a Windows system :(
Also I removed <unistd.h> from uuid.cpp.
Thanks
What is io.h needed for? Which error message do you get? Linux doesn't need it, and I can't even find io.h on my Linux boxes.
If you remove this #include, read(), open(), close() and also _lseeki64() will become unknown functions.
Now I have fixed the last problems under Windows. It would be great to patch upstream as well. You can see the small changes I made here: http://kiwix.svn.sourceforge.net/viewvc/kiwix?view=rev&revision=1446
I have prepared a self-installer for the Windows version of Kiwix (now 100% functional) here: http://tmp.kiwix.org/tmp/kiwix-install.exe
The source code may be found here: http://tmp.kiwix.org/src/nightly/kiwix-svn-2010-04-08.tar.bz2
... Compilation under GNU/Linux follows the GNU standard: ./configure; make; make install
Kiwix 0.9 alpha1 will be published soon.
Regards Emmanuel