On 24/09/2011 14:24, Christian Pühringer wrote:
The Java liblzma performance is pretty bad. To increase compression efficiency, the ZIM format stores articles (and all other data, such as images) in clusters. Cluster size is apparently about 1 MB. This implies that loading an article stored at the end of a cluster involves decompressing the complete cluster.
Images should not be compressed in ZIM files, for the obvious reason that they are mostly already compressed. This is the case for all ZIM files I have made. As far as I know, this is also the case for ZIM files built by Mediawiki:Collection.
Emmanuel
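A rough illustration of the cluster cost discussed above. This is only a sketch under stated assumptions: it uses the JDK's java.util.zip (Deflate) as a stand-in for LZMA, and the names ClusterSketch, compressCluster and readBlob, as well as the in-memory layout, are made up for the example and are not taken from zimlib or the Java port. It only shows that a stream-compressed cluster has to be decompressed from its start up to the end of the requested blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ClusterSketch {
    /** Result of one read: the blob plus how many bytes had to be decompressed. */
    static final class Read {
        final byte[] blob;
        final long bytesDecompressed;

        Read(byte[] blob, long bytesDecompressed) {
            this.blob = blob;
            this.bytesDecompressed = bytesDecompressed;
        }
    }

    /** Compresses several blobs back to back into one hypothetical "cluster". */
    static byte[] compressCluster(byte[][] blobs) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(raw)) {
            for (byte[] blob : blobs) {
                out.write(blob);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    /** Reads the blob at [offset, offset + length) of the uncompressed cluster. */
    static Read readBlob(byte[] cluster, int offset, int length) {
        long decompressed = 0;
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(cluster))) {
            // A compressed stream cannot be entered in the middle, so everything
            // in front of the blob is decompressed and thrown away.
            byte[] scratch = new byte[4096];
            int toSkip = offset;
            while (toSkip > 0) {
                int n = in.read(scratch, 0, Math.min(scratch.length, toSkip));
                if (n < 0) {
                    throw new IllegalArgumentException("offset is past the end of the cluster");
                }
                toSkip -= n;
                decompressed += n;
            }
            byte[] blob = new byte[length];
            int got = 0;
            while (got < length) {
                int n = in.read(blob, got, length - got);
                if (n < 0) {
                    throw new IllegalArgumentException("blob runs past the end of the cluster");
                }
                got += n;
                decompressed += n;
            }
            return new Read(blob, decompressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "first article ".getBytes(StandardCharsets.UTF_8);
        byte[] last = "last article".getBytes(StandardCharsets.UTF_8);
        byte[] cluster = compressCluster(new byte[][] { first, last });
        Read r = readBlob(cluster, first.length, last.length);
        System.out.println(new String(r.blob, StandardCharsets.UTF_8)
                + " needed " + r.bytesDecompressed + " decompressed bytes");
    }
}
```

In this toy model the work for the last blob grows with the cluster size, which is why a smaller cluster (or leaving already-compressed data such as images uncompressed) reduces the cost per article.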
Yes, images are not compressed in ZIM files, as Emmanuel pointed out.
Also, I have tried decompressing an LZMA-compressed file on an Android phone (an HTC Wildfire, to be precise), and the decompression speed is not a problem.
LZMA-Java is under active development by Lasse Collin, so we should make sure that the latest code is used.
dev-l mailing list dev-l@openzim.org https://intern.openzim.org/mailman/listinfo/dev-l
Hi Arunesh,
Firstly, thank you again for doing the Java port.
I have one question regarding the features of the port: is there, or do you plan to add, a feature for iterating over the index (similar to find/findByTitle in the C++ zimlib)? This would be required to add an autocomplete feature to the PhoneGap app.
Best regards, Christian
Hi Christian,
You can look at getDirectoryInfo(String articleName, char namespace) in ZimReader.java. It returns null if the article doesn't exist, otherwise the DirectoryEntry object. (Is this what you were looking for?)
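A small sketch of how a caller might deal with the null return mentioned above. The Entry and Reader types here are hypothetical stand-ins that only mirror the quoted method signature; the real ZimReader and DirectoryEntry classes in the port may look quite different, and the in-memory reader exists only to exercise the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class LookupSketch {
    /** Hypothetical stand-in for the port's DirectoryEntry; only a title is kept. */
    static final class Entry {
        final String title;

        Entry(String title) {
            this.title = title;
        }
    }

    /** Hypothetical stand-in exposing only the method signature quoted above. */
    interface Reader {
        /** Returns null if the article does not exist. */
        Entry getDirectoryInfo(String articleName, char namespace);
    }

    /** Wraps the null-returning lookup so the "not found" case is explicit. */
    static Optional<Entry> find(Reader reader, String articleName, char namespace) {
        return Optional.ofNullable(reader.getDirectoryInfo(articleName, namespace));
    }

    /** Toy in-memory reader keyed by "namespace/articleName" (an assumption of this sketch). */
    static Reader inMemory(Map<String, Entry> entries) {
        return (name, ns) -> entries.get(ns + "/" + name);
    }

    public static void main(String[] args) {
        Map<String, Entry> entries = new HashMap<>();
        entries.put("A/Paris", new Entry("Paris"));
        Reader reader = inMemory(entries);
        System.out.println(find(reader, "Paris", 'A').map(e -> e.title).orElse("(not found)"));
        System.out.println(find(reader, "Berlin", 'A').map(e -> e.title).orElse("(not found)"));
    }
}
```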
Actually I plan on adding more features if required. Do you think there are any missing, when compared to zimlib? If yes, could you list them?
Have you tested zimreader-java on an Android phone? How is its performance?
-- Best,
Arunesh Mathur IV year, Undergraduate, Department of Computer Science and Engineering, National Institute of Technology Karnataka Surathkal, India.
Hi Arunesh,
I'd need something like findByTitle(namespace, title) in zimlib: it returns an iterator pointing to the lexicographically next article. I'd like to use this to implement an auto-suggest feature (the same as in the online Wikipedia search box). Note: while auto-suggest only needs forward iteration, for other features it would be good if backward iteration were also possible (as in zimlib).
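To make the request concrete, here is a minimal sketch of such a lookup over a sorted, in-memory list of titles. It is not zimlib's implementation: the class TitleIndexSketch and its methods are invented for illustration, namespaces are ignored, and plain String.compareTo ordering is assumed. A binary search finds the first title that is greater than or equal to the key, and a ListIterator positioned there can move forward or backward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.ListIterator;

class TitleIndexSketch {
    private final List<String> sortedTitles;

    TitleIndexSketch(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        Collections.sort(copy); // String.compareTo order, an assumption of this sketch
        this.sortedTitles = copy;
    }

    /** Index of the first title that is >= key (classic lower bound). */
    int lowerBound(String key) {
        int lo = 0;
        int hi = sortedTitles.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTitles.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Iterator positioned at the first title >= key; supports next() and previous(). */
    ListIterator<String> findByTitle(String key) {
        return sortedTitles.listIterator(lowerBound(key));
    }

    /** Up to limit titles starting with prefix, in sorted order. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        ListIterator<String> it = findByTitle(prefix);
        while (it.hasNext() && out.size() < limit) {
            String title = it.next();
            if (!title.startsWith(prefix)) {
                break; // titles sharing a prefix are contiguous in sorted order
            }
            out.add(title);
        }
        return out;
    }

    public static void main(String[] args) {
        TitleIndexSketch index = new TitleIndexSketch(
                Arrays.asList("Paris", "Parma", "Berlin", "Parrot", "Prague"));
        System.out.println(index.suggest("Par", 10));
    }
}
```

Each suggestion request costs one binary search plus a short forward scan, so it stays cheap even for a large index as long as the sorted titles can be accessed by position.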
One other thing which may currently be missing is support for both title-based and URL-based search/get-article functionality. It looks to me as if right now everything operates on URLs and not on titles. For searching it is necessary to also have title-based functions (like findByTitle). Note that while in most ZIM files URLs currently have the form 'Namespace'/'Title', this is not guaranteed; the URL could be completely different from the title.
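A sketch of why the two lookups need separate indexes when URL and title differ. Everything here (DualIndexSketch, Entry, the "namespace/key" string keys, the sample entries) is hypothetical and only illustrates the idea; it says nothing about how the Java port or zimlib store their entries.

```java
import java.util.Map;
import java.util.TreeMap;

class DualIndexSketch {
    /** Hypothetical directory entry with independent URL and title. */
    static final class Entry {
        final char namespace;
        final String url;
        final String title;

        Entry(char namespace, String url, String title) {
            this.namespace = namespace;
            this.url = url;
            this.title = title;
        }
    }

    private final TreeMap<String, Entry> byUrl = new TreeMap<>();
    private final TreeMap<String, Entry> byTitle = new TreeMap<>();

    private static String key(char namespace, String value) {
        return namespace + "/" + value;
    }

    /** Registers the entry in both indexes (duplicate titles overwrite each other in this toy version). */
    void add(Entry e) {
        byUrl.put(key(e.namespace, e.url), e);
        byTitle.put(key(e.namespace, e.title), e);
    }

    Entry getByUrl(char namespace, String url) {
        return byUrl.get(key(namespace, url));
    }

    Entry getByTitle(char namespace, String title) {
        return byTitle.get(key(namespace, title));
    }

    /** First entry in the namespace whose title is >= the given title, or null. */
    Entry findByTitle(char namespace, String title) {
        Map.Entry<String, Entry> hit = byTitle.ceilingEntry(key(namespace, title));
        if (hit == null || hit.getValue().namespace != namespace) {
            return null;
        }
        return hit.getValue();
    }

    public static void main(String[] args) {
        DualIndexSketch index = new DualIndexSketch();
        index.add(new Entry('A', "index.html", "Main Page"));
        index.add(new Entry('A', "p/42.html", "Paris"));
        System.out.println(index.findByTitle('A', "Pa").url);
    }
}
```

With only the URL index, a search box typing "Pa" could never reach the entry stored under "p/42.html", which is the case described above.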
Regarding performance, have you missed Brion's and my results posted in this thread? (They may have gone only to the wikitech-l@lists.wikimedia.org list and not to the dev-l@openzim.org list.)
Best regards, Christian
Hi Christian,
Sure, that seems easy enough to implement. I'll get that feature up and running soon.
Yes, most of the functions are based on title. I did that because I didn't find any difference between the two when I browsed around 5-6 ZIM files. But as you say, I'll add their URL counterparts as well. I'll also add the documentation-gen module to the repository.
Hmm, I'm not subscribed to that mailing list. I'll go through the thread that discusses performance.
-Arunesh Mathur
Hi Emmanuel,
It makes sense that images are not compressed again in ZIM files; thanks for the clarification. For Windows Mobile in particular, this increases the probability that reducing the cluster size will give sufficient performance (especially if there is a better way to display images than on Android, but I haven't looked into this). For Android I still expect that it will be better to use a native library.
Christian
Awesome work Christian. Where can we find your code to test out?
If you don't have a place, then I'll carve one out in our depots.