Re: [openZIM dev-l] [Wikitech-l] Phonegap and Wikimedia mobile apps

Emmanuel Engelhart

24 Sep 2011

2:30 p.m.

On 24/09/2011 14:24, Christian Pühringer wrote:

> The Java liblzma performance is pretty bad. To increase compression efficiency, the ZIM format stores articles (and all other data, such as images) in clusters. Cluster size is apparently about 1 MB. This implies that loading an article stored at the end of a cluster involves decompressing the complete cluster.

Images should not be compressed in ZIM files, for the obvious reason that they are mostly already compressed. This is the case for all ZIM files I have made. As far as I know, it is also the case for ZIM files built by Mediawiki:Collection.
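To make the cost difference concrete, here is a rough sketch of a cluster holding several blobs back to back. Everything in it is an assumption for illustration: the class name ClusterSketch and its layout are hypothetical and do not follow the real ZIM cluster format, and DEFLATE from java.util.zip stands in for LZMA only because the Java standard library ships no LZMA codec. The point it illustrates is the one discussed above: a compressed stream has no random access, so reaching the last blob means decompressing everything before it, while an uncompressed cluster can be sliced directly.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical, simplified cluster: blobs stored back to back (not the real ZIM layout). */
final class ClusterSketch {
    final boolean compressed;
    final int[] offsets; // blob i occupies offsets[i]..offsets[i + 1] in the raw data
    final byte[] stored; // the raw bytes, or a DEFLATE stream of the raw bytes

    ClusterSketch(boolean compressed, byte[][] blobs) {
        this.compressed = compressed;
        this.offsets = new int[blobs.length + 1];
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = 0; i < blobs.length; i++) {
            raw.write(blobs[i], 0, blobs[i].length);
            offsets[i + 1] = raw.size();
        }
        this.stored = compressed ? deflate(raw.toByteArray()) : raw.toByteArray();
    }

    /** Bytes the decompressor has to produce before blob i is available. */
    int bytesToDecompress(int i) {
        return compressed ? offsets[i + 1] : 0;
    }

    byte[] readBlob(int i) {
        if (!compressed) {
            // Uncompressed cluster: slice the blob directly.
            return Arrays.copyOfRange(stored, offsets[i], offsets[i + 1]);
        }
        // Compressed cluster: inflate from the start up to the end of blob i.
        byte[] prefix = new byte[offsets[i + 1]];
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        try {
            int produced = 0;
            while (produced < prefix.length) {
                int n = inflater.inflate(prefix, produced, prefix.length - produced);
                if (n == 0) {
                    break; // stream ended early or is truncated
                }
                produced += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt cluster", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOfRange(prefix, offsets[i], offsets[i + 1]);
    }

    private static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

With three blobs of 5, 6 and 12 bytes, reading the last blob from the compressed variant inflates all 23 bytes, while the uncompressed variant inflates none. Scaled up to a cluster of about 1 MB, that is the overhead being discussed, and it is also why a smaller cluster size reduces the worst case.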

Emmanuel

Arunesh Mathur

24 Sep 2011

2:42 p.m.

Yes, images are not compressed in ZIM files, as Emmanuel pointed out.

Also, I have tried decompressing an LZMA-compressed file on an Android phone (an HTC Wildfire, to be precise), and the decompression speed is not a problem.

LZMA-Java is under active development by Lasse Collin, so we should make sure that the latest code is used.

dev-l mailing list dev-l@openzim.org https://intern.openzim.org/mailman/listinfo/dev-l

-- Best, Arunesh Mathur IV year, Undergraduate, Department of Computer Science and Engineering, National Institute of Technology Karnataka Surathkal, India.

Christian Pühringer

3 Oct 2011

7:40 p.m.

Hi Arunesh,

Firstly, thank you again for doing the Java port.

I have one question about the port's features: is there, or do you plan to add, a way to iterate over the index (similar to find/findByTitle in the C++ zimlib)? This would be needed to add an autocomplete feature to the PhoneGap app.

Best regards, Christian

Arunesh Mathur

5 Oct 2011

8:22 a.m.

Hi Christian,

You can look at getDirectoryInfo(String articleName, char namespace) in ZimReader.java. It returns null if the article doesn't exist, and the DirectoryEntry object otherwise. (Is this what you were looking for?)

I plan on adding more features if required. Do you think any are missing compared to zimlib? If so, could you list them?

Have you tested zimreader-java on an Android phone? How is its performance?

Christian Pühringer

6 Oct 2011

11:01 p.m.

Hi Arunesh,

I'd need something like findByTitle(namespace, title) in zimlib: it returns an iterator pointing to the lexicographically next article. I'd like to use this to implement an auto-suggest feature (the same as the search box on online Wikipedia). Note: while auto-suggest only needs forward iteration, other features would benefit from being able to iterate backward as well (as in zimlib).
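As a rough illustration of the behaviour asked for here, the sketch below keeps titles in a sorted in-memory list and uses a binary search to position an iterator at the first title that is lexicographically greater than or equal to the key. The class TitleIndexSketch and its methods are hypothetical names, not part of zimlib or zimreader-java, and a real reader would search the title index in the ZIM file rather than a list held in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ListIterator;

/** Hypothetical title index; a sorted in-memory list stands in for the ZIM title index. */
final class TitleIndexSketch {
    private final List<String> sortedTitles;

    TitleIndexSketch(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        Collections.sort(copy);
        this.sortedTitles = copy;
    }

    /** Returns an iterator positioned at the first title that is >= key. */
    ListIterator<String> findByTitle(String key) {
        int lo = 0;
        int hi = sortedTitles.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTitles.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return sortedTitles.listIterator(lo);
    }

    /** Auto-suggest: walk forward from the search position while the prefix still matches. */
    List<String> suggest(String prefix, int limit) {
        List<String> result = new ArrayList<>();
        ListIterator<String> it = findByTitle(prefix);
        while (it.hasNext() && result.size() < limit) {
            String title = it.next();
            if (!title.startsWith(prefix)) {
                break; // sorted order: no later title can match the prefix
            }
            result.add(title);
        }
        return result;
    }
}
```

Because the returned ListIterator also supports previous(), the same search position can be used to walk backward, which covers the backward iteration mentioned above.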

One other thing that may currently be missing is support for both title-based and URL-based search and article lookup. It looks to me that right now everything operates on URLs and not on titles. For searching, it is necessary to have title-based functions as well (like findByTitle). Note that while in most ZIM files URLs currently have the form 'Namespace'/'Title', this is not guaranteed: the URL could be completely different from the title.
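One way to picture the title/URL distinction is a pair of indexes over the same entries, one keyed by URL and one keyed by title. The sketch below is only an illustration under that assumption: DualIndexSketch, Entry, and the sample URL are made-up names, and it does not reflect how zimreader-java or zimlib store their indexes.

```java
import java.util.TreeMap;

/** Hypothetical sketch: the same entries indexed once by URL and once by title. */
final class DualIndexSketch {
    static final class Entry {
        final char namespace;
        final String url;
        final String title;

        Entry(char namespace, String url, String title) {
            this.namespace = namespace;
            this.url = url;
            this.title = title;
        }
    }

    private final TreeMap<String, Entry> byUrl = new TreeMap<>();
    private final TreeMap<String, Entry> byTitle = new TreeMap<>();

    void add(Entry entry) {
        byUrl.put(entry.namespace + "/" + entry.url, entry);
        // Simplification: entries sharing a namespace and title overwrite each other here.
        byTitle.put(entry.namespace + "/" + entry.title, entry);
    }

    /** Returns the entry with this URL, or null if there is none. */
    Entry getByUrl(char namespace, String url) {
        return byUrl.get(namespace + "/" + url);
    }

    /** Returns the entry with this title, or null if there is none. */
    Entry getByTitle(char namespace, String title) {
        return byTitle.get(namespace + "/" + title);
    }
}
```

With an entry whose URL is "index_42.html" and whose title is "Vienna", a lookup by title finds it while a URL lookup for "Vienna" returns null, which is the case a URL-only API would miss.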

Regarding performance, have you missed Brion's and my results posted in this thread? (They may have gone only to the wikitech-l@lists.wikimedia.org list and not to the dev-l@openzim.org list.)

Best regards, Christian

Arunesh Mathur

7 Oct 2011

12:39 p.m.

Hi Christian,

Sure, that seems easy enough to implement. I'll get that feature up and running soon.

Yes, most of the functions are based on the title. I did that because I didn't find any difference between the two when I browsed around 5-6 ZIM files. But as you say, I'll add their URL counterparts as well. I'll also add the documentation-gen module to the repository.

Hmm, I'm not subscribed to that mailing list. I'll go through the thread that discusses performance.

-Arunesh Mathur

Christian Pühringer

24 Sep 2011

2:49 p.m.

Hi Emmanuel,

It makes sense that images are not compressed again in ZIM files; thanks for the clarification. For Windows Mobile in particular, this makes it more likely that reducing the cluster size will give sufficient performance (especially if there is a better way to display images than on Android, but I haven't looked into this). For Android, I still expect that it will be better to use a native library.

Christian

Tomasz Finc

11:14 p.m.

Awesome work Christian. Where can we find your code to test out?

If you don't have a place then I'll carve one out in our depots.
