On Mon, 16 Nov 2009, Emmanuel Engelhart wrote:
Yes: when building the ZIM file, I add the titles of the redirect
pages pointing to this page to "the already there keywords".
What are "the already there keywords"?
I'm not enthusiastic about dumping category pages... but this is
only part of the issue. The other part is that I have no method to
know, given a list of articles, which categories I have to include
in the final dump! Do you?
You could do it iteratively?
You must have a method of unlinking 'red-linked' pages - links in
articles that point to pages not in our collection.
Include all categories, then remove those that contain zero or one
article in our collection.
You can leave all categories in the article, just unlink the ones
that do not 'make the cut'.
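To make the idea concrete, here is a minimal sketch, assuming we already
have, for each article in the collection, the list of categories it
belongs to. The function names and the example data are hypothetical;
this is not how the ZIM tools currently work. A category "makes the cut"
if at least two articles in the collection are in it; every category
stays listed in the article, but only the kept ones are linked.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def select_categories(article_categories: Dict[str, List[str]],
                      min_articles: int = 2) -> Set[str]:
    """Return the categories containing at least `min_articles` articles
    from our collection (hypothetical helper)."""
    # Count each category once per article, even if listed twice.
    counts = Counter(cat for cats in article_categories.values()
                     for cat in set(cats))
    return {cat for cat, n in counts.items() if n >= min_articles}


def category_links(categories: List[str],
                   kept: Set[str]) -> List[Tuple[str, bool]]:
    """Keep every category name on the article, but mark as linked
    only those that made the cut (the rest are shown unlinked)."""
    return [(cat, cat in kept) for cat in categories]


# Hypothetical example collection of three articles.
articles = {
    "Gamma-ray burst": ["Astrophysics", "Gamma rays", "Supernovae"],
    "Supernova": ["Astrophysics", "Supernovae"],
    "Neutron star": ["Astrophysics", "Neutron stars"],
}
kept = select_categories(articles)
print(category_links(articles["Gamma-ray burst"], kept))
```

Here "Astrophysics" (3 articles) and "Supernovae" (2) survive, while
"Gamma rays" and "Neutron stars" (1 article each) remain as plain,
unlinked text, the same treatment as red links.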
My point is - if references make up half the text dump, categories
surely deserve to be in there.
Re: References - could you perhaps link to
http://en.wikipedia.org/wiki/Gamma-ray_burst#References ?
Then, if you *do* have internet access, you can get to the refs?
(Still not very satisfactory - the correspondence between the
article ref and the actual one is lost - you have to look for it).
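Building such a link from the article title is simple. A minimal sketch,
assuming the English Wikipedia base URL and that the section is titled
"References" (true for the example above, but not for every article);
the function name is hypothetical:

```python
from urllib.parse import quote


def references_url(title: str,
                   base: str = "http://en.wikipedia.org/wiki/") -> str:
    """Hypothetical helper: link to an article's References section.
    Wikipedia page URLs use underscores instead of spaces; other
    characters are percent-encoded. Assumes a 'References' heading."""
    page = quote(title.replace(" ", "_"))
    return f"{base}{page}#References"


print(references_url("Gamma-ray burst"))
```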
I appreciate all the work that has already gone into this, but
I see a lot of effort going into one or two ZIM files, and not
enough into the process - a process where you could create another
ZIM file which is just chemistry-related, or Africa-related, or the
top 1000 articles.
Cheers, Andy!