Kevin,
You're right that our focus has often been on DVD-sized collections, but
bigger collections are available! In some ways bigger is easier, because
you bypass the whole (very complex!) issue of content selection, but
curation also becomes a problem as you scale up.
Are you planning on publishing the entire English Wikipedia? I believe
there are such ZIM collections available, but some may be a few months
old. More importantly, they have not been checked for vandalism, which
means that some articles may be pure obscenities, etc. (I found some
disgusting examples when preparing the Version 0.7 collection.) How are
you doing your revisionID selection? Are you using the WikiTrust system
we use on the 1.0 collections? If you are producing a vandalism-checked
version of the entire English Wikipedia, or you have developed your own
tools, we would very much like to share these things with others working
in the area.
With large storage becoming cheaper, it should now be feasible to put the
entire English Wikipedia onto a hard drive or similar; our subset
collections are designed for cases where distribution will be via DVD or
flash drive (as with the Version 0.7 and 0.8 collections). In the case of
Wikipedia for Schools, the producers wanted a collection that was
hand-checked for vandalism and child-appropriateness, which is why it is
only around 6000 articles (but also why it's so popular!).
To avoid the template problems, you should definitely consider the ZIM
format, which is designed for this purpose (making content readable
offline). As I understand it, you don't need to tie that to a particular
reader (Kiwix, Okawix, etc.), though these systems are storage-efficient
and each represents several years of optimisation work.
If you're looking for HTML versions, I believe these are available, and I
think Emmanuel has produced such things for Wizzy to help him with his
work:
http://blog.wizzy.com/post/Kiwix-install-at-Kwena-Malapo-school-Johannesberg
Let us know specifically which end format you prefer. (BTW, I'm a
collections curator, not a tech person, so please forgive me if I've made
any technical errors!)
Hope this helps. Good luck!
Martin (User:Walkerma, English Wikipedia 1.0 team)
Martin A. Walker
Department of Chemistry
SUNY College at Potsdam
Potsdam, NY 13676 USA
+1 (315) 267-2271
Kevin Clark wrote:
What you try to achieve already exists IMO.
I beg to differ. The existing solutions I've seen provide only a small
subset of the data and force the user to use a completely different
interface. We want the user experience to be as close to the real thing as
possible, i.e. all the Wikipedia pages are available and the user can
choose which client (browser) they use to access the data.
Kevin
_______________________________________________
Offline-l mailing list
Offline-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/offline-l