Kevin,
You're right that our focus has often been on DVD-sized collections, but bigger collections are available! In some ways bigger is easier, because you bypass the whole (very complex!) issue of content selection, although curation becomes more of a problem as you scale up.
Are you planning on publishing the entire English Wikipedia? I believe there are such ZIM collections available, but some may be a few months old. More important, they have not been checked against vandalism, which means that some articles may be pure obscenities, etc. (I found some disgusting examples when preparing the Version 0.7 collection.) How are you doing your revisionID selection? Are you using the WikiTrust system we use on the 1.0 collections? If you are producing a vandalism-checked version of the entire English Wikipedia, or you have developed your own tools, we would very much like to share these things with others working in the area.
With large storage becoming cheaper, it should now be feasible to put the entire English Wikipedia onto a hard drive or similar; our subset collections are designed for cases where distribution will be via DVD or flash drive (as with the Version 0.7 and 0.8 collections). In the case of Wikipedia for Schools, the producers wanted a collection that was hand-checked for vandalism and child-appropriateness, which is why it is only around 6000 articles (but also why it's so popular!).
To avoid the template problems, you should definitely consider the ZIM format, which is designed for this purpose (making content readable offline). As I understand it, you don't need to tie that to a particular reader (Kiwix, Okawix, etc.), though these systems are storage-efficient and each represents several years of optimisation work.
If you're looking for HTML versions, I believe these are available, and I think Emmanuel has produced such things for Wizzy to help him with his work: http://blog.wizzy.com/post/Kiwix-install-at-Kwena-Malapo-school-Johannesberg Let us know specifically the end format you prefer. (BTW, I'm a collections curator, not a tech person, so please forgive me if I've made any technical errors!)
Hope this helps. Good luck!
Martin (User:Walkerma, English Wikipedia 1.0 team)

Martin A. Walker
Department of Chemistry
SUNY College at Potsdam
Potsdam, NY 13676 USA
+1 (315) 267-2271
Kevin Clark wrote:
What you try to achieve already exists IMO.
I beg to differ. The existing solutions I've seen provide only a small subset of the data and force the user to use a completely different interface. We want the user experience to be as close to the real thing as possible, i.e. all the Wikipedia pages are available and the user can choose which client (browser) they use to access the data.
Kevin
Offline-l mailing list Offline-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/offline-l