On Tue, Mar 12, 2013 at 11:27 AM, Jason Skomorowski jason@skomorowski.net wrote:
Thanks for the substantial contribution. Better tools for sharing Wikipedia could help many of the billions of people without reliable Internet access keep at least this one repository of knowledge at their disposal. This is important work.

On 13-01-28 09:30 PM, gnosygnu wrote:
[*snip*]
XOWA can also work with the full tarball dumps, which removes the need for an always-on connection. The tarball dumps are quite big, though (the English Wikipedia set is 2.2 TB), so I don't know how many people would have the patience to download the entire set.
Basically I wanted an offline reader that would also show images. The on-demand download allows users to download images for articles they are interested in. If they want all the images offline, then they have the option of downloading the tarball dumps. I'm still looking at an intermediate option between the two.
Is there an option to use a path on the filesystem rather than a tarball? This would be a pretty huge feature for two reasons:
- in order to sync only new files from http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/ one needs to have the images extracted. Extracting multiple terabytes and recreating a tarball requires a lot of extra time and disk space.
- filesystem paths can be symlinked, so this (very large) collection can be split across drives.
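A rough sketch of the symlink idea, in Python. It assumes the extracted tree uses a single hex digit (0-f) as its first directory level, as in "wikipedia/commons/7/70/A.png", and simply assigns those 16 buckets round-robin to the given drives. The helper name link_buckets_across_drives and the round-robin policy are my own illustration, not an XOWA feature.

```python
import os

# Hypothetical helper (not part of XOWA): spread the 16 first-level hash
# directories (0-f) of an extracted wikipedia/commons tree across several
# drives, then symlink them back into one root so tools expecting a single
# tree still find every file.
HEX_DIGITS = "0123456789abcdef"


def link_buckets_across_drives(tree_root: str, drive_roots: list[str]) -> dict[str, str]:
    """Create tree_root/<d> -> <drive>/<d> symlinks, assigning buckets round-robin.

    Returns the mapping bucket -> target directory. Moving existing data onto
    the drives is out of scope; this only creates the target dirs and links.
    """
    os.makedirs(tree_root, exist_ok=True)
    mapping = {}
    for i, bucket in enumerate(HEX_DIGITS):
        # Pick a drive for this bucket in round-robin order.
        target = os.path.join(drive_roots[i % len(drive_roots)], bucket)
        os.makedirs(target, exist_ok=True)
        link = os.path.join(tree_root, bucket)
        # Leave any existing entry (real dir or link) untouched.
        if not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=True)
        mapping[bucket] = target
    return mapping
```

For example, link_buckets_across_drives("/home/wikipedia/commons", ["/mnt/disk1", "/mnt/disk2"]) would put buckets 0, 2, 4, ... on the first drive and 1, 3, 5, ... on the second. A smarter policy could balance by actual bucket size instead.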
Sorry, I should have been more specific with my description. XOWA works off the files/directories from the extracted tarballs, not the tarball.
For example, you can extract "enwiki-20121201-remote-media-1.tar" to "/home/". This will generate files like "/home/wikipedia/commons/7/70/A.png". Note that the file paths in the tarball closely mirror those on the WMF server: in this case, "http://upload.wikimedia.org/wikipedia/commons/7/70/A.png". XOWA can then be pointed at the local filesystem, so that when a page containing [[File:A.png|thumb]] is opened, it creates the thumb from the local file instead of downloading it from upload.wikimedia.org. If you sync further, place the new files under the "/home/wikipedia/commons" root; as long as they follow WMF's directory layout, XOWA will pick them up.
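A minimal sketch of the local-first lookup described above, in Python. It assumes the standard MediaWiki hashed upload layout, where the two directory levels come from the MD5 hex digest of the file name with spaces replaced by underscores, and that the name is already in canonical form (e.g. first letter capitalized). hashed_rel_path and resolve_image are hypothetical helpers for illustration, not XOWA's actual code.

```python
import hashlib
import os
from urllib.parse import quote

# Assumption: originals follow MediaWiki's hashed upload layout, where the two
# directory levels come from the MD5 hex digest of the file name (spaces
# replaced by underscores), e.g. wikipedia/commons/7/70/A.png.
REMOTE_ROOT = "http://upload.wikimedia.org"


def hashed_rel_path(file_name: str, wiki: str = "wikipedia/commons") -> str:
    """Return the relative path of an original, e.g. 'wikipedia/commons/x/xy/Name.png'."""
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{wiki}/{digest[0]}/{digest[:2]}/{name}"


def resolve_image(file_name: str, local_root: str) -> str:
    """Prefer a copy under local_root (the extraction dir); else fall back to the remote URL."""
    rel = hashed_rel_path(file_name)
    local = os.path.join(local_root, *rel.split("/"))
    if os.path.isfile(local):
        return local
    # quote() keeps '/' by default and percent-encodes unsafe characters.
    return f"{REMOTE_ROOT}/{quote(rel)}"
```

With the tarball extracted to "/home/", resolve_image("A.png", "/home") would return the local path if the file is present and the upload.wikimedia.org URL otherwise, which is also why newly synced files only need to land in the same hashed directories to be found.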
This is still not an ideal solution, since the full tarball set still needs to be downloaded at some point, which for English Wikipedia is 2.2 TB. I am looking at generating a "thumbs-only" archive, which would bring that down to about 100 GB. I'd still need a way to distribute it, but will probably try torrenting first.
Let me know if this is enough info or if you were referring to something else.
Thanks.