On Tue, Mar 12, 2013 at 11:27 AM, Jason Skomorowski wrote:
Thanks for the substantial contribution. Better tools to share Wikipedia
have the potential to help many of the billions of people without reliable
access to the Internet have at least this one repository of knowledge at
their disposal. Important work this.
On 13-01-28 09:30 PM, gnosygnu wrote:
XOWA also has the ability to work with the full tarball dumps (hence,
dispensing with an always online connection). The tarball dumps are
quite big though (English Wikipedia is 2.2 TB), so I don't know how
many people would have the patience to download the entire set.
Basically I wanted an offline reader that would also show images. The
on-demand download allows users to download images for articles they
are interested in. If they want all the images offline, then they have
the option of downloading the tarball dumps. I'm still looking at an
intermediate option between the two.
Is there an option to use a path on the filesystem rather than a tarball?
This would be a pretty huge feature for two reasons:
* in order to sync only new files from
one needs to have
the images extracted. Extracting multiple terabytes and recreating a tarball
requires a lot of extra time and disk space
* filesystem paths can be symlinked so that we can split this (very large)
collection across drives
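
A minimal sketch of the sync case in the first bullet, assuming the images are already extracted under a local root and that a list of relative file paths from the remote side is available from somewhere. The function name `missing_files` and the example paths are hypothetical and are not part of XOWA.

```python
import os


def missing_files(relative_paths, local_root):
    """Return the relative paths that do not yet exist under local_root.

    Hypothetical helper for illustration only, not an XOWA function.
    """
    missing = []
    for rel in relative_paths:
        # os.path.exists follows symlinks, so a subdirectory that has been
        # symlinked onto another drive still counts as present.
        if not os.path.exists(os.path.join(local_root, rel)):
            missing.append(rel)
    return missing
```

Because the check goes through ordinary filesystem paths, it should behave the same whether a directory such as "7" lives on the same drive or is a symlink to another one, which is the point of the second bullet.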
Sorry, I should have been more specific with my description. XOWA
works off the files/directories from the extracted tarballs, not the
tarballs themselves.
For example, you can extract "enwiki-20121201-remote-media-1.tar" to
"/home/". It will generate files like
"/home/wikipedia/commons/7/70/A.png". Note that the file paths in the
tarball are very similar to those on the WMF server: in this case,
"http://upload.wikimedia.org/wikipedia/commons/7/70/A.png". XOWA can
then be redirected to use the local filesystem so that if a page with
[[File:A.png|thumb]] is opened, it will create the thumb from there
(instead of downloading it from upload.wikimedia.org). If you are
doing further syncing, the new files can be placed under the
"/home/wikipedia/commons" root, and as long as they follow WMF's
directory layout, XOWA will pick them up.
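
The path mapping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not XOWA's actual code: the function names are hypothetical, and the hashed layout follows MediaWiki's usual upload scheme (first one and first two hex digits of the MD5 of the file name, with spaces replaced by underscores). The sketch does not handle MediaWiki's capitalization of the first letter of a title.

```python
import hashlib
import os
from urllib.parse import unquote, urlparse

UPLOAD_HOST = "upload.wikimedia.org"


def local_path_for_url(url, local_root):
    """Map an upload.wikimedia.org URL to the matching path under local_root.

    Hypothetical helper; assumes the tarball was extracted into local_root.
    """
    parsed = urlparse(url)
    if parsed.netloc != UPLOAD_HOST:
        raise ValueError("not an upload.wikimedia.org URL: " + url)
    # The URL path ("/wikipedia/commons/7/70/A.png") mirrors the tarball layout.
    relative = unquote(parsed.path).lstrip("/")
    return os.path.join(local_root, *relative.split("/"))


def hashed_relative_path(file_name):
    """Build the "x/xy/Name" part of the path from a file name.

    Assumes MediaWiki's hashed upload directory scheme (MD5 of the name).
    """
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "/".join([digest[0], digest[:2], name])
```

For the example in this message, `local_path_for_url("http://upload.wikimedia.org/wikipedia/commons/7/70/A.png", "/home")` gives "/home/wikipedia/commons/7/70/A.png" on a POSIX system. A newly synced file should be picked up as long as it is stored at the location `hashed_relative_path` would produce for its name.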
This is still not an ideal solution, as a full tarball set still needs
to be downloaded at some point -- which, for English Wikipedia,
is 2.2 TB. I am looking at generating a "thumbs-only" archive which
will bring it down to about 100 GB. I'd still need a way to distribute
it, but will probably try torrenting first.
Let me know if this is enough info or if you were referring to something else.