I'm hoping this forum is a proper place for this message. My apologies if it is not.
XOWA is a new open-source offline Wikipedia app which I wrote in my spare time over the past 20 months. You can view screenshots here (http://sourceforge.net/projects/xowa) or here (http://imgur.com/a/OydBK/layout/blog). Its notable features are:
* It is a full-fledged offline HTML reader for English Wikipedia (or any other Wikimedia Foundation wiki)
* It works directly against the data-dump files (such as those at http://dumps.wikimedia.org/backup-index.html)
* It downloads images from Wikimedia on demand (or locally from a Wikimedia image tarball at http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20130101/)
* It can edit offline articles (to update content, correct vandalism or just to experiment)
* It navigates between offline wikis (click on "Look up this word in Wiktionary" and it will open your offline version of Wiktionary)
I'm posting here because I'm looking for feedback. So far, I have a few good users, and one exceptional user from German Wikipedia (Schnark, who added MathJax and sortable/collapsible tables). I'm currently looking for others who will try offline images, test English Wikipedia, or review wikis in other languages.
The most recent version of XOWA is contained in one zip file. It takes one click and about 3 minutes to download Simple Wikipedia for offline use. XOWA also has a page that lists 596 other wikis that can be set up with one click. English Wikipedia is the largest; it takes between 4 and 5 hours, most of which is spent downloading and unzipping the 9 GB .bz2 file.
If you're interested and have some time, please give XOWA a try. Windows-specific instructions are below. XOWA also works on Linux (and possibly Mac OS X); the instructions are similar, except that you launch it with "java -jar xowa_linux.jar" on Linux or "java -d32 -XstartOnFirstThread -jar xowa_macosx.jar" on Mac OS X.
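For anyone who wants to script the launch on Linux or Mac OS X, a small helper that picks the command by platform could look like the sketch below. The jar names come from the post; the function names are hypothetical, and the Mac command uses the JVM option -XstartOnFirstThread, which SWT-based applications need on Mac OS X. On Windows you simply double-click xowa.exe.

```python
import platform
import subprocess


def xowa_command(system: str) -> list[str]:
    """Return the command that starts XOWA for an OS name as reported by
    platform.system() ("Linux" or "Darwin"). Hypothetical helper."""
    if system == "Linux":
        return ["java", "-jar", "xowa_linux.jar"]
    if system == "Darwin":
        # -d32 requests a 32-bit JVM; -XstartOnFirstThread runs the SWT
        # event loop on the main thread, as Mac OS X requires.
        return ["java", "-d32", "-XstartOnFirstThread", "-jar", "xowa_macosx.jar"]
    raise ValueError(f"no launch command for {system!r}; on Windows run xowa.exe")


def launch_xowa() -> None:
    """Start XOWA; assumes the current directory is the unzipped XOWA folder."""
    subprocess.run(xowa_command(platform.system()), check=True)
```

Calling launch_xowa() from inside the unzipped XOWA directory runs the same command you would otherwise type by hand.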
If you have questions or comments or problems, please post and I will reply.
Thanks for your attention.
------------ Instructions ------------
Requirements:
* 1 GB free space: XOWA (5 MB) + XUL Runner (32 MB) + Simple Wikipedia (500 MB) + ImageMagick/Inkscape (400 MB)
* Windows XP or higher
* Java 1.6 or Java 1.7. If Java is not installed on your machine, you can get it from http://www.java.com/en/download
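The listed components add up to 5 + 32 + 500 + 400 = 937 MB, which is where the roughly 1 GB figure comes from. A quick pre-flight check could look like this sketch. It is not part of XOWA, the helper names are hypothetical, and it only confirms that some java executable is on the PATH, not that it is version 1.6 or 1.7.

```python
import shutil

# Component sizes in MB, as listed in the requirements.
COMPONENTS_MB = {
    "XOWA": 5,
    "XUL Runner": 32,
    "Simple Wikipedia": 500,
    "ImageMagick/Inkscape": 400,
}


def required_mb() -> int:
    """Total disk space the listed components need, in MB."""
    return sum(COMPONENTS_MB.values())


def check_prerequisites(install_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    free_mb = shutil.disk_usage(install_dir).free // (1024 * 1024)
    if free_mb < required_mb():
        problems.append(f"only {free_mb} MB free, need about {required_mb()} MB")
    # Only checks that a java executable exists, not which version it is.
    if shutil.which("java") is None:
        problems.append("java not found on PATH; install Java 1.6 or 1.7")
    return problems
```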
Steps
* Download xowa_app_windows_v0.2.2.0.zip from http://sourceforge.net/projects/xowa/files/v0.2.2
* Unzip the file to C:\xowa. When you are done, you should have a file called C:\xowa\xowa.exe as well as many other files and directories.
* Double-click C:\xowa\xowa.exe. The app should launch and the XOWA Main Page should load.
* Click the link for "Set up Simple Wikipedia". Wait about 3 minutes for the wiki to download and install. When it is finished, it will open Simple Wikipedia.
* Browse Simple Wikipedia. When you are done, click on the Main Page link under XOWA in the left-hand navigation bar.
* Click the link for "Set up images (Windows)". Wait about 3 minutes for the image programs to install.
* Click on Page history in the left-hand navigation bar.
* Select Main_Page for simple.wikipedia.org. Images will now download automatically for any page you visit. Here are some example pages to visit (you can copy and paste these into the address bar): "simple.wikipedia.org/wiki/World History", "simple.wikipedia.org/wiki/Chess", "simple.wikipedia.org/wiki/Gothic architecture", "simple.wikipedia.org/wiki/Saturn (planet)"
* If you want to try other wikis, click on "list of data dumps" on the XOWA Main Page.
Very interesting concept. I assume you've seen [ http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team], which is closely related. A couple of questions:
* It can edit offline articles — to what end? Does it push these articles when the device goes online? How does it handle edit conflicts? Which version is the user editing (i.e., for high traffic articles, it's rather impractical to be editing a database dump that is ~days/weeks old, considering all the changes made since)?
* It downloads images on demand — so someone needs to browse to a page before downloading the images for it? How long are these images cached? Adjustable?
These are just a couple of thoughts that popped into my mind at first. It might be worthwhile talking to the (English) WP 1.0 folks (we can meet on IRC if you want, let me know) about integrating this with that project... adding a link to a XOWA install with just the pages selected for WP 1.0? I can see the benefit of having a version of WP 1.0 that is cross-platform/doesn't require additional software/self-contained...at least as a first use for your work. See [ http://en.wikipedia.org/wiki/Wikipedia:Version_0.8/downloads]
Thanks, Theo
See also: * [http://library.kiwix.org/wikipedia_en_wp1/]
On Mon, Jan 28, 2013 at 5:47 PM, gnosygnu gnosygnu@gmail.com wrote:
Offline-l mailing list Offline-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/offline-l
Thanks for the response. I reply inline below. Let me know if I missed anything.
I assume you've seen [http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team], which is closely related.
I've looked at it briefly, but I was more interested in something where the content wasn't curated. Since Wikimedia distributes complete data dumps, I wanted something that could render it entirely (and theoretically the entire site).
- It can edit offline articles — to what end? Does it push these articles
when the device goes online? How does it handle edit conflicts?...
Valid points. The edit ability is not meant for live/real-time editing. I wanted to give the user the ability to correct articles that were sitting on their machine -- especially to remove vandalism. This may be a low-frequency usage, but I think it's a good feature to have.
- It downloads images on demand — so someone needs to browse to a page
before downloading the images for it?
Yes, images are downloaded from within the app. For example, if you open up XOWA and browse to simple.wikipedia.org/wiki/Chess, it will open the offline article and then go online and download all the images for it.
XOWA also has the ability to work with the full tarball dumps (hence, dispensing with an always online connection). The tarball dumps are quite big though (English Wikipedia is 2.2 TB), so I don't know how many people would have the patience to download the entire set.
Basically I wanted an offline reader that would also show images. The on-demand download allows users to download images for articles they are interested in. If they want all the images offline, then they have the option of downloading the tarball dumps. I'm still looking at an intermediate option between the two.
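One way to picture the on-demand step: after the article's wikitext is loaded from the dump, collect the files it references and fetch each one. The sketch below covers only the first half, finding [[File:...]] and [[Image:...]] links. The function name is hypothetical and this is not XOWA's actual parser; images pulled in through templates, infoboxes or gallery tags are not caught by this simple pattern.

```python
import re

# Matches the target of [[File:...]] or [[Image:...]] links up to the first
# "|" or "]". Templates and galleries can also reference images; this
# simple pattern does not see those.
FILE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)", re.IGNORECASE)


def image_names(wikitext: str) -> list[str]:
    """Return distinct file names referenced by File:/Image: links, in order."""
    seen = []
    for match in FILE_LINK.finditer(wikitext):
        # Normalise as most Wikimedia wikis do: underscores for spaces,
        # first letter capitalised.
        name = match.group(1).strip().replace(" ", "_")
        name = name[:1].upper() + name[1:]
        if name not in seen:
            seen.append(name)
    return seen
```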
How long are these images cached? Adjustable?
Right now there is no adjustable cache. Every downloaded image gets added to XOWA's library. If users are interested in limiting the size of the downloaded images, I can add it in later as an option.
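Since XOWA currently keeps every downloaded image, a size cap would mean evicting older files. The following is a hypothetical sketch of such an option, not an existing XOWA feature: it treats a file's modification time as its last-use time and deletes the oldest files until the image folder fits under a byte limit.

```python
from pathlib import Path


def trim_cache(root: str, max_bytes: int) -> list[Path]:
    """Delete the least recently modified files under root until the total
    size is at most max_bytes. Returns the deleted paths (hypothetical helper)."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    # Oldest modification time first, so recently fetched images survive longest.
    files.sort(key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    deleted = []
    for path in files:
        if total <= max_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        deleted.append(path)
    return deleted
```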
These are just a couple of thoughts that popped into my mind at first. It might be worthwhile talking to the (English) WP 1.0 folks (we can meet on IRC if you want, let me know) about integrating this with that project...
Sure, I would be interested. Let me know if they think the app is at all a fit.
adding a link to a XOWA install with just the pages selected for WP 1.0?
Well, XOWA is meant to show all the pages in a dump, though in theory I could add a filter that limits it to the pages selected for WP 1.0.
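A filter of that kind could stream the XML data dump and keep only the titles on a selection list. This is a sketch under assumptions: the WP 1.0 selection is supplied as a plain set of titles, the dump is in the standard MediaWiki XML export format, and selected_pages is a hypothetical name rather than part of XOWA.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def _local(tag: str) -> str:
    """Strip the export namespace, e.g. '{http://www.mediawiki.org/xml/export-0.8/}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def selected_pages(dump: IO[bytes], wanted: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (title, wikitext) for pages whose title is in `wanted`,
    streaming the XML so the whole dump is never held in memory."""
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title in wanted:
            yield title, text
        elem.clear()  # free the finished <page> subtree
```

A .bz2 dump can be streamed directly by passing the file object returned by bz2.open(path, "rb").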
I can see the benefit of having a version of WP 1.0 that is cross-platform/doesn't require additional software/self-contained...at least as a first use for your work.
Yup. That's what I was aiming for: something self-contained and portable but also complete and offline.
See also: * [http://library.kiwix.org/wikipedia_en_wp1/]
It looks good. This looks similar to the Kiwix offline English Wikipedia Dec-2010 45,000-article version. I assume this is the online version of WP 1.0?
Thanks.
Thanks for the substantial contribution. Better tools to share Wikipedia have the potential to help many of the billions of people without reliable access to the Internet have at least this one repository of knowledge at their disposal. Important work this.
On 13-01-28 09:30 PM, gnosygnu wrote:
XOWA also has the ability to work with the full tarball dumps (hence, dispensing with an always online connection). The tarball dumps are quite big though (English Wikipedia is 2.2 TB), so I don't know how many people would have the patience to download the entire set.
Basically I wanted an offline reader that would also show images. The on-demand download allows users to download images for articles they are interested in. If they want all the images offline, then they have the option of downloading the tarball dumps. I'm still looking at an intermediate option between the two.
Is there an option to use a path on the filesystem rather than a tarball? This would be a pretty huge feature for two reasons:
* in order to sync only new files from http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/ one needs to have the images extracted. Extracting multiple terabytes and recreating a tarball requires a lot of extra time and disk space
* filesystem paths can be symlinked so that we can split this (very large) collection across drives
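The advantage of an extracted tree is that incremental syncing reduces to comparing a remote listing against the files already on disk, which is what a tool such as rsync does. A minimal sketch of that comparison, assuming the listing is available as (relative path, size) pairs; where that listing comes from is left open, and files_to_fetch is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable


def files_to_fetch(remote: Iterable[tuple[str, int]], local_root: str) -> list[str]:
    """Given (relative_path, size_in_bytes) pairs from a remote listing, return
    the paths that are absent locally or whose local size differs."""
    root = Path(local_root)
    needed = []
    for rel, size in remote:
        local = root / rel
        # is_file() and stat() follow symlinks, so a tree split across drives
        # with symlinked directories is checked transparently.
        if not local.is_file() or local.stat().st_size != size:
            needed.append(rel)
    return needed
```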
Best, Jason
On Tue, Mar 12, 2013 at 11:27 AM, Jason Skomorowski jason@skomorowski.net wrote:
Is there an option to use a path on the filesystem rather than a tarball? This would be a pretty huge feature for two reasons:
- in order to sync only new files from
http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/ one needs to have the images extracted. Extracting multiple terabytes and recreating a tarball requires a lot of extra time and disk space
- filesystem paths can be symlinked so that we can split this (very large)
collection across drives
Sorry, I should have been more specific with my description. XOWA works off the files/directories from the extracted tarballs, not the tarball.
For example, you can extract "enwiki-20121201-remote-media-1.tar" to "/home/". It will generate files like "/home/wikipedia/commons/7/70/A.png". Note that the file paths in the tarball are very similar to those on the WMF server: in this case, "http://upload.wikimedia.org/wikipedia/commons/7/70/A.png". XOWA can then be redirected to use the local filesystem so that if a page with [[File:A.png|thumb]] is opened, it will create the thumb from there (instead of downloading it from upload.wikimedia.org). If you are doing further syncing, the new files can be placed in "/home/wikipedia/commons" root, and as long as they match WMF's style, XOWA will pick them up.
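The directories in paths like commons/7/70/A.png follow MediaWiki's hashed upload layout: the first one and first two hex digits of the MD5 hash of the file name, with spaces replaced by underscores. The sketch below (hypothetical helper names, not XOWA's code) builds that path and prefers the local copy under the extraction root, falling back to the upload.wikimedia.org URL. It assumes the file is hosted on Commons; files uploaded directly to a wiki sit under that wiki's own directory (for example wikipedia/en).

```python
import hashlib
from pathlib import Path
from urllib.parse import quote

UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia/commons"


def hashed_rel_path(file_name: str) -> str:
    """MediaWiki hashed layout: <h0>/<h0h1>/<name>, where h is the hex MD5 of
    the canonical file name (spaces as underscores)."""
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def resolve_image(file_name: str, local_root: str) -> str:
    """Return the local path if the extracted tarball holds the file, else the
    URL it would be downloaded from (assumes a Commons-hosted file)."""
    rel = hashed_rel_path(file_name)
    local = Path(local_root) / "wikipedia" / "commons" / rel
    if local.is_file():
        return str(local)
    # quote() percent-encodes non-ASCII characters but keeps the "/" separators.
    return f"{UPLOAD_BASE}/{quote(rel)}"
```

Thumbnails use the same hash under a thumb/ prefix, for example thumb/<h0>/<h0h1>/<name>/220px-<name>, which a thumbs-only archive would presumably need to reproduce.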
This is still not an ideal solution as a full tarball set still needs to be downloaded at one point in time -- which, for English Wikipedia, is 2.2 TB. I am looking at generating a "thumbs-only" archive which will bring it down to about 100 GB. I'd still need a way to distribute it, but will probably try torrenting first.
Let me know if this is enough info or if you were referring to something else.
Thanks.
Hi,
gnosygnu gnosygnu@gmail.com writes:
Requirements:
- 1 GB free space: XOWA (5 MB) + XUL Runner (32 MB) + Simple Wikipedia
(500 MB) + ImageMagick/Inkscape (400 MB)
- Windows XP or higher
My understanding is that it works also on GNU/Linux so perhaps this requirement should be updated.
Also, the webpage mentions XOWA as *the* offline reader, but there are others (like Kiwix), maybe the webpage could mention them so that users can choose which one fits their needs best.
Congrats for this work!
My understanding is that it works also on GNU/Linux so perhaps this requirement should be updated.
Thanks for the note. Yes, it runs on GNU/Linux (and probably Mac OS X). The instructions in my original email were Windows-centric. I'll make the designation clearer if I resend them.
Also, the webpage mentions XOWA as *the* offline reader, but there are others (like Kiwix), maybe the webpage could mention them so that users can choose which one fits their needs best.
I didn't intend for *the* to be a point of comparison. Wikipedia's current byline is "The free encyclopedia". Since most of XOWA emulates Wikipedia's look, I just paralleled the phrase.
That said, I don't want one word to be a point of misunderstanding. I've updated the website to read "A free, open-source Wikipedia app".
Congrats for this work!
Thanks.
Dear All, This really deserves praise. I was able to use it within a few minutes. A simple but great application.
The developer deserves praise. I was pretty sick of Kiwix, for so many reasons...
Good. Keep it up. Yours, MK Yadava