Hello all,
I have been working for the past few months on this project:
http://code.google.com/p/offline-wikipedia/. I also presented a talk
related to it at freed.in '09.
My aims with this project are:
- To create a DVD distribution of English Wikipedia that meets the
standards of http://download.wikimedia.org/dvd.html.
- To make it easy to install and usable straight from the DVD.
The target audience is:
- Those who don't have Internet access.
- Those who want access to Wikipedia content irrespective of an
Internet connection.
- Those who currently use the proprietary encyclopedias available in
the market.
Present status:
- Apart from the source code hosted at Google Code, the whole setup is
also available at http://92.243.5.147/offline-wiki. There are two
parts: for the complete English Wikipedia you have to get blocks.tgz
and offline-wikipedia.tgz; instructions are in the README file
available there. There is also a small prototype, sample.tar.bz2, in
case one wants to check the quality of the work.
- I am following the approach taken by
http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html,
but with some differences: I am using Python to convert wiki-text to
HTML, and Django for the server.
- As of now I work with the XML dumps provided by MediaWiki; last
year's October dump was 4.1G. I have CSV files of ~300M to locate
articles inside those dumps, and a small Django configuration to
access the articles and convert them to HTML, and all of this fits on
a DVD.
Issues at hand:
- My Python parser that creates HTML out of wiki-text is not perfect;
I could replace it with something better that already exists, but I
have yet to find that.
- To access articles faster I am breaking the single bz2 dump apart
using bz2recover, which gives me some 20,000 files for the English
content; I am trying to avoid having that many files without
compromising the speed of browsing articles.
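For context, the reason the bz2recover split helps is that each
recovered block is a standalone bz2 stream, so serving one article
only requires decompressing one small block rather than the whole
4.1G dump. A minimal sketch (the block here is simulated in memory
instead of being read from an actual rec*.bz2 file):

```python
import bz2

# Simulate one recovered block holding a slice of the XML dump; with
# real bz2recover output this data would come from open(path, "rb").read().
page_xml = "<page><title>Zebra</title><text>Striped equid.</text></page>"
block_bytes = bz2.compress(page_xml.encode("utf-8"))

def read_block(data):
    """Decompress a single standalone bz2 block back into dump text."""
    return bz2.decompress(data).decode("utf-8")

print("<title>Zebra</title>" in read_block(block_bytes))  # True
```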
- We could replace the Django server with something lighter and
simpler, provided it has no dependency cycles that make it hard to
access/use/install.
- The March '09 English content is 4.6G, making things tighter.
- It is text-only content, excluding multimedia and pictures (which
are an important part and can't be neglected).
Targets:
- Make it updatable.
- Make it editable.
- Manage different categories of articles, and segregate articles on
that basis, to make a more refined education/learning tool.
There are other issues too. I know of other attempts like WikiTaxi and
wikipedia-dumpreader, and am trying to patch things together to get
better results. But I don't know why those existing parsers/attempts
never made it onto that DVD distribution list. I am working to improve
it; in the meanwhile, any suggestions, feedback, and contributions are
most welcome.
--
Regards
Shantanu Choudhary
dear w-o-r, there are some recent discussions about offline readers
and writers, for mediawiki and other wikis ('mikmik' is a proposed
platform for simple distributed offline editing, for instance, with
plugins for different wiki architectures), on this wikireader list:
http://lists.laptop.org/listinfo/wikireader
just as wiki-offline-reader-l is nominally focused on offline
wikipedia readers, the wikireader list is nominally focused on
wikireaders for offline students and schools using XOs; but there is
significant overlap.
you're all welcome to join the discussion there.
regards,
SJ
Hi
This mailing list is unfortunately not very active, which in my
opinion mirrors the state of the static content activity at the
Foundation level. Consequently, we have each been cooking up our own
software on our own side in order to publish on DVD.
But the French- and English-speaking communities now work together,
and use and will use the same software to publish their WP1 selections
on DVD.
This software is called Kiwix: http://www.kiwix.org
It is a small piece of software, an embedded browser with an efficient
search engine, developed by a French company.
A first test release was made in February, and you can buy or download
the result at http://www.wikipediaondvd.com
This company will support the further development and
commercialization. It also provides, in many languages, the search
engine for Wikipedia content, Wikiwix: http://www.wikiwix.com
We believe that this software has a future, and we are ready to help
everyone with a serious publication project.
Best regards
Kelson
A friend has a smartphone and runs one of the available offline
versions (http://de.wikipedia.org/wiki/Wikipedia:Unterwegs) on it. I
just have a "normal" Java-enabled mobile phone, and I want to use
Wikipedia there as well. So I thought about using a J2ME MIDlet to
read Wikipedia on my cellphone offline. I searched for such a tool
and, after finding nothing, thought about creating my own. But it
seems that the solution to the problem is much lousier than the
problem itself.
Have you ever thought about something like this? Does something like a
Wikipedia MIDlet already exist? What would the challenges be? Where
shall I start?
Some help would be warmly welcome,
Thomas
On 7/232/06, Samuel Klein <meta.sj at gmail.com> wrote:
> Emmanuel :
>
> A simple offline reader ready in time for wikimania would be
> excellent. What language are you working in? Is there a public code
> repository?
>
> Some people at One Laptop per Child are working on an offline
> reader/writer for mediawiki style dumps in python; this will be used
> to read ebooks, other texts, and wiki dumps... perhaps some coding
> collaboration is possible, where the specific specs of different
> projects overlap.
>
> SJ
Hi!
I don't have technical skills (not even English writing skills), but I
want to share my adventures :-)
Last November I was invited by Gleducar and the organizers of V
Jornadas Regionales de Software Libre, a congress organized by the
FLOSS community in Rosario, Argentina, to speak about Wikimedia
projects. Gleducar is an Argentine association that promotes
collaborative knowledge construction, using MediaWiki to create free
content for schools [1].
To take advantage of my visit, Argentine Wikipedians organized a
meetup [2] [3]. We discussed the project and sought help from Gleducar
and the FLOSS community to create a Wikipedia CD version in Spanish.
Last year, the Chilean and Argentine governments started a new project
called "My First PC", a plan aiming to create a cheaper personal
computer that low-income people could purchase. An OhMyNews report
said [4]:
"The plan, in the same vein as those adopted by India and Malaysia,
which have reported having success with theirs, was not very welcome
in Chile. The reason for the anger is that "My First PC" incorporates
Microsoft Windows XP Starter Edition, making the license price 20
percent of the total price of the computer. Open-source defenders say
that you can get a better operating system for free. "My First PC"
also includes the Encarta Encyclopedia, tacking on another 12 percent.
Critics say users could save their money if they use, for example,
Wikipedia, or search for information on the Internet."
In response to the support from bloggers and the media, we thought it
could be a great idea to look for a way to create the Spanish CD
version. At the meetup we evaluated the German version, which I had
seen at Wikimania '05, and we didn't like it. We thought that our
solution should not be based on LAMP-like technology, and we preferred
a simple static HTML dump. Finally, we found Tim Starling's static
dumps [5] [6].
Tim's solution is just what we needed, but it doesn't include a search
engine. The dump can be navigated offline with any web browser. If you
do a wget of that site now and burn a CD with the contents, you have a
Wikipedia CD version, including the images from Commons. Tim hasn't
made any new dump since November 2005. You can navigate this dump, for
example in French, here [7]. Since then, we have been searching for
solutions [8]. We count on Pilaf, the creator of Live Preview (which
Lupin uses in his popups tool), as a technical member. Also
Educalibre, a Chilean Gleducar-like project, is working with us.
Gleducar has made a static dump test and has burned it successfully.
But we are stopped by the search engine! We thought of a JavaScript
solution, maybe AJAX. I found an online Wikipedia browser called
Gollum [9] (GPL, but they don't have a downloadable version) that uses
AJAX to auto-complete article names in the search field. I like
Gollum, and maybe it will be a great tool for a Wikipedia CD version.
Now we have even more interest, because the Argentine FLOSS community
is working with an OLPC laptop prototype, and they are developing
social and educational projects for use on those laptops. Maybe an
OEPC in Spanish could be a great project.
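The auto-complete idea mentioned above can be sketched simply: with a
sorted list of article titles shipped on the CD, prefix matches are a
binary search away (the titles below are made up for illustration):

```python
import bisect

# Made-up titles standing in for the dump's full article list.
titles = sorted(["Argentina", "Argon", "Chile", "Rosario", "Wikipedia"])

def complete(prefix, limit=5):
    """Return up to `limit` titles starting with `prefix`."""
    i = bisect.bisect_left(titles, prefix)
    out = []
    while i < len(titles) and titles[i].startswith(prefix) and len(out) < limit:
        out.append(titles[i])
        i += 1
    return out

print(complete("Arg"))  # ['Argentina', 'Argon']
```

An AJAX search field, like Gollum's, could call such a lookup (or load
a precomputed prefix table) to suggest article names without needing a
server-side database.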
We know that we will need the WMF's help at some point, and we won't
publish any solution before discussing it with the Spanish community,
the SPC, and the WMF.
Well, we will continue our search and will help you as much as
possible.
Kind regards
[1] http://www.gleducar.org.ar
[2] http://es.wikipedia.org/wiki/Wikipedia:Encuentros/Encuentro_de_Wikipedistas…
[3] http://es.wikipedia.org/wiki/Usuario:Zuirdj/Rosario_2005
[4] http://english.ohmynews.com/ArticleView/article_view.asp?menu=A11100&no=243…
[5] http://meta.wikimedia.org/wiki/Parsers#A_non-parser_dumper
[6] http://meta.wikimedia.org/wiki/Static_dumps
[7] http://static.wikipedia.org/fr/index.html
[8] http://es.wikipedia.org/wiki/Wikipedia:Wikipedia_en_CD
[9] http://gollum.easycp.de/en/0,,00.html
--
Juan David Ruiz
http://es.wikipedia.org/wiki/Usuario:Zuirdj
Hi SJ
> Emmanuel :
>
> A simple offline reader ready in time for wikimania would be
> excellent. What language are you working in? Is there a public code
> repository?
I am working on it. I expect a simple viewer with a good-looking
French dump of Wikipedia will be done. There is currently no public
code repository.
Unfortunately, I have invested most of my time over the last 2 months
in getting a good static HTML dump. But now this problem is more or
less solved for me, and I will have more time to focus on the
software.
The technologies are:
* Mozilla framework: XULRunner (portable, skinnable, easy to modify,
extensible, plugins)
* Xapian search engine
* HTML dump
The software is currently not much more than a simple HTML viewer
window. I am working to integrate an existing search engine, Xapian,
which means writing an XPCOM component to embed the library.
>
> Some people at One Laptop per Child are working on an offline
> reader/writer for mediawiki style dumps in python; this will be used
> to read ebooks, other texts, and wiki dumps... perhaps some coding
> collaboration is possible, where the specific specs of different
> projects overlap.
Yes, that could be interesting. I wrote a complete requirements
document, but it's in French.
I propose that we meet: all the people involved in HTML dumping and
HTML-based offline-reader development.
Such a meeting would be very interesting to:
- know more exactly what is needed for such a third use of Wikipedia
content
- talk about how to achieve that
- talk about the offline-reader software... although the dedicated
mailing list is very quiet.... :(
Emmanuel
For everyone's information:
I am sure the static content subcommittee could interact with those
working on the wiki-offline reader (and should register on the list).
Also, Emmanuel (please meet him) is working hard to push publishing on
fr.
Anthere
-------- Original Message --------
Subject: Hi Everybody
Date: Mon, 12 Jun 2006 21:35:58 +0200
From: Emmanuel Engelhart <emmanuel.engelhart(a)wikimedia.fr>
Organization: Wikimédia France
To: Wiki-offline-reader-l(a)wikimedia.org
CC: Florence Devouard <anthere(a)wikimedia.org>, Delphine Ménard
<dmenard(a)wikimedia.org>, "Łukasz Garczewski" <lgarczewski(a)gmail.com>
Hi,
I'm Emmanuel Engelhart, a member of the Wikimedia France board.
My aim is to publish Wikipedia in French.
We are a few people trying to manage that.
I personally:
- wrote an offline-reader requirements document
- am programming the scripts to build the document repository
- want to build the index with Xapian (I want to try CLucene too)
I need some help to:
- make a perfect local mirror of Wikipedia; I have a problem importing
all the images.
- make a GUI over the document repository & index; I'm not a GUI
programmer; Qt, GTK, or XPFE programmers needed ;-)
In my opinion it's possible to present a simple offline reader at
Wikimania.
So join Wiki-offline-reader-l(a)wikimedia.org if you're interested.
Best wishes
Emmanuel
I want to know if anyone is interested in writing an offline wiki
browser.
<a href="http://wikifilter.sourceforge.net/">WikiFilter</a> is a small
program I wrote; both the binary and the source code (in C) are
uploaded there. It is not a complete "reader" but a background
"browser": it manages the dump file and does the parsing, but relies
on a web browser to display the HTML output.
While managing the dump file is straightforward (build an index,
search it for an article by title, etc.), parsing wiki text to HTML is
quite tricky. I wonder if anyone has tried this before.
In particular, are there any robust algorithms for parsing templates
and wiki tables?
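As a rough illustration of why templates are tricky: {{...}} can nest
inside {{...}}, so a parser must track brace depth rather than doing a
flat search. A minimal sketch (in Python rather than C, and a real
parser must also handle tables, links, nowiki, and parser functions):

```python
def extract_templates(text):
    """Return top-level {{...}} spans, respecting nesting."""
    spans, depth, start, i = [], 0, 0, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i        # remember where the outer template opens
            depth += 1
            i += 2
        elif pair == "}}" and depth:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans

print(extract_templates("a {{cite|{{inner}}}} b {{stub}}"))
# ['{{cite|{{inner}}}}', '{{stub}}']
```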