Emmanuel,
maybe you can help him with a ZIM file?
/Manuel
-------- Original Message --------
Subject: [Toolserver-l] Static dump of German Wikipedia
Date: Fri, 24 Sep 2010 01:57:55 +0200
From: Marco Schuster <marco(a)harddisk.is-a-geek.org>
Reply-To: toolserver-l(a)lists.wikimedia.org
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>,
toolserver-l <toolserver-l(a)lists.wikimedia.org>
Hi all,
I have made a list of all the 1.9M articles in NS0 (including
redirects / short pages) using the Toolserver; now that I have the list,
I'm going to download every single one of 'em and then publish a static
dump of it. (Tonight is a trial run; I want to see how this works out.
I'd like to begin downloading the whole thing in 3 or 4 days, if no one
objects.) Data collection will happen on the Toolserver
(/mnt/user-store/dewiki-static/articles/); the request rate will be 1
article per second, and I'll copy the new files to my home PC once or
twice a day, so there should be no problem with the TS or Wikimedia
server load.
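For illustration only, a minimal sketch of the kind of fetch loop described
above, assuming a titles.txt with one NS0 title per line and the
action=render endpoint mentioned in the PS below; paths and file naming are
placeholders, not Marco's actual script:

import time
import urllib.parse
import urllib.request

OUT_DIR = "/mnt/user-store/dewiki-static/articles/"
BASE = "http://de.wikipedia.org/w/index.php?action=render&title="

with open("titles.txt", encoding="utf-8") as titles:
    for title in titles:
        title = title.strip()
        if not title:
            continue
        # Fetch the rendered article body only (no skin chrome).
        url = BASE + urllib.parse.quote(title.replace(" ", "_"))
        html = urllib.request.urlopen(url).read()
        out = OUT_DIR + urllib.parse.quote(title, safe="") + ".html"
        with open(out, "wb") as f:
            f.write(html)
        time.sleep(1)  # keep the request rate at one article per second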
When this is finished in ~21-22 days, I'm going to compress the articles
and upload them as a tgz file to my private server (well, if Wikimedia
has an archive server, that'd be better) so others can play with it.
Furthermore, though I have no idea if I'll succeed, I plan on hacking
a static Vector skin file which will load the articles using jQuery's
excellent .load() feature, so that everyone with JS can enjoy a truly
offline Wikipedia.
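A hand-wavy sketch of that idea (not the actual skin work): a script that
writes a bare-bones HTML shell whose content area is filled client-side with
jQuery's .load(); the markup is a stand-in for the real Vector skin, and it
assumes articles saved under articles/<URL-encoded title>.html as in the
sketch above:

SHELL = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Offline Wikipedia</title>
  <script src="jquery.min.js"></script>
  <script>
    function showArticle(title) {
      // Pull the pre-rendered article fragment into the content area.
      $("#content").load("articles/" + encodeURIComponent(title) + ".html");
    }
  </script>
</head>
<body onload="showArticle('Wikipedia')">
  <div id="content">Loading...</div>
</body>
</html>
"""

with open("index.html", "w", encoding="utf-8") as f:
    f.write(SHELL)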
Marco
PS: When trying to invoke /w/index.php?action=render with an invalid
oldid, the server returns HTTP/1.1 200 OK and an error message, but
shouldn't this be a 404 or 500?
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
Someone created a MediaWiki parser written in C - please see the mail below.
Greetings from Linux-Kongress in Nürnberg,
/Manuel
Sent via mobile phone.
-- Original Message --
Subject: [Wikitech-l] Parser implementation for MediaWiki syntax
From: Andreas Jonsson <andreas.jonsson(a)kreablo.se>
Date: 23.09.2010 11:28
Hi,
I have written a parser for MediaWiki syntax and have set up a test
site for it here:
http://libmwparser.kreablo.se/index.php/Libmwparsertest
and the source code is available here:
http://svn.wikimedia.org/svnroot/mediawiki/trunk/parsers/libmwparser
A preprocessor will take care of parser functions, magic words,
comment removal, and transclusion. But as it wasn't possible to
cleanly separate these functions from the existing preprocessor, some
preprocessing is disabled at the test site. It should be
straightforward to write a new preprocessor that provides only the required
functionality, however.
The parser is not feature complete, but the hard parts are solved. I
consider "the hard parts" to be:
* parsing apostrophes
* parsing html mixed with wikitext
* parsing headings and links
* parsing image links
And when I say "solved" I mean producing the same or equivalent output
as the original parser, as long as the behavior of the original parser
is well defined and produces valid html.
Here is a schematic overview of the design:
+-----------------------+
|                       |  Wikitext
|  client application   +----------------------------------------+
|                       |                                        |
+-----------------------+                                        |
            ^                                                    |
            | Event stream                                       |
+-----------+-----------+        +-------------------------+     |
|                       |        |                         |     |
|    parser context     |<------>|         Parser          |     |
|                       |        |                         |     |
+-----------------------+        +-------------------------+     |
                                              ^                  |
                                              | Token stream     |
+-----------------------+        +------------+------------+     |
|                       |        |                         |     |
|     lexer context     |<------>|          Lexer          |<----+
|                       |        |                         |
+-----------------------+        +-------------------------+
The design is described in more detail in a series of posts on the
wikitext-l mailing list. The most important "trick" is to make sure
that the lexer never produces a spurious token. An end token for a
production will not appear unless the corresponding begin token has
already been produced, and the lexer maintains a block context so that
it only produces tokens that make sense in the current block.
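As a rough illustration (not the libmwparser code), that discipline can be
pictured like this: an end token is emitted only if its begin token was
emitted earlier, and tokens are suppressed when they make no sense in the
current block context; names and structure here are invented for the sketch:

class TokenStream:
    def __init__(self):
        self.open = []            # productions whose BEGIN has been emitted
        self.block = "paragraph"  # current block context, e.g. paragraph/table
        self.tokens = []

    def begin(self, production, allowed_blocks):
        if self.block not in allowed_blocks:
            return False          # would be spurious here, so emit nothing
        self.open.append(production)
        self.tokens.append(("BEGIN", production))
        return True

    def end(self, production):
        if production not in self.open:
            return False          # no matching BEGIN was ever emitted
        self.open.remove(production)
        self.tokens.append(("END", production))
        return True

ts = TokenStream()
ts.begin("italic", allowed_blocks={"paragraph", "table"})
ts.end("bold")    # ignored: "bold" was never opened
ts.end("italic")  # fine: matches the earlier BEGIN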
I have used Antlr to generate both the parser and the lexer, as Antlr
supports semantic predicates that can be used for context-sensitive
parsing. I am also using a slightly patched version of Antlr's C
runtime environment, because the lexer needs to support speculative
execution in order to do context-sensitive lookahead.
A Swig-generated interface is used for providing the PHP API. The
parser processes the buffer of the PHP string directly, and writes its
output to an array of PHP strings. Only UTF-8 is supported at the
moment.
The performance seems to be about the same as the original parser's on
plain text. But as the amount of markup increases, the original parser
runs slower, while this new implementation maintains roughly the same
performance regardless of input.
I think that this demonstrates the feasibility of replacing the
MediaWiki parser. There is still a lot of work to do to turn it into a
full replacement, however.
Best regards,
Andreas
Good morning.
First of all I would like to introduce myself. My name is Wilfredo
Rodriguez, and I am writing from Venezuela.
I am new to openZIM; I am just starting to get into the world of
delivering multimedia content.
I have been using openZIM for some time. I think it is important to
standardize the format, as this will benefit all stakeholders.
I do not like talking about myself, but in this case it is necessary.
I have 14 years of programming experience. I specialize in Web 2.0
systems, and I have a good level of (in order of experience)
C/C++/Ruby/Java/PHP/Perl/Lisp and many others. I can also do fairly
good graphic design work.
I understand English and French very well.
I am at your service, and I hope to be of assistance in whatever you need.
--
Lcdo. Wilfredo Rafael Rodríguez Hernández
--------------------------------------------------------
msn,googletalk = wilfredor(a)gmail.com
cv = http://www.wilfredor.co.cc
blog = http://wilfredor.blogspot.com
fotos = http://picasaweb.google.com/wilfredor/
Hello, everyone.
In the last IRC meeting, I offered to take a shot at porting the LZMA2
decoder from ANSI C to Java, for the benefit of Okawix's effort to ship ZIM
support for the Android platform.
As Tommi suspected, it is indeed a little complicated.
I am, however, making progress, and am committed to finishing the job.
However, I have many constraints on my time, and I *cannot* commit to a
deadline.
Nevertheless, just in case I _am_ able to make it, what is the *latest* you
guys at Okawix can receive a working LZMA2 Decoder and still ship on time?
If I can't deliver by that deadline, I'll still finish the job; it will
just wait for a future release, and in the meantime it may come in handy
for other people who need LZMA2 in Java.
There's _nothing working_ to see yet, but if you like, you can follow the
development on GitHub:
http://github.com/abartov/LZMA2-java
I'll post to this list anyhow once I
have something working.
Cheers,
Asaf
--
Asaf Bartov <asaf.bartov(a)gmail.com>
FYI.
Asaf
---------- Forwarded message ----------
From: Asaf Bartov <asaf.bartov(a)gmail.com>
Date: 2010/9/16
Subject: Re: [Internal-l] 1M Entries of Free Knowledge Go To Africa
Thanks for the congratulations.
We will be sure to share some photos and blog posts by the students once
those become available, in the coming months.
In the meantime, some of you have asked for a linkable version of our press
release, so here's a link to a wikified version:
http://wikimedia.org.il/Press_release:Wikipedia_Goes_To_Africa
Cheers,
Asaf Bartov
Wikimedia Israel
2010/9/15 KIZU Naoko <aphaia(a)gmail.com>
> Thank you, Asaf, for letting us share the news, and congrats on
> launching the project! I'm looking forward to seeing it succeed -
> possibly you'll give us updates in Haifa next year? :)
>
> --
Asaf Bartov <asaf.bartov(a)gmail.com>
This is a report on how and where ZIM is used.
It's nothing directly related to our development, but I'd like you to
know about it and share it as a heads-up ;-)
/Manuel
-------- Original Message --------
Subject: [Devnations-l] Fwd: 1M Entries of Free Knowledge Go To Africa
Date: Wed, 15 Sep 2010 20:15:10 +0200
From: Asaf Bartov <asaf.bartov(a)gmail.com>
Reply-To: Growing Wikimedia in developing nations
<devnations-l(a)lists.wikimedia.org>
To: devnations-l(a)lists.wikimedia.org
Hello, everyone.
Below is a report I just sent about Wikimedia Israel's recent work to
spread free knowledge in Africa. I bring it to your attention with
particular emphasis on the *means*, as I consider it an excellent
example of what we had identified as an effective approach in the
Proposed Agenda document
<http://meta.wikimedia.org/w/index.php?title=File:Proposed_Agenda_for_Wikime…>,
namely: reaching out to organizations *already active* in developing
nations and *giving them Wikimedia content*.
Cheers,
Asaf Bartov
Wikimedia Israel
---------- Forwarded message ----------
From: Asaf Bartov <asaf.bartov(a)gmail.com>
Date: Wed, Sep 15, 2010 at 8:08 PM
Subject: 1M Entries of Free Knowledge Go To Africa
To: "Local Chapters, board and officers coordination (closed
subscription)" <internal-l(a)lists.wikimedia.org
<mailto:internal-l@lists.wikimedia.org>>
Hello, everyone.
Here's a rough translation of a press release Wikimedia Israel issued
yesterday, about the "Africa Center" project I mentioned in an e-mail to
this list on June 30th, 2010 (see that e-mail for more background on the
Africa Center at Ben Gurion University).
=== START OF PRESS RELEASE ===
TITLE: Wikipedia Goes To Africa -- Israeli Students to Leave for
Humanitarian Work in Africa, Equipped With Portable Static Wikipedia
SUBTITLE: Ben-Gurion University's "Africa Center", Wikimedia Israel, and
Hamakor cooperate in making free knowledge accessible in Africa
The Africa Center at BGU, headed by Dr. Tamar Golan, annually sends a
group of students on a three-month humanitarian expedition to developing
countries in Africa. This year's group is going to the Republic of
Benin and the Republic of Cameroon.
Learning about this while approaching the Africa Center for help with
developing Africa-related entries on the Hebrew Wikipedia, Wikimedia
Israel decided to equip the students with computers running free
software and containing an offline (static) version of the French
Wikipedia, so that the students can bring free knowledge to Africans
without access to the Internet.
Wikimedia Israel reached out to Hamakor, the Israeli Free and Open
Source Software NGO, and Hamakor helped obtain computer donations,
refurbished them and installed the Linux operating system on them.
Wikimedia Israel collaborated with members of Wikimedia Switzerland and
Wikimedia France to produce an up-to-date static version of the French
Wikipedia (numbering about 1 million entries, and including images),
French being a major language of reading and writing in Cameroon and Benin.
"The students also have portable installations of the offline Wikipedia,
so that they may install it on any other computers they may run across
in Africa," explained Asaf Bartov, who coordinated the project in
Wikimedia Israel, "and they have received training on using Linux and
Kiwix, the offline Wikipedia reader (free) software, so they may train
others to use the computers".
Incidentally, the Linux version installed on those computers is called
Ubuntu Linux, 'Ubuntu' being an African word (in the Zulu language)
roughly translatable as "unity of mankind" or "mutual reliance".
Supporting and promoting the distribution of free knowledge in
developing countries is one of the five major goals identified by the
Wikimedia Foundation as central to its five-year strategy plan,
developed by thousands of members of the Wikimedia Movement.
=== END OF PRESS RELEASE ===
I'd like to specifically express our gratitude to Emmanuel Engelhart,
chief developer of Kiwix, for all his help in getting the content ready
and working on a tight schedule.
We are very pleased with this project, and consider it a prime example
of seizing an opportunity for outreach that we never expected to come
our way.
As always, questions, comments, translations and relaying (of the press
release) to other lists/communities are welcome.
Asaf Bartov
Wikimedia Israel
--
Asaf Bartov <asaf.bartov(a)gmail.com>
Hi everyone, we need your help.
We are from Python Argentina, and we are working with educ.ar and the
Wikimedia Foundation on adapting our cdpedia project to make a DVD
holding the entire Spanish Wikipedia, which will soon be sent to
Argentinian schools.
Hernán and Diego are the two interns tasked with updating the data
that cdpedia uses to make the CD (it currently uses a static HTML dump
dated June 2008), but they are running into some problems while trying
to make an up-to-date static HTML es-wikipedia dump.
I'm CCing this list of people because I'm sure you've faced similar
issues when making your offline Wikipedias, or because you may know
someone who can help us.
Below is an email from Hernán describing the problems he has found.
Thanks!
--
alecu - Python Argentina
2010/4/30 Hernan Olivera <lholivera(a)gmail.com>:
Hi everybody,
I've been working on making an up-to-date static HTML dump of the
Spanish Wikipedia, to use as a basis for the DVD.
I've followed the procedure detailed in the pages below, which was used
to generate the current (and out-of-date) static HTML dumps (a rough
sketch of steps 2 and 3 follows the list):
1) installing and setting up a MediaWiki instance
2) importing the XML from [6] with mwdumper
3) exporting the static HTML with MediaWiki's static HTML export tool
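(A hypothetical sketch of how steps 2 and 3 might be driven from a script;
the dump file name follows [6], while the database name, user, paths and
DumpHTML options are placeholder assumptions:)

import subprocess

DUMP = "eswiki-20100331-pages-articles.xml.bz2"  # from [6]
WIKI = "/var/www/mediawiki"                      # local install from step 1

# Step 2: convert the XML dump to SQL with mwdumper and feed it to MySQL.
subprocess.check_call(
    "java -jar mwdumper.jar --format=sql:1.5 " + DUMP +
    " | mysql -u wikiuser -p wikidb",
    shell=True,
)

# Step 3: export static HTML with the DumpHTML extension's maintenance script.
subprocess.check_call(
    ["php", WIKI + "/extensions/DumpHTML/dumpHTML.php",
     "-d", "/srv/static-eswiki"],
)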
The procedure finishes without throwing any errors, but the XML import
produces malformed HTML pages that show visible wiki markup.
We really need a successful import of the Spanish XML dumps into a
MediaWiki instance so that we can produce the up-to-date static HTML
dump.
Links to the info I used:
[0] http://www.mediawiki.org/wiki/Manual:Installation_guide/es
[1] http://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Ubuntu
[2] http://en.wikipedia.org/wiki/Wikipedia_database
[3] http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
[4] http://meta.wikimedia.org/wiki/Importing_a_Wikipedia_database_dump_into_Med…
[5] http://meta.wikimedia.org/wiki/Data_dumps
[6] http://dumps.wikimedia.org/eswiki/20100331/
[7] http://www.mediawiki.org/wiki/Alternative_parsers
(among others)
Cheers,
--
Hernan Olivera
PS: Unfortunately, I didn't write down every step in detail, and I did a
lot more tests than what I wrote here. To make a detailed report I'd
like to go through the procedure again, writing down every option (and
checking whether I missed something). I'm finishing setting up a server
just for this, because these processes take forever and they blocked
other tasks while I was running these tests.
2009/10/23 Samuel Klein <meta.sj(a)gmail.com>:
> Jimbo - thanks for the spur to clean up the existing work.
>
> All - Let's start by cleaning up the mailing lists and setting a few
> short-term goals :-) It's a good sign that we have both charity and love
> converging to make something happen.
>
> * For all-platform all-purpose wikireaders, let's use
> offline-l(a)lists.wikimedia, as we discussed a month ago in the aftermath of
> Wikimania (Erik, were you going to set this up? I think we agreed to
> deprecate wiki-offline-reader-l and replace it with offline-l.)
>
> * For wikireaders such as WikiBrowse and Infoslicer on the XO, please
> continue to use wikireader(a)lists.laptop
>
>
> I would like to see WikiBrowse become the 'sugarized' version of a reader
> that combines the best of that and the openZim work. A standalone DVD or
> USB drive that comes with its own search tools would be another version of
> the same. As far as merging codebases goes, I don't think the WikiBrowse
> developers are invested in the name.
>
> I think we have a good first cut at selecting articles, weeding out stubs,
> and including thumbnail images. Maybe someone working on openZim can
> suggest how to merge the search processes, and that file format seems
> unambiguously better.
>
> Kul - perhaps part of the work you've been helping along for standalone
> usb-key snapshots would be useful here.
>
>
> Please continue to update this page with your thoughts and progress!
> http://meta.wikimedia.org/wiki/Offline_readers
>
> SJ
>
>
> 2009/10/23 Iris Fernández <irisfernandez(a)gmail.com>
>>
>> On Fri, Oct 23, 2009 at 1:37 PM, Jimmy Wales <jwales(a)wikia-inc.com> wrote:
>> >
>> > My dream is quite simple: a DVD that can be shipped to millions of
>> > people with an all-free-software solution for reading Wikipedia in Spanish.
>> > It should have a decent search solution, doesn't have to be perfect, but it
>> > should be full-text. It should be reasonably fast, but super-perfect is not
>> > a consideration.
>> >
>>
>> Hello! I am an educator, not a programmer. I can help select
>> articles or develop categories related to school topics.
>
> Iris - you know the main page of WikiBrowse that you see when the reader
> first loads? You could help with a new version of that page. Madeleine
> (copied here) worked on the first one, but your thoughts on improving it
> would be welcome.
>
>
>
Hi,
we all agreed that it was becoming a problem not to have more ZIM
files. IMO this problem should now slowly get fixed. I have recently
released new full Wikipedia ZIM files (as always, with thumbnails) in
Spanish and Portuguese:
* http://tmp.kiwix.org/zim/0.9/wikipedia_es_all_09_2010_beta1.zim
* http://tmp.kiwix.org/zim/0.9/wikipedia_pt_all_10_2010_alpha1.zim
Ten days ago I also released Kiwix 0.9 alpha6, fixing a few bugs and
introducing tab navigation. Here is the CHANGELOG:
http://kiwix.svn.sourceforge.net/viewvc/kiwix/moulinkiwix/CHANGELOG
Thanks to the latest libzim improvements, this version is also able to
deal with split ZIM files... which is "mandatory" for storing big ZIM
files on a FAT32-formatted USB key. I can now automatically create a
portable version of Kiwix for Windows from any ZIM file. Simply unpack
the ZIP file on your USB key and it works: everything necessary is there
(autorun, ZIM file, search index, installer, source code, packages, ...)
http://tmp.kiwix.org/portable/
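As a side note, a minimal sketch of the splitting itself (not libzim/Kiwix
code): FAT32 caps files at 4 GiB, so a big ZIM file has to be cut into
smaller chunks; the .zimaa/.zimab suffixes follow the usual split-ZIM naming
convention, and the chunk size is an arbitrary choice:

import string

CHUNK = 2 * 1024 ** 3  # 2 GiB, comfortably under the FAT32 4 GiB limit

def split_zim(path):
    suffixes = (a + b for a in string.ascii_lowercase
                      for b in string.ascii_lowercase)
    with open(path, "rb") as src:
        for suffix in suffixes:
            data = src.read(CHUNK)
            if not data:
                break
            # e.g. foo.zim -> foo.zimaa, foo.zimab, ...
            with open(path + suffix, "wb") as dst:
                dst.write(data)

# split_zim("wikipedia_es_all_09_2010_beta1.zim")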
Stay informed about the next steps:
http://identi.ca/group/kiwix
Emmanuel
Hello all,
We have been working on porting Okawix ( http://okawix.com ) to the Android platform. So far we have included support for Zeno files with both the gzip and bz2 compression formats.
Now we want to add ZIM file support, but we're stuck on the compression format: ZIM files are generated using the LZMA2 format, which is not easily available on the Android / Java platform. On the other hand, the openZIM website lists gzip and bz2 as possible compression formats for ZIM files. Would it be possible to generate a ZIM file with one of those compression formats? Or is the documentation on the website out of date?
We only have two weeks left to port Okawix to the Android platform, and if we can't get ZIM support working within that time, we will have to release Okawix with Zeno file support only.
Thank you for your help in developing an open-source offline reader with ZIM support for Android / Java.
Sincerely
Pascal Martin
Dear all,
Let me welcome Heiko from PediaPress to the team. We were in touch with
him during the Wikimedia Conference in April this year, when we started
to think about adding openZIM support to the Collection extension of
MediaWiki. One of the drivers was Asaf, while the Collection extension
itself is developed and maintained by PediaPress.
Currently PediaPress is looking seriously into the actual implementation
of openZIM support. To start with, there are now some questions about
how best to do it, etc.; as far as I understood, a test environment is
already available.
So if you have time, please meet Heiko tonight in our IRC channel on
Freenode, #openzim.
Thanks for your support,
/Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch