http://bugs.openzim.org/show_bug.cgi?id=18
Summary: debian needs an init.d script
Product: openZIM
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P5
Component: zimreader
AssignedTo: tommi(a)tntnet.org
ReportedBy: andyr(a)wizzy.com
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
One attached.
--
Configure bugmail: http://bugs.openzim.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugs.openzim.org/show_bug.cgi?id=21
Summary: Zimlib should allow to get/unpack only a part of a
content
Product: openZIM
Version: unspecified
Platform: PC
OS/Version: Windows
Status: NEW
Severity: enhancement
Priority: P5
Component: zimlib
AssignedTo: tommi(a)tntnet.org
ReportedBy: emmanuel(a)engelhart.org
CC: dev-l(a)openzim.org
Estimated Hours: 0.0
The zimlib needs to fully unpack a content item before delivering it to
third-party software.
This has several disadvantages, especially if the content is big, a video for
example:
* It needs a lot of memory
* It takes time
* There is no random access (necessary to seek in an HTML5 video)
It would be a real usability improvement to have a method which delivers
only a part of any content.
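
To illustrate the kind of method requested here, the following is a minimal Python sketch, not zimlib code. It assumes the content is stored as independently compressed fixed-size chunks; the names `pack`, `read_range` and `CHUNK_SIZE` are hypothetical and do not correspond to any existing zimlib API or to the actual ZIM storage layout.

```python
import zlib
from typing import List

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for illustration


def pack(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Compress data as a list of independently compressed chunks."""
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def read_range(chunks: List[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return up to `length` bytes starting at `offset`.

    Only the chunks overlapping the requested range are decompressed.
    """
    if offset < 0 or length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    # Decompress only the chunks covering [offset, offset + length).
    window = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return window[start:start + length]
```

With such a layout, seeking in a large video would cost memory and time proportional to the requested range rather than to the whole content.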
Hi everyone, we need your help.
We are from Python Argentina, and we are working on adapting our
cdpedia project to make a DVD together with educ.ar and Wikimedia
Foundation, holding the entire Spanish Wikipedia that will be sent
soon to Argentinian schools.
Hernán and Diego are the two interns tasked with updating the data
that cdpedia uses to make the cd (it currently uses a static html dump
dated June 2008), but they are encountering some problems while trying
to make an up to date static html es-wikipedia dump.
I'm ccing this list of people, because I'm sure you've faced similar
issues when making your offline wikipedias, or because maybe you know
someone who can help us.
Following is an email from Hernán describing the problems he's found.
thanks!
--
alecu - Python Argentina
2010/4/30 Hernan Olivera <lholivera(a)gmail.com>:
Hi everybody,
I've been working on making an up to date static html dump for the
spanish wikipedia, to use as a basis for the DVD.
I've followed the procedures detailed in the pages below, that were
used to generate the current (and out of date) static html dumps:
1) installing and setting up a mediawiki instance
2) importing the xml from [6] with mwdumper
3) exporting the static html with mediawiki's tool
The procedure finishes without throwing any errors, but the xml import
produces malformed html pages that have visible wikimarkup.
We would really need to have a successful import from the spanish xmls
to a mediawiki instance so we can produce the up to date static html
dump.
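
One way to measure the symptom described above is to scan the exported pages for wikitext that is still visible. The following is a minimal Python sketch under stated assumptions: the three patterns are only illustrative guesses at what leftover markup looks like, and `leftover_markup` is a hypothetical helper, not part of MediaWiki or mwdumper.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns for wikitext that should not survive rendering.
MARKUP_PATTERNS = [
    re.compile(r"\{\{[^{}]*\}\}"),    # unexpanded template, e.g. {{cita requerida}}
    re.compile(r"\[\[[^\[\]]*\]\]"),  # unrendered internal link, e.g. [[Madrid]]
    re.compile(r"'''[^']+'''"),       # unrendered bold markup
]


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def leftover_markup(html: str) -> list:
    """Return the wikitext fragments still visible in a rendered page."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = "".join(extractor.parts)
    found = []
    for pattern in MARKUP_PATTERNS:
        found.extend(pattern.findall(text))
    return found
```

Running this over a sample of the exported files would give a rough count of affected pages. Pages that quote wikitext on purpose (help pages, for instance) will show up as false positives.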
Links to the info I used:
[0] http://www.mediawiki.org/wiki/Manual:Installation_guide/es
[1] http://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Ubuntu
[2] http://en.wikipedia.org/wiki/Wikipedia_database
[3] http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
[4] http://meta.wikimedia.org/wiki/Importing_a_Wikipedia_database_dump_into_Med…
[5] http://meta.wikimedia.org/wiki/Data_dumps
[6] http://dumps.wikimedia.org/eswiki/20100331/
[7] http://www.mediawiki.org/wiki/Alternative_parsers
(among others)
Cheers,
--
Hernan Olivera
PS: unfortunately I didn't write down every step in detail. I ran a lot
more tests than I described here. To make a detailed report I'd like
to go through the procedure again, writing down every option (and checking
whether I missed something). I'm finishing the installation of a server just for
this, because these processes take forever and they blocked other tasks
while I was running these tests.
2009/10/23 Samuel Klein <meta.sj(a)gmail.com>:
> Jimbo - thanks for the spur to clean up the existing work.
>
> All - Let's start by cleaning up the mailing lists and setting a few
> short-term goals :-) It's a good sign that we have both charity and love
> converging to make something happen.
>
> * For all-platform all-purpose wikireaders, let's use
> offline-l(a)lists.wikimedia, as we discussed a month ago in the aftermath of
> Wikimania (Erik, were you going to set this up? I think we agreed to
> deprecate wiki-offline-reader-l and replace it with offline-l.)
>
> * For wikireaders such as WikiBrowse and Infoslicer on the XO, please
> continue to use wikireader(a)lists.laptop
>
>
> I would like to see WikiBrowse become the 'sugarized' version of a reader
> that combines the best of that and the openZim work. A standalone DVD or
> USB drive that comes with its own search tools would be another version of
> the same. As far as merging codebases goes, I don't think the WikiBrowse
> developers are invested in the name.
>
> I think we have a good first cut at selecting articles, weeding out stubs,
> and including thumbnail images. Maybe someone working on openZim can
> suggest how to merge the search processes, and that file format seems
> unambiguously better.
>
> Kul - perhaps part of the work you've been helping along for standalone
> usb-key snapshots would be useful here.
>
>
> Please continue to update this page with your thoughts and progress!
> http://meta.wikimedia.org/wiki/Offline_readers
>
> SJ
>
>
> 2009/10/23 Iris Fernández <irisfernandez(a)gmail.com>
>>
>> On Fri, Oct 23, 2009 at 1:37 PM, Jimmy Wales <jwales(a)wikia-inc.com> wrote:
>> >
>> > My dream is quite simple: a DVD that can be shipped to millions of
>> > people with an all-free-software solution for reading Wikipedia in Spanish.
>> > It should have a decent search solution, doesn't have to be perfect, but it
>> > should be full-text. It should be reasonably fast, but super-perfect is not
>> > a consideration.
>> >
>>
>> Hello! I am an educator, not a programmer. I can help selecting
>> articles or developing categories related to school issues.
>
> Iris - you know the main page of WikiBrowse that you see when the reader
> first loads? You could help with a new version of that page. Madeleine
> (copied here) worked on the first one, but your thoughts on improving it
> would be welcome.
>
>
>
On Wed 28/07/10 07:51, "Manuel Schneider" manuel.schneider(a)wikimedia.ch wrote:
> There is no news for you about my part - I basically presented what openZIM
> is, how it works, which challenges we meet and which solutions we found
> and answered a lot of questions about it.
Thank you for presenting openZIM, this was essential.
> Europe. That's why they developed an offline version of Malayalam
> Wikipedia and distributed 500 CDs in schools.
> Additionally there is the problem that Malayalam scripts are not
> displayed correctly on a standard computer. While they have much less
> data - they "just" made an HTML dump of Malayalam Wikipedia - they had
> to modify the CSS and use some CSS tricks with builtin fonts to enable a
> correct display of the articles. Also the filenames had to be changed to
> work with an ISO image and thus all the links had to be adapted.
> At least for the last part openZIM could be suitable for them.
> Interesting was the fact that even the Malayalam Wikipedia is not
> displayed correctly if you don't know their tricks.
Yes, although this seems to me to be a Windows font issue... but the point
was interesting, and on this occasion I discovered a bug in my ZIM build script :)
The discussion also showed me the need for an automatic transliteration tool (latin-X)
in Kiwix for search input fields (Santosh has written one in JS, in the web pages
themselves, for the Malayalam alphabet).
> I hope that Emmanuel and Asaf want to tell us more here about their
> experience of Wikimania 2010.
Important points for me:
* Meeting a few people "in the real world" and pushing our common projects. I hope
to have a few announcements to make by the end of the year.
* Getting feedback from users about Kiwix.
* Next release wave in English with WP1 0.701 and 0.8
* New challenges with pushing the plan for Wikitrust and starting work with the new WP1 assessment & selection tool
* Launching the Kiwix port for OLPC
Cheers
Emmanuel
Hi all,
it is time that I post some information about the Wikimania conference
2010 which took place two weeks ago in Gdansk, Poland.
Because I ended up as one of the organizers of Wikimania, I didn't have time
to attend all the talks I wanted. But I was able to present a workshop
which we (other submitters of similar workshops and I) merged into a
bigger, joint workshop about offline content usage in general.
The abstract can be found here:
http://wikimania2010.wikimedia.org/wiki/Submissions/Creating_offline_versio…
Asaf and Emmanuel also attended this workshop.
There is no news for you about my part - I basically presented what openZIM
is, how it works, which challenges we meet and which solutions we found
and answered a lot of questions about it.
Martin talked about the process in general and the selection of the
right content for such an offline publication. This is based on special
ratings of articles by the users. He also pointed out how this rating
improved the overall quality of the wiki projects and how it is being
used for other purposes than just offline selection.
Santosh and Shiju were talking about their need to deploy offline
content in India as the internet penetration is somewhat lower than in
Europe. That's why they developed an offline version of Malayalam
Wikipedia and distributed 500 CDs in schools.
Additionally there is the problem that Malayalam scripts are not
displayed correctly on a standard computer. While they have much less
data - they "just" made an HTML dump of Malayalam Wikipedia - they had
to modify the CSS and use some CSS tricks with builtin fonts to enable a
correct display of the articles. Also the filenames had to be changed to
work with an ISO image and thus all the links had to be adapted.
At least for the last part openZIM could be suitable for them.
Interesting was the fact that even the Malayalam Wikipedia is not
displayed correctly if you don't know their tricks.
There were other workshops on PediaPress and the Collection
extension which would have been interesting, as we planned to make an
openZIM interface to the Collection extension. Unfortunately I was not
able to attend this session. Maybe Asaf can tell us more.
Asaf also held an open meeting for people interested in Developing World
issues, a discussion which he started at the Chapters Meeting in April
in Berlin. I attended the first meeting in Berlin and it was quite
interesting and I see some potential for openZIM to become a part of
this effort as offline content is indeed one of the things needed.
I hope that Emmanuel and Asaf want to tell us more here about their
experience of Wikimania 2010.
Regards,
Manuel
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
in the wikipedia-de.zim file it seems that in the url, the " " character
is always replaced by the "+" character. This means that the article
with the URL (in the ZIM) "Deutscher Badminton-Club Bonn" will have
links pointing on it in the HTML looking like
"Deutscher+Badminton-Club+Bonn". For this reason I'm not able to open
links containing more than one word.
Is that an error? It does not seem to me to comply with the URL
standards.
If it is not an error, how should we deal with it? In particular, how should
we represent the "+" character?
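
For reference, the two encodings in question can be compared with Python's standard `urllib.parse` module. This is only a sketch of the general URL rules: in form encoding (application/x-www-form-urlencoded) a space is written as "+", while in plain percent-encoding, as used for URL paths, it is written as "%20". Whether the ZIM file intends one or the other is exactly the open question above, so nothing here describes zimlib's actual behaviour.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

title = "Deutscher Badminton-Club Bonn"

# Percent-encoding for a URL path: a space becomes %20.
path_form = quote(title)

# Form encoding (application/x-www-form-urlencoded): a space becomes "+".
form_form = quote_plus(title)

# A path decoder leaves "+" untouched, so a "+"-encoded link
# no longer matches the article URL that contains spaces.
decoded_as_path = unquote(form_form)

# A form decoder turns "+" back into a space.
decoded_as_form = unquote_plus(form_form)

# A literal "+" is escaped as %2B, so it stays distinguishable from a space.
literal_plus = quote_plus("C++")
```

If the links are meant to be form-encoded, the reader has to decode them with the form rule before the lookup, and a literal "+" in a title has to be written as "%2B".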
Regards
Emmanuel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkxAoLQACgkQn3IpJRpNWtN+/ACdHm3YaYfRoDwKtW5zvQVat/qW
96UAoJL+zqn/fpf9Nd3O4we/bY/Ayygi
=bJ7y
-----END PGP SIGNATURE-----
On Wed 07/07/10 09:35, "Hernan Olivera" lholivera(a)gmail.com wrote:
> Hi everyone. We need your help again.
>
> We finally have a working mirror for generating the static html
> version of ESWIKI we need for Cd-Pedia using DumpHTML extension.
> But it seems that the process will take about 3000 hours of
> processing in our little sempron server (4 months!).
In the past I prepared a full ZIM file (with thumbnails) of WPIT and it took ~3 weeks (for the whole process, not only the dumpHTML part: also mirroring, optimizing, etc.).
http://tmp.kiwix.org/zim/0.9/wikipedia_it_all_kiwix_03_alpha2.zim
I'm currently preparing something similar in French and Spanish. I'm doing that on a new server, so I think this will again take a few weeks. I should release a first version for both Wikipedias during August.
Emmanuel
Dear friends,
I'd like to remind you that next week - July 9th - 11th - Wikimania will
take place in Gdansk:
http://wikimania2010.wikimedia.org/wiki/Main_Page
If you have time I recommend that you register for Wikimania. A big part
of the openZIM team will meet there - Emmanuel, Asaf, Tomasz, me - with
others from the Wikipedia Offline universe like Heiko from PediaPress,
Samuel "SJ" Klein, Martin A. Walker and many others.
We will have a big workshop on Wikipedia Offline, discussing
selection, extraction and storage:
http://wikimania2010.wikimedia.org/wiki/Schedule#Creating_Offline_Version_o…
Of course I will bring my Nokia phone with "WikiOnBoard" with me ;-)
See you there!
/Manuel