Greetings All,
With the annual fundraiser now over, I'm going to be shifting a lot of my attention to both Mobile and Offline here at the WMF. I wanted to give you all a quick update on our offline happenings and to highlight ways of working with us to make these projects successful.
- PediaPress, Collections and openZim
PediaPress has agreed to do some contract work for us to extend the current collections MediaWiki extension to support openZim! They'll be starting on this in mid-January and will be looking for community members to help with testing valid openZim indices.
- Usability evaluation of Kiwix
Kiwix has been an awesome solution for browsing offline Wikipedia. We think that it could be even better with some careful attention to the search and browse experience. During Q1 we'll be doing a UX evaluation of Kiwix to find out how we can make the interface even easier to use. We'll be working closely with Emmanuel and others to implement our findings.
- Evaluation and extension of Wikipedia Release Version Tools
The offline Wikipedia team has been steadily releasing new versions of their Wikipedia collection, but technical limitations have hampered how quickly those can be finished. We're taking a close look at their tools and seeing where we can help in the creation of new indices.
Stay tuned for more ...
--tomasz
Hi Tomasz,
Thank you for the information about the current happenings.
What I'm especially interested in are the technical limitations the offline Wikipedia team has. What are these technical limitations? Is there anything I can help with?
Recently I created a tool which simply grabs all articles from a running MediaWiki using the API and creates a ZIM file out of them. No indirection through files or a database any more. That was really easy.
The tool is not really finished yet, since to be really usable it needs to rewrite links to point into the ZIM file, but that is easy to do, e.g. using regular expressions.
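To sketch the idea: the link rewriting could be done with a regular expression roughly like this (just an illustrative snippet, not the actual tool code; the "A/" article-namespace prefix is an assumption about the ZIM layout):

```python
import re

# ZIM files store articles under namespace prefixes; "A" is the
# article namespace. Rewrite wiki-internal hrefs to point there.
WIKI_LINK = re.compile(r'href="/wiki/([^"#]+)(#[^"]*)?"')

def rewrite_links(html):
    """Rewrite /wiki/Title links to ZIM-internal A/Title paths,
    preserving any #fragment. External links are left untouched."""
    def repl(m):
        title, frag = m.group(1), m.group(2) or ""
        return 'href="A/%s%s"' % (title, frag)
    return WIKI_LINK.sub(repl, html)
```

For example, rewrite_links('<a href="/wiki/Berlin">Berlin</a>') yields '<a href="A/Berlin">Berlin</a>', while links to other hosts pass through unchanged.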
Once I know how you plan to create the ZIM files, I can create a tool which makes that as smooth as possible.
Tommi
On Jan 8, 2011, at 2:01 AM, Tommi Mäkitalo wrote:
Hi Tomasz,
Thank you for the information about the current happenings.
What I'm especially interested in are the technical limitations the offline Wikipedia team has. What are these technical limitations? Is there anything I can help with?
We'll get our first good understanding of that once the collections work is done.
Recently I created a tool which simply grabs all articles from a running MediaWiki using the API and creates a ZIM file out of them. No indirection through files or a database any more. That was really easy.
Nice! This will be super useful to test with.
The tool is not really finished yet, since to be really usable it needs to rewrite links to point into the ZIM file, but that is easy to do, e.g. using regular expressions.
Once I know how you plan to create the ZIM files, I can create a tool which makes that as smooth as possible.
I'll mail out as soon as I hear back from my PediaPress contact.
--tomasz
Hi Tomasz,
thanks for the heads-up. It is very exciting to see things evolve on a broad front now.
Am 08.01.2011 01:54, schrieb Tomasz Finc:
- PediaPress, Collections and openZim
PediaPress has agreed to do some contract work for us to extend the current collections MediaWiki extension to support openZim! They'll be starting on this in mid-January and will be looking for community members to help with testing valid openZim indices.
I have added this to http://openzim.org/Developer_Meetings/2011-1#Agenda because I hope that we will meet all of the involved people there.
- Usability evaluation of Kiwix
Kiwix has been an awesome solution for browsing offline Wikipedia. We think that it could be even better with some careful attention to the search and browse experience. During Q1 we'll be doing a UX evaluation of Kiwix to find out how we can make the interface even easier to use. We'll be working closely with Emmanuel and others to implement our findings.
This is now also on the agenda of Wikimania / Dev. Meetup.
Renaud Gaudin, who is working on Kiwix as a contractor, will, as far as discussed with Emmanuel, be part of that meeting.
- Evaluation and extension of Wikipedia Release Version Tools
The offline Wikipedia team has been steadily releasing new versions of their Wikipedia collection, but technical limitations have hampered how quickly those can be finished. We're taking a close look at their tools and seeing where we can help in the creation of new indices.
I had a discussion with Roan concerning "wikizim", the tool that creates a ZIM file from a whole wiki by using the API. His idea was to integrate the parser and the relevant wikizim code to make a dumping tool with less overhead, suitable for the Wikimedia static dumps. I have pointed him to the relevant code in the openZIM SVN, but the discussion has been interrupted by the MW 1.17 release he had to work on.
Additionally, Jessie Wild from the Wikimedia Foundation was in contact with me concerning a strategic Wikimedia Offline meetup sometime before Wikimania. I am in favour of that and would recommend doing it as part of the Wikimedia Conference in Berlin, before the chapters meeting on the 25th - 27th.
I would appreciate it if you could talk to Roan and Jessie to keep them and us in the loop. Maybe we should move part of the discussions, as suggested by Jessie, to offline-l once everyone interested in general strategic Offline discussions is subscribed there (so as not to clog the developer mailing lists while still avoiding private mailings, which always leave someone out).
/Manuel
I had a discussion with Roan concerning "wikizim", the tool that creates a ZIM file from a whole wiki by using the API. His idea was to integrate the parser and the relevant wikizim code to make a dumping tool with less overhead, suitable for the Wikimedia static dumps. I have pointed him to the relevant code in the openZIM SVN, but the discussion has been interrupted by the MW 1.17 release he had to work on.
Yeah, Roan is going to have little to no time for working on that. I have a couple of other people in mind at WMF who could help, but if there are any community devs who want to help out, I'd love to get them started.
Additionally, Jessie Wild from the Wikimedia Foundation was in contact with me concerning a strategic Wikimedia Offline meetup sometime before Wikimania. I am in favour of that and would recommend doing it as part of the Wikimedia Conference in Berlin, before the chapters meeting on the 25th - 27th.
Looks like the Dev conference is going to be moved to later in the year, so we'll have to re-evaluate the timing. Either way, though, we should find a time to do it.
--tomasz
On Wed, 19 Jan 2011 03:09:34 -0800, Tomasz Finc tfinc@wikimedia.org wrote:
I had a discussion with Roan concerning "wikizim", the tool that creates a ZIM file from a whole wiki by using the API. His idea was to integrate the parser and the relevant wikizim code to make a dumping tool with less overhead, suitable for the Wikimedia static dumps. I have pointed him to the relevant code in the openZIM SVN, but the discussion has been interrupted by the MW 1.17 release he had to work on.
Yeah, Roan is going to have little to no time for working on that. I have a couple of other people in mind at WMF who could help, but if there are any community devs who want to help out, I'd love to get them started.
Having a tool that builds a ZIM file using the MediaWiki API is essential for people who do not have access to the system running MediaWiki.
For people with access to the system, using the API to get the content is not the best solution, because it impedes or complicates any post-processing. In addition, transferring everything over HTTP is not the best approach if you have direct access to the content.
So, I think:
- A zimwritermw (a zimwriter using the MW API) console tool should be developed (Tommi has already started).
- The PediaPress ZIM solution should be based on it; otherwise I do not see how we will avoid duplicated work.
- zimwriterdisk (a zimwriter using static HTML & media files) should be continued (Tommi already has a stub, I think).
- The MW DumpHTML extension should be revisited so that it works correctly; with zimwriterdisk and potential additional post-processing tools, this could be really powerful.
Each of these four tools is pretty simple to code on its own, and with all of them we could do everything we need.
So, I am concerned that neither DumpHTML nor zimwriterdisk is part of the discussions.
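To illustrate the zimwritermw side: the core of such a tool is just walking the whole wiki through the API, following the continuation markers. A rough sketch (the fetch function is a placeholder for the actual HTTP request to api.php, not existing code):

```python
def iter_all_pages(fetch, limit=500):
    """Yield every page title from a MediaWiki API, following the
    'apfrom' continuation until the wiki is exhausted.

    `fetch` takes a dict of API parameters and returns the decoded
    JSON response; in a real tool it would perform the HTTP request
    to api.php with format=json."""
    params = {"action": "query", "list": "allpages", "aplimit": limit}
    while True:
        data = fetch(dict(params))
        for page in data["query"]["allpages"]:
            yield page["title"]
        # Older MediaWiki releases signal continuation this way.
        cont = data.get("query-continue", {}).get("allpages")
        if not cont:
            break
        params["apfrom"] = cont["apfrom"]
```

Each title yielded would then be fetched, post-processed, and handed to the zimwriter.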
Cheers Emmanuel
On Wed, Jan 19, 2011 at 1:51 PM, emmanuel@engelhart.org wrote:
Having a tool that builds a ZIM file using the MediaWiki API is essential for people who do not have access to the system running MediaWiki.
For people with access to the system, using the API to get the content is not the best solution, because it impedes or complicates any post-processing. In addition, transferring everything over HTTP is not the best approach if you have direct access to the content.
So, I think:
- A zimwritermw (a zimwriter using the MW API) console tool should be developed (Tommi has already started).
- The PediaPress ZIM solution should be based on it; otherwise I do not see how we will avoid duplicated work.
- zimwriterdisk (a zimwriter using static HTML & media files) should be continued (Tommi already has a stub, I think).
- The MW DumpHTML extension should be revisited so that it works correctly; with zimwriterdisk and potential additional post-processing tools, this could be really powerful.
Each of these four tools is pretty simple to code on its own, and with all of them we could do everything we need.
So, I am concerned that neither DumpHTML nor zimwriterdisk is part of the discussions.
+1