Hi all,
I am wondering whether you could give me some tips on finding an appropriate solution for our scenario! We have a small amount of resources to throw at this, and can hopefully develop something that will be useful for others.
We have a wiki here: http://orbit.educ.cam.ac.uk/, which has a small number of pages (by comparison!), but it does contain video.
We would like to be able to create an offline (read-only) copy of this. This is because some of it is aimed at teacher education in Africa, where connectivity is poor. At one of the schools we work with we have a low-power server with a local wifi network, so basically we'd like to get an offline copy onto that server. HTML is probably the best format (as it could then be indexed for searching on our server), and we want to be able to generate updates fairly frequently (over low bandwidth), so updates need to be incremental.
One approach I've taken in the past was to pull HTML versions of pages via the API and run regexps on them, e.g. to preserve local links between pages but leave edit links pointing at the online version. This could be a good solution, but it requires creating a static HTML version on the server that is then periodically updated, and it takes a lot of hacky regexps to get the pulled HTML into the right format. It would be good to have something in the API that could output pages directly in a 'localised' HTML format. Another issue was that the API doesn't easily allow finding the most recent version of all pages that have changed since a certain date, so I had to mess around with the OAI extension to get this information. Overall I got it to work, but it relied on so many hacks that it wasn't really maintainable.
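To illustrate the kind of thing I mean, here is a rough Python sketch of that "pull HTML via the API and fix up the links" step. It is not my actual script; the API endpoint and the link-rewriting regexps are just assumptions about our URL layout:

import json
import re
import urllib.parse
import urllib.request

API = "http://orbit.educ.cam.ac.uk/api.php"   # assumed API endpoint

def fetch_rendered_html(title):
    """Fetch the parsed HTML of one page via action=parse."""
    query = urllib.parse.urlencode({"action": "parse", "page": title, "format": "json"})
    with urllib.request.urlopen(API + "?" + query) as resp:
        return json.loads(resp.read().decode("utf-8"))["parse"]["text"]["*"]

def localise_links(html):
    """Point page-to-page links at local files; keep edit links on the live wiki."""
    # /wiki/Page_name -> Page_name.html, so navigation works offline
    html = re.sub(r'href="/wiki/([^"#]+)', r'href="\1.html', html)
    # edit links go back to the online wiki (assumed URL layout)
    html = re.sub(r'href="(/index\.php\?[^"]*action=edit[^"]*)"',
                  r'href="http://orbit.educ.cam.ac.uk\1"', html)
    return html

if __name__ == "__main__":
    with open("Main_Page.html", "w", encoding="utf-8") as f:
        f.write(localise_links(fetch_rendered_html("Main_Page")))

In practice the regexps ended up far more involved than this (images, stylesheets, special pages), which is exactly the hacky part I'd like to avoid.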
What would be ideal would be to have a local script (i.e. on the remote server) managing this, without us having to create an (intermediary) HTML copy every so often. The remote server contacts our wiki when it has connectivity and fetches updates at night. The only things the wiki does are to produce a list of pages that have changed since a certain date and (via the API) to provide suitable HTML for download. I should say that massive scalability isn't an issue: we won't have loads of pages any time soon, and we won't have lots of mirror sites. Concretely, I imagine the nightly job looking something like the sketch below.
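A minimal sketch of that nightly sync, assuming only the standard MediaWiki API (list=recentchanges plus action=parse); the endpoint, file names and state file are made up for illustration, and API continuation/paging and error handling are left out:

import datetime
import json
import urllib.parse
import urllib.request

API = "http://orbit.educ.cam.ac.uk/api.php"   # assumed endpoint
STATE_FILE = "last_sync.txt"                   # remembers the last sync time

def api_get(**params):
    params["format"] = "json"
    url = API + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

def changed_titles_since(timestamp):
    """Titles of pages edited or created since `timestamp` (ISO 8601, UTC)."""
    data = api_get(action="query", list="recentchanges",
                   rcend=timestamp,            # older bound; results run newest-first
                   rctype="edit|new",
                   rcprop="title|timestamp",
                   rclimit="500")
    return {rc["title"] for rc in data["query"]["recentchanges"]}

def sync():
    now = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    try:
        last_sync = open(STATE_FILE).read().strip()
    except FileNotFoundError:
        last_sync = "2000-01-01T00:00:00Z"     # first run: take whatever recentchanges still holds
    for title in changed_titles_since(last_sync):
        html = api_get(action="parse", page=title)["parse"]["text"]["*"]
        with open(title.replace("/", "_") + ".html", "w", encoding="utf-8") as f:
            f.write(html)                       # link localisation omitted here
    with open(STATE_FILE, "w") as f:
        f.write(now)                            # high-water mark for the next night

if __name__ == "__main__":
    sync()

The missing piece is the 'localised' HTML output: if the API could hand back page HTML with links already rewritten for offline use, the script above would be essentially all the remote server needs.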
Once we've got an HTML version (i.e. wiki -> local HTML), we'd also like to build a little app for Android that can download such an HTML copy (again, ideally straight from the wiki, without an intermediary HTML copy somewhere). The app would manage downloading the updates, keep the content as HTML on the SD card, and would allow users to open the SD card content in a mobile browser. (We'd need to take care of how to handle video, probably using HTML5 video tags.)
I would really appreciate your feedback on this! The above strategy (modifying the API to give localised HTML) seems simple enough to me - do you have particular views on what could work best?
Bjoern
(Btw. I am also interested in ePub, but that's a different scenario!)
Hi Björn,
Basically, ZIM files can hold any kind of content, so having videos in them is theoretically no problem at all.
1st question: Does Kiwix (the most popular ZIM reader) support video? As Kiwix uses XUL and is based on Firefox, chances are that Ogg/Theora videos work out of the box.
2nd question: Does the ZIM exporter (either Extension:Collection or Kiwix's perl tools) export embedded videos as well? When using the perl tools it should be no problem to amend this if needed.
/Manuel
Hi Manuel,
thanks for the answer!
ZIM would be for desktop use though. That's definitely a good start, and we should pursue this in parallel. How do you enable ZIM in the Collection extension?
However, my understanding is that ZIM files do not allow incremental updates. With our site in Africa, we can only push ~ 10s of MB per night, so incremental updates are essential. We're also looking for a server-based solution, so that we only need to distribute to our low-power server.
Thanks, Bjoern
Hi Björn,
On 18.06.2012 13:41, Bjoern Hassler wrote:
ZIM would be for desktop use though. That's definitely a good start, and we should pursue this in parallel. How do you enable ZIM in the Collection extension?
Check out these pages:
http://www.mediawiki.org/wiki/Extension:Collection
http://mwlib.readthedocs.org/en/latest/collection.html#installation-and-conf...
look for $wgCollectionFormats
However, my understanding is that ZIM files do not allow incremental updates. With our site in Africa, we can only push ~ 10s of MB per night, so incremental updates are essential. We're also looking for a server-based solution, so that we only need to distribute to our low-power server.
Well, out of the box it doesn't do that, but it's technically possible and has been discussed a few times. Small ZIM files with updated content can be used in parallel to a base file which holds the initial content. Split ZIM files are already supported out of the box. Tools like zim-merge have been discussed when talking about incremental updates.
Emmanuel (Kelson) and Tommi can tell us more about this.
/Manuel
On 18.06.2012 14:03, Manuel Schneider wrote:
Hi Björn,
...
Well, out of the box it doesn't do that, but it's technically possible and has been discussed a few times. Small ZIM files with updated content can be used in parallel to a base file which holds the initial content. Split ZIM files are already supported out of the box. Tools like zim-merge have been discussed when talking about incremental updates.
Emmanuel (Kelson) and Tommi can tell us more about this.
/Manuel
Right. It is my task to create a zim-merge tool. I have started coding again for openZIM, but it is not ready yet. The plan is to allow merging a base ZIM file with a smaller ZIM file containing the updated articles.
Tommi
On 18/06/2012 13:17, Manuel Schneider wrote:
1st question: Does Kiwix (the most popular ZIM reader) support video? As Kiwix uses XUL and is based on Firefox, chances are that Ogg/Theora videos work out of the box.
Kiwix supports HTML5 video as of version 0.9 RC1, which I will release today :)
2nd question: Does the ZIM exporter (either Extension:Collection or Kiwix's perl tools) export embedded videos as well? When using the perl tools it should be no problem to amend this if needed.
Still work to do there... but that is not the job of my perl tools; it is the job of the MediaWiki DumpHTML extension.
Emmanuel