On Apr 13, 2020, at 3:55 PM, Francis Franck francis.franck@gmail.com wrote:
Thank you for your reaction.
It sounds like you're trying to produce a static copy of your site for backup or offline use?
Indeed, my intention is to create a standalone copy of my wiki for archival purposes. Because of its relation to the city's history, I would like the wiki to be stored in the Archive of the city of Alkmaar, but the problem is that they are not equipped for a MediaWiki-oriented structure. They require a fully HTML-oriented version.
If so, maybe see the project here https://github.com/openzim/mwoffliner or this previous discussion https://lists.wikimedia.org/pipermail/wikitext-l/2020-February/000994.html
I have tested mwoffliner, but it doesn't incorporate sidebars and seems to have problems with categories and namespaces. I had the impression that Parsoid was much better equipped to make a trustworthy copy.
mwoffliner builds on the Parsoid output.
Parsoid only handles the page content, not the chrome, so it won't give you the sidebar, etc.
Please see above linked discussion for some ideas on how to make html dumps of your wiki.
Particularly, https://lists.wikimedia.org/pipermail/wikitext-l/2020-February/000997.html https://lists.wikimedia.org/pipermail/wikitext-l/2020-February/000998.html
Up to now I've got the best results with HTTrack. It includes the sidebar on most of the pages. It's just a pity that it doesn't show pictures whose size has been adapted (e.g. [[File:Francis Franck.jpeg|left|200x640px| ]]).
I think we can close the case here. Thanks again for your reactions.
On Mon, 13 Apr 2020 at 22:35, Arlo Breault abreault@wikimedia.org wrote:
Francis Franck, 14/04/20 23:32:
Up to now I've got the best results with HTTrack. It Includes the sidebar on most of the pages.
If your skin is simple enough to work correctly as saved by HTTrack, we're lucky! The DumpHTML extension's main trouble was basically how to adapt the skin. Another possibility is to use wget or wpull to save a WARC file, to be served later with warc-proxy or similar. That's the standard for web preservation.
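As a rough sketch of the wget route, assuming wget 1.14 or later (the first version with WARC support); the URL is a placeholder for your own wiki:

```shell
# Crawl the wiki for offline browsing while simultaneously recording
# a WARC of the original HTTP responses for preservation.
# (--recursive/--level=inf is used instead of --mirror, because the
# timestamping implied by --mirror does not combine with WARC output.)
wget --recursive --level=inf --page-requisites \
     --convert-links --adjust-extension \
     --warc-file=mywiki --warc-cdx \
     "https://wiki.example.org/"
```

This leaves a browsable local copy with rewritten links, plus mywiki.warc.gz and a CDX index holding the unmodified responses.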
It might be obvious, but don't forget to archive a full dump on archive.org. https://www.mediawiki.org/wiki/Manual:Backup
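For the full dump, assuming shell access to the wiki's server, MediaWiki's own maintenance script (per the Manual:Backup page linked above) produces the XML; the output file name is just an example:

```shell
# Export every page with its complete revision history.
# Run from the wiki's installation directory.
php maintenance/dumpBackup.php --full > drebbel-full.xml
```

Note that uploaded files are not included in the XML; the images/ directory has to be copied separately.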
If you're the same person who asked about this on the #archiveteam and #wikiteam channels, sorry for the repetition.
Federico
Thanks, I'll check it out. Most of my site is now in .html. I'm prepared to archive it on archive.org, but I've not yet understood how to proceed with a site that has several subdirectories to store images and other special pages.
Kind regards, Francis
On Tue, 14 Apr 2020 at 22:40, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Francis Franck, 17/04/20 17:14:
I'm prepared to archive it on archive.org, but I've not yet understood how to proceed with a site that has several subdirectories to store images and other special pages.
Please archive the XML, the HTML is not very useful. If you want to add the HTML for reference, just make a zip/tar/7z of it.
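Assuming the Internet Archive's `ia` command-line client (from the `internetarchive` Python package), that could look like the following sketch; the item identifier, directory, and file names are placeholders:

```shell
# Pack the HTML mirror into a single archive, then upload it
# together with the XML dump into one archive.org item.
7z a drebbel-html.7z ./httrack-mirror/
ia upload drebbel-wiki drebbel-full.xml drebbel-html.7z \
   --metadata="mediatype:web"
```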
Federico
Thanks! But does that mean that the site isn't visible within the archive?
Francis
On Fri, 17 Apr 2020 at 16:58, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Francis Franck, 17/04/20 18:26:
Thanks! But does that mean that the site isn't visible within the archive?
If you upload the XML, every page can be reconstructed. If you upload zipped HTML, something is going to be missing, but whatever you've uploaded will be visible in the viewer like this: https://archive.org/download/wiki-book-worm-books-books-books.wikispaces.com/book-worm-books-books-books.wikispaces.com.zip/
Federico
Ok, thanks. But what should I do with images and PDF files? https://archive.org/details/Drebbel.xml
On Fri, 17 Apr 2020 at 18:47, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Francis Franck, 17/04/20 22:00:
Ok, thanks. But what should I do with images and PDF files? https://archive.org/details/Drebbel.xml
Please check the WikiTeam tutorial https://github.com/WikiTeam/wikiteam/wiki/Tutorial and ask on the wikiteam-discuss mailing list if you have further questions. We're getting too off-topic here (sorry, everyone, for the detour).
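For reference, the tutorial's basic invocation of WikiTeam's dumpgenerator fetches both the XML history and the uploaded files (images, PDFs); the api.php URL is a placeholder for your wiki's endpoint:

```shell
# Download the complete XML history plus all uploads from a live wiki.
python dumpgenerator.py --api=https://wiki.example.org/api.php --xml --images
```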
Federico
wikitext-l@lists.wikimedia.org