On Apr 13, 2020, at 3:55 PM, Francis Franck francis.franck@gmail.com wrote:
Thank you for your reaction.
It sounds like you're trying to produce a static copy of your site for backup or offline use?
Indeed, my intention is to create a standalone copy of my wiki for archival purposes. Because of its relation to the city's history, I would like the wiki to be stored in the Archive of the city of Alkmaar, but the problem is that they are not equipped for a MediaWiki-oriented structure. They require a fully HTML-oriented version.
If so, maybe see the project here https://github.com/openzim/mwoffliner or this previous discussion https://lists.wikimedia.org/pipermail/wikitext-l/2020-February/000994.html
I have tested mwoffliner, but it doesn't incorporate sidebars and seems to have problems with categories and namespaces. I had the impression that Parsoid was much better equipped to make a trustworthy copy.
mwoffliner builds on the Parsoid output.
Parsoid only handles the page content, not the chrome, so it won't give you the sidebar, etc.
Please see above linked discussion for some ideas on how to make html dumps of your wiki.
Particularly, https://lists.wikimedia.org/pipermail/wikitext-l/2020-February/000997.html https://lists.wikimedia.org/pipermail/wikitext-l/2020-February/000998.html
Up to now I've got the best results with HTTrack. It includes the sidebar on most of the pages. It's just a pity that it doesn't show pictures whose size has been adapted (e.g. [[File:Francis Franck.jpeg|left|200x640px| ]]).
I think we can close the case here. Thanks again for your reactions.
On Mon, 13 Apr 2020 at 22:35, Arlo Breault abreault@wikimedia.org wrote:
Francis Franck, 14/04/20 23:32:
Up to now I've got the best results with HTTrack. It Includes the sidebar on most of the pages.
If your skin is simple enough to work correctly as saved by HTTrack, we're lucky! The DumpHTML extension's main trouble was basically how to adapt the skin. Another possibility is to use wget or wpull to save a WARC file, to be served later with warc-proxy or similar. That's the standard for web preservation.
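As a rough sketch of the wget route, assuming wget 1.14 or later (the first version with WARC support); the URL is a placeholder for your own wiki:

```shell
# Crawl the wiki for offline browsing while simultaneously recording
# a WARC of the original HTTP responses for preservation.
# (--recursive/--level=inf is used instead of --mirror, because the
# timestamping implied by --mirror does not combine with WARC output.)
wget --recursive --level=inf --page-requisites \
     --convert-links --adjust-extension \
     --warc-file=mywiki --warc-cdx \
     "https://wiki.example.org/"
```

This leaves a browsable local copy with rewritten links, plus mywiki.warc.gz and a CDX index holding the unmodified responses.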
It might be obvious, but don't forget to archive a full dump on archive.org. https://www.mediawiki.org/wiki/Manual:Backup
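For the full dump, assuming shell access to the wiki's server, MediaWiki's own maintenance script (per the Manual:Backup page linked above) produces the XML; the output file name is just an example:

```shell
# Export every page with its complete revision history.
# Run from the wiki's installation directory.
php maintenance/dumpBackup.php --full > drebbel-full.xml
```

Note that uploaded files are not included in the XML; the images/ directory has to be copied separately.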
If you're the same person who asked about this on the #archiveteam and #wikiteam channels, sorry for the repetition.
Federico
Thanks, I'll check it out. Most of my site is now in .html. I'm prepared to archive it on archive.org, but I've not yet understood how to proceed with a site that has several subdirectories to store images and other special pages.
Kind regards, Francis
On Tue, 14 Apr 2020 at 22:40, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Francis Franck, 17/04/20 17:14:
I'm prepared to archive it on archive.org, but I've not yet understood how to proceed with a site that has several subdirectories to store images and other special pages.
Please archive the XML, the HTML is not very useful. If you want to add the HTML for reference, just make a zip/tar/7z of it.
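Assuming the Internet Archive's `ia` command-line client (from the `internetarchive` Python package), that could look like the following sketch; the item identifier, directory, and file names are placeholders:

```shell
# Pack the HTML mirror into a single archive, then upload it
# together with the XML dump into one archive.org item.
7z a drebbel-html.7z ./httrack-mirror/
ia upload drebbel-wiki drebbel-full.xml drebbel-html.7z \
   --metadata="mediatype:web"
```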
Federico
Thanks! But does that mean that the site isn't visible within the archive?
Francis
On Fri, 17 Apr 2020 at 16:58, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Francis Franck, 17/04/20 18:26:
Thanks! But does that mean that the site isn't visible within the archive?
If you upload the XML, every page can be reconstructed. If you upload zipped HTML, something is going to be missing, but whatever you've uploaded will be visible in the viewer like this: https://archive.org/download/wiki-book-worm-books-books-books.wikispaces.com/book-worm-books-books-books.wikispaces.com.zip/
Federico
Ok, thanks. But what should I do with images and PDF files? https://archive.org/details/Drebbel.xml
On Fri, 17 Apr 2020 at 18:47, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Francis Franck, 17/04/20 22:00:
Ok, thanks. But what should I do with images and PDF files? https://archive.org/details/Drebbel.xml
Please check the WikiTeam tutorial https://github.com/WikiTeam/wikiteam/wiki/Tutorial and ask on the wikiteam-discuss mailing list if you have further questions. We're getting too off-topic here (sorry, everyone, for the detour).
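For reference, the tutorial's basic invocation of WikiTeam's dumpgenerator fetches both the XML history and the uploaded files (images, PDFs); the api.php URL is a placeholder for your wiki's endpoint:

```shell
# Download the complete XML history plus all uploads from a live wiki.
python dumpgenerator.py --api=https://wiki.example.org/api.php --xml --images
```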
Federico
wikitext-l@lists.wikimedia.org