We have a very old wiki which has basically never been updated for the past decade and which already proved stubbornly resistant to updating several years ago. The owner of the server has since drifted away, but we still have control over the domain name itself. The best plan we can think of is to scrape all of the pages/files, add them to a brand-new, updated wiki on a new server, then point the domain at that new server. Yes, user accounts will be broken, but we feel this is the most feasible solution unless someone else has another idea.
However, there are a lot of pages on meritbadge.org -- which is the wiki I'm talking about. Any suggestions for automating this scraping process? I can scrape the HTML off every page, but what I really want is the wikitext of every page.
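For a single page, the wikitext (current revision only) can be fetched with MediaWiki's action=raw instead of scraping the rendered HTML. A minimal, untested sketch; the index.php path for meritbadge.org and the page title are assumptions:

```python
# Minimal sketch, untested: fetch the raw wikitext of one page via
# MediaWiki's action=raw rather than scraping rendered HTML.
# The index.php path is a guess and "Main_Page" is just a placeholder;
# this returns only the current revision, not the edit history.
import requests

INDEX = "http://meritbadge.org/wiki/index.php"   # assumed script path
page = "Main_Page"                               # placeholder page title

wikitext = requests.get(INDEX, params={"title": page, "action": "raw"}).text
print(wikitext[:200])
```

The replies below cover the bulk case, including full edit history.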
Bart Humphries bart.humphries@gmail.com (909)529-BART(2278)
You can use http://meritbadge.org/wiki/index.php/Special:Export: add *all* page titles to the textbox (gathered via an API call, Special:AllPages, or a similar method), uncheck "Include only the current revision, not the full history" so that the full history is exported, and save the resulting file. Then set up the new server, put the dump on it, and run https://www.mediawiki.org/wiki/Manual:ImportDump.php. You will have a new wiki with the old content. If you go with shared hosting instead, you can use Special:Import as the counterpart on the destination wiki.
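For what it's worth, here is a rough, untested sketch of automating that export step in Python. The api.php and index.php locations for meritbadge.org are guesses, the Special:Export form field names should be checked against the actual form, and older MediaWiki releases return "query-continue" rather than "continue" for pagination:

```python
# Rough, untested sketch: list every page via the MediaWiki API, then feed
# the titles to Special:Export to get full-history XML dumps in batches.
import requests

API = "http://meritbadge.org/wiki/api.php"                       # assumed path
EXPORT = "http://meritbadge.org/wiki/index.php?title=Special:Export"

session = requests.Session()

# 1. Collect every page title in the main namespace (repeat per namespace).
titles = []
params = {"action": "query", "list": "allpages", "aplimit": "max",
          "format": "json"}
while True:
    data = session.get(API, params=params).json()
    titles.extend(p["title"] for p in data["query"]["allpages"])
    if "continue" not in data:   # older wikis use "query-continue" instead
        break
    params.update(data["continue"])

# 2. Post the titles to Special:Export in batches. Omitting "curonly"
#    corresponds to leaving the "current revision only" box unchecked,
#    i.e. the full history is exported. Each batch goes to its own file
#    so they can be imported one by one.
for i in range(0, len(titles), 50):
    batch = "\n".join(titles[i:i + 50])
    resp = session.post(EXPORT, data={"pages": batch})
    with open("meritbadge-export-%04d.xml" % (i // 50), "wb") as out:
        out.write(resp.content)
```

On the new server, each file can then be loaded with MediaWiki's maintenance/importDump.php script, followed by rebuildRecentChanges.php, as described on the manual page above.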
Hope that helps.
Best, Martin
On Fri, 18 May 2018 at 20:27, Bart Humphries bart.humphries@gmail.com wrote:
You're in luck: just now I was looking for test cases for the new version of dumpgenerator.py: https://github.com/WikiTeam/wikiteam/issues/311
I've made a couple of changes, and by tomorrow you should see a new XML dump at https://archive.org/download/wiki-meritbadgeorg_wiki (there's also https://archive.org/download/wiki-meritbadgeorg-20151017, not mine).
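In case anyone wants to re-run it later, a rough, untested sketch of that kind of invocation; the flag names follow the WikiTeam README, but the api.php path for meritbadge.org is an assumption, so check `python dumpgenerator.py --help` in the version you clone:

```python
# Rough sketch, untested: drive WikiTeam's dumpgenerator.py yourself.
import subprocess

subprocess.run([
    "python", "dumpgenerator.py",
    "--api=http://meritbadge.org/wiki/api.php",  # assumed endpoint
    "--xml",     # XML export of every page's text
    "--images",  # also grab uploaded files and their descriptions
], check=True)
```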
Federico
Great, thanks!
I have a convention this weekend, so it'll probably be Monday evening/Tuesday before I can really do anything else with that dump.
On Fri, May 18, 2018, 3:32 PM Federico Leva (Nemo) nemowiki@gmail.com wrote:
WikiTeam bat signal.
Dump delivered.
On 2018-05-19 at 6:52 GMT+02:00, Bart Humphries bart.humphries@gmail.com wrote: