We have a very old wiki which has basically never been updated for the past decade and which already proved stubbornly resistant to updating several years ago. The owner of the server has since drifted away, but we still have control over the domain name itself. The best plan we can think of is to scrape all of the pages/files, add them to a brand-new, updated wiki on a new server, then point the domain at that new server. Yes, user accounts will be broken, but we feel this is the most feasible solution unless someone else has another idea.
However, there are a lot of pages on meritbadge.org -- which is the wiki I'm talking about. Any suggestions for automating this scraping process? I can scrape the HTML off every page, but what I really want is the wikitext of every page.
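For a single page, the wikitext (current revision only) can be fetched with MediaWiki's action=raw instead of scraping the rendered HTML. A minimal, untested sketch; the index.php path for meritbadge.org and the page title are assumptions:

```python
# Minimal sketch, untested: fetch the raw wikitext of one page via
# MediaWiki's action=raw rather than scraping rendered HTML.
# The index.php path is a guess and "Main_Page" is just a placeholder;
# this returns only the current revision, not the edit history.
import requests

INDEX = "http://meritbadge.org/wiki/index.php"   # assumed script path
page = "Main_Page"                               # placeholder page title

wikitext = requests.get(INDEX, params={"title": page, "action": "raw"}).text
print(wikitext[:200])
```

The replies below cover the bulk case, including full edit history.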
Bart Humphries bart.humphries@gmail.com (909)529-BART(2278)
You can use http://meritbadge.org/wiki/index.php/Special:Export: add *all* page titles to the textbox (gathered via an API call, Special:AllPages, or a similar method), uncheck "Include only the current revision, not the full history" so that the full history is exported, and save the resulting file. Then set up the new server, put the dump on it, and run https://www.mediawiki.org/wiki/Manual:ImportDump.php. You will have a new wiki with the old content. If you go with shared hosting instead, you can use Special:Import as the counterpart on the destination wiki.
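For what it's worth, here is a rough, untested sketch of automating that export step in Python. The api.php and index.php locations for meritbadge.org are guesses, the Special:Export form field names should be checked against the actual form, and older MediaWiki releases return "query-continue" rather than "continue" for pagination:

```python
# Rough, untested sketch: list every page via the MediaWiki API, then feed
# the titles to Special:Export to get full-history XML dumps in batches.
import requests

API = "http://meritbadge.org/wiki/api.php"                       # assumed path
EXPORT = "http://meritbadge.org/wiki/index.php?title=Special:Export"

session = requests.Session()

# 1. Collect every page title in the main namespace (repeat per namespace).
titles = []
params = {"action": "query", "list": "allpages", "aplimit": "max",
          "format": "json"}
while True:
    data = session.get(API, params=params).json()
    titles.extend(p["title"] for p in data["query"]["allpages"])
    if "continue" not in data:   # older wikis use "query-continue" instead
        break
    params.update(data["continue"])

# 2. Post the titles to Special:Export in batches. Omitting "curonly"
#    corresponds to leaving the "current revision only" box unchecked,
#    i.e. the full history is exported. Each batch goes to its own file
#    so they can be imported one by one.
for i in range(0, len(titles), 50):
    batch = "\n".join(titles[i:i + 50])
    resp = session.post(EXPORT, data={"pages": batch})
    with open("meritbadge-export-%04d.xml" % (i // 50), "wb") as out:
        out.write(resp.content)
```

On the new server, each file can then be loaded with MediaWiki's maintenance/importDump.php script, followed by rebuildRecentChanges.php, as described on the manual page above.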
Hope that helps.
Best, Martin
On Fri, 18 May 2018 at 20:27, Bart Humphries bart.humphries@gmail.com wrote:
You're in luck: just now I was looking for test cases for the new version of dumpgenerator.py: https://github.com/WikiTeam/wikiteam/issues/311
I've made a couple of changes, and by tomorrow you should see a new XML dump at https://archive.org/download/wiki-meritbadgeorg_wiki (there's also https://archive.org/download/wiki-meritbadgeorg-20151017, not mine).
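In case anyone wants to re-run it later, a rough, untested sketch of that kind of invocation; the flag names follow the WikiTeam README, but the api.php path for meritbadge.org is an assumption, so check `python dumpgenerator.py --help` in the version you clone:

```python
# Rough sketch, untested: drive WikiTeam's dumpgenerator.py yourself.
import subprocess

subprocess.run([
    "python", "dumpgenerator.py",
    "--api=http://meritbadge.org/wiki/api.php",  # assumed endpoint
    "--xml",     # XML export of every page's text
    "--images",  # also grab uploaded files and their descriptions
], check=True)
```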
Federico
Great, thanks!
I have a convention this weekend, so it'll probably be Monday evening/Tuesday before I can really do anything else with that dump.
On Fri, May 18, 2018, 3:32 PM Federico Leva (Nemo) nemowiki@gmail.com wrote:
WikiTeam bat signal.
Dump delivered.
On 2018-05-19 at 6:52 GMT+02:00, Bart Humphries bart.humphries@gmail.com wrote: