I'm porting a 28k-page Wikidot wiki to MediaWiki. As Wikidot is case-insensitive, I'm generating lots of broken links to "SF fandom", because the MediaWiki page is created as "SF Fandom".
Now I could try creating case-changing redirects, but the pywikibot script capitalize_redirects doesn't do what I want, as it would create a redirect "Sf fandom", which is no use, and would create vast numbers of redirects I didn't need.
What I really need is a tool that spots a broken link "SF fandom", realises there is a page "SF Fandom" that would match if case insensitive, and changes the link to be [[SF Fandom|SF fandom]]
Does such a script exist? With so many pages it would have to be automatic, and I could cope if it picked the wrong one of 2 capitalisation choices.
John
Hi John,
Unfortunately, I don't think we have a script for this specific case, but if you have some Python experience, it should be possible to set this up. It should also be possible using a combination of listpages.py, a text editor (or spreadsheet) and replace.py.
Roughly, I would try the following:
- Get a list of all imported page titles (with capitalization) using a pagegenerator
- Make a dict of lowercase page title to actual page title (e.g. {"sf fandom": "SF Fandom"})
- Loop over all pages (or all pages with broken links), and use textlib.replace_links to perform the replacement
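The dict-and-rewrite steps above can be sketched in plain Python. This is only an illustration under some assumptions: make_link_fixer is a made-up helper name, the titles would really come from a pagegenerator, and a simple regex stands in here for textlib.replace_links, so namespaces, file links and category sort keys are not treated specially.

```python
import re

# Matches [[target]], [[target#section]] and [[target|label]] (simplified).
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def make_link_fixer(titles):
    """Return a function that repairs case-mismatched links in wikitext.

    `titles` is an iterable of existing page titles with their real
    capitalization (hypothetical input; in practice from a pagegenerator).
    """
    exact = set(titles)
    # Lowercase title -> actual title. If two titles differ only by case,
    # the first one seen wins.
    index = {}
    for title in titles:
        index.setdefault(title.lower(), title)

    def fix_links(text):
        def repl(match):
            target = match.group(1).strip()
            section = match.group(2) or ""
            label = match.group(3)
            if target in exact:
                return match.group(0)  # link already points at a real page
            actual = index.get(target.replace("_", " ").lower())
            if actual is None:
                return match.group(0)  # no case-insensitive match either
            if label is None:
                label = target + section  # keep what the reader used to see
            return "[[{}{}|{}]]".format(actual, section, label)

        return LINK_RE.sub(repl, text)

    return fix_links
```

Calling make_link_fixer(["SF Fandom"]) and applying the result to "[[SF fandom]]" should give "[[SF Fandom|SF fandom]]", while links that already resolve are left untouched.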
The listpages-texteditor-replace alternative might break some pages because it relies on plain text replacement, but it might be good enough. Same idea:
- Get a list of all imported page titles using listpages.py
- In a text editor, transform each page name into a regular expression replacement. For example, SF Fandom would become something like "\[\[(sf fandom)\]\]" "[[SF Fandom|\1]]". Store those in a file, e.g. replacements.txt. (The square brackets need backslash-escaping in the regex; I'm not sure whether anything else does.)
- Run replace.py with the options -nocase -regex -pairsfile:replacements.txt
With this regex, you will not match any situations where a link text is present; I think you'll need a second regex pair to fix those as well.
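Instead of a text editor, the pairs could be generated with a few lines of Python. This is a rough sketch, not a tested recipe: make_pairs and pairs_file_lines are made-up names, and it assumes that the pairs file alternates pattern and replacement lines and that -nocase corresponds to Python's re.IGNORECASE. The second pair covers links that already carry link text, and the case-sensitive lookahead skips links that are already correct.

```python
import re


def make_pairs(title):
    """Return two (pattern, replacement) pairs for one page title."""
    esc = re.escape(title)
    # Assumes titles contain no backslashes, which would need escaping here.
    replacement = "[[" + title + "|\\1]]"
    # (?-i:...) switches case-insensitivity off inside the lookahead, so an
    # already-correct [[SF Fandom]] is not rewritten (Python 3.6+).
    bare = r"\[\[(?!(?-i:" + esc + r")\]\])(" + esc + r")\]\]"
    labelled = r"\[\[(?!(?-i:" + esc + r")\|)" + esc + r"\|([^\]]*)\]\]"
    return [(bare, replacement), (labelled, replacement)]


def pairs_file_lines(titles):
    """Flatten the pairs into alternating pattern / replacement lines."""
    lines = []
    for title in titles:
        for pattern, replacement in make_pairs(title):
            lines.append(pattern)
            lines.append(replacement)
    return lines
```

Writing "\n".join(pairs_file_lines(titles)) to replacements.txt would then give the file for the replace.py run described above. With 28k titles that is a lot of regexes per page, so it may be slow.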
Merlijn
On Thu, 10 Jan 2019 at 09:59, John Bray johnbray822@gmail.com wrote: