On 01/13/2016 09:09 AM, Chris Adams wrote:
> I've been working with a number of colleagues getting ready to turn HTTPS on by default for various loc.gov domains. This has been fairly successful and we're working through the old legacy apps now.

Awesome!

> When that work completes, we'll have somewhere around half a million links which differ only in the URL scheme. What would be the best way to rewrite all of those URLs? I'd like to reduce the window during which users transit from HTTPS -> HTTP -> HTTPS.
You can use Pywikibot's replace.py[1], which does regex find/replace and can get its list of pages from the API equivalent of Special:LinkSearch.
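
To give a rough idea of the kind of regex involved, here is a small standalone Python sketch (plain `re`, not Pywikibot itself). The pattern and the `upgrade_scheme` helper are my own assumptions for illustration; you'd want to adapt the pattern to the actual set of loc.gov hosts that serve HTTPS before feeding anything like it to replace.py.

```python
import re

# Hypothetical pattern: an http:// link to loc.gov or any of its subdomains.
# The lookahead keeps us from touching look-alike hosts such as
# loc.gov.example.com, while still allowing a sentence-ending period.
LOC_HTTP = re.compile(
    r"http://((?:[A-Za-z0-9-]+\.)*loc\.gov)(?![A-Za-z0-9-]|\.[A-Za-z0-9])"
)


def upgrade_scheme(wikitext: str) -> str:
    """Return wikitext with http://*.loc.gov links switched to https://."""
    return LOC_HTTP.sub(r"https://\1", wikitext)


if __name__ == "__main__":
    sample = "See [http://www.loc.gov/item/123 this item] and http://example.org/."
    print(upgrade_scheme(sample))
```

Only the scheme changes, so the host, path and query string of each link are left exactly as they were.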
You should also consider setting up HSTS[2], so that even if users click on an HTTP link, their browser will send them to the HTTPS version of the site.
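
For reference, HSTS is just a response header (Strict-Transport-Security) sent over HTTPS; a browser only starts enforcing it after it has seen that header once. How you set it depends on your web server, so the following is only a sketch under the assumption of a Python WSGI application. The `hsts_middleware` name and the one-year max-age are illustrative choices, not a recommendation for loc.gov specifically.

```python
def hsts_middleware(app, max_age=31536000, include_subdomains=True):
    """Wrap a WSGI app so HTTPS responses carry an HSTS header."""
    value = f"max-age={max_age}"
    if include_subdomains:
        value += "; includeSubDomains"

    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            # Only send the header on HTTPS responses; browsers ignore
            # it when it arrives over plain HTTP.
            if environ.get("wsgi.url_scheme") == "https":
                headers = [
                    h for h in headers
                    if h[0].lower() != "strict-transport-security"
                ]
                headers.append(("Strict-Transport-Security", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return wrapped
```

It's probably worth starting with a short max-age and leaving includeSubDomains off until every subdomain is known to work over HTTPS, since browsers cache the policy for the full duration.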
[1] https://www.mediawiki.org/wiki/Manual:Pywikibot/replace.py
[2] https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security
-- Legoktm