On Wed, Jan 13, 2016 at 12:47 PM, Legoktm <legoktm.wikipedia(a)gmail.com> wrote:
> You can use Pywikibot's replace.py[1], which lets you provide regex
> find/replace and can get a list of pages from the API equivalent of
> Special:LinkSearch.
Thanks - I'll look into that as we get various batches of URLs ready for
testing.
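For what it's worth, a quick way to sanity-check a rewrite regex locally before handing it to replace.py (the pattern and domain below are hypothetical, just for illustration):

```python
import re

# Hypothetical pattern: rewrite absolute http:// links for one domain to
# https://, preserving an optional "www." prefix.
PATTERN = re.compile(r'http://(www\.)?example\.org')

def to_https(text):
    """Return text with matching http:// links rewritten to https://."""
    return PATTERN.sub(lambda m: 'https://' + (m.group(1) or '') + 'example.org', text)

print(to_https('See http://www.example.org/page and http://example.org/x.'))
```

The corresponding bot run would then be something like `python pwb.py replace -regex 'http://example\.org' 'https://example.org' -weblink:example.org` (syntax from memory, so check replace.py's help output; `-weblink` is the page generator backed by Special:LinkSearch).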
> You should also consider setting up HSTS[2] so that regardless of whether
> users click on an HTTP link, they'll be sent to the HTTPS version of the
> site.
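For context, HSTS is a single response header; a minimal sketch for nginx (assuming nginx is in use, and starting with a short max-age while testing, since browsers cache the policy):

```nginx
# Hypothetical nginx config: instruct browsers to use HTTPS for future visits.
# Only add includeSubDomains once every subdomain serves valid HTTPS.
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```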
Yes – that's on the plan as soon as we finish remediating the older
legacy content. I've been using lists from Wikipedia, a sampling of web
access logs, etc. to feed a script[1] to find cases where someone used an
absolute URL in a <script> tag, etc. We have a couple of subdomains which
should be ready for HSTS quickly since they were only used for a single
application.
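A static check in the same spirit can be sketched in a few lines of Python; note this only catches URLs present in the HTML source, whereas a headless-browser scan like the phantomjs-mixed-content-scan tool linked below also catches URLs constructed at runtime by JavaScript (the attribute names and sample markup here are illustrative):

```python
import re

# Find absolute http:// URLs referenced by src/href attributes in raw HTML.
MIXED = re.compile(r'''(?:src|href)\s*=\s*["'](http://[^"']+)["']''', re.IGNORECASE)

def find_http_urls(html):
    """Return all absolute http:// URLs in src/href attributes of html."""
    return MIXED.findall(html)

html = ('<script src="http://cdn.example.org/a.js"></script>'
        '<a href="https://ok.example.org">x</a>')
print(find_http_urls(html))
```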
Chris
1. https://github.com/acdha/phantomjs-mixed-content-scan