In anticipation of proper HTTPS support on Wikimedia sites, we will be enabling protocol-relative URLs. This means that things like logos, references to resources, and interwiki links will use links like:
//en.wikipedia.org/wiki/Main_Page
instead of links like:
http://en.wikipedia.org/wiki/Main_Page
Doing this ensures that we don't split our squid/varnish caches.
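[Editorial illustration, not part of the original mail: a protocol-relative URL inherits its scheme from the page it appears on. Python's standard urljoin performs the same resolution a browser does:]

```python
from urllib.parse import urljoin

# The same protocol-relative link resolves to http or https
# depending on the scheme of the page that contains it.
link = "//en.wikipedia.org/wiki/Main_Page"

print(urljoin("http://en.wikipedia.org/wiki/Foo", link))
# http://en.wikipedia.org/wiki/Main_Page
print(urljoin("https://en.wikipedia.org/wiki/Foo", link))
# https://en.wikipedia.org/wiki/Main_Page
```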
We'll enable this for test.wikipedia.org some time before enabling it globally, so that there will be at least a short time to test beforehand.
- Ryan
On Fri, Jun 17, 2011 at 2:41 PM, Ryan Lane rlane32@gmail.com wrote:
In anticipation of proper HTTPS support on Wikimedia sites, we will be enabling protocol-relative URLs. This means that things like logos, references to resources, and interwiki links will use links like:
//en.wikipedia.org/wiki/Main_Page
instead of links like:
http://en.wikipedia.org/wiki/Main_Page
Doing this ensures that we don't split our squid/varnish caches.
We'll enable this for test.wikipedia.org some time before enabling it globally, so that there will be at least a short time to test beforehand.
*squee* :DD
I'm soooo looking forward to consistent https on the main domains! :DDD
Note there's a possibility that this URL tweak will affect some JavaScript-side code or client-side spiders/bots that expect fully-qualified links or do relative link resolution incorrectly, so everybody keep an eye out for potential issues in your code. Previous testing has been pretty favorable about actual browser support.
-- brion
On Fri, Jun 17, 2011 at 11:41 PM, Ryan Lane rlane32@gmail.com wrote:
We'll enable this for test.wikipedia.org some time before enabling it globally, so that there will be at least a short time to test beforehand.
For completeness: this, both enabling it on testwiki and enabling it sitewide, will be going down some time in early or mid-July, so that's almost a month away. Ryan and I are just planning and announcing this freakishly early :)
Roan Kattouw (Catrope)
On Fri, Jun 17, 2011 at 3:29 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Fri, Jun 17, 2011 at 11:41 PM, Ryan Lane rlane32@gmail.com wrote:
We'll enable this for test.wikipedia.org some time before enabling it globally, so that there will be at least a short time to test beforehand.
For completeness: this, both enabling it on testwiki and enabling it sitewide, will be going down some time in early or mid-July, so that's almost a month away. Ryan and I are just planning and announcing this freakishly early :)
Is there a handy config snippet people can use to test things against their local installs in the meantime?
-- brion
On Sat, Jun 18, 2011 at 12:32 AM, Brion Vibber brion@wikimedia.org wrote:
Is there a handy config snippet people can use to test things against their local installs in the meantime?
Afraid not. We have to change hardcoded http:// prefixes in a million different places, see http://wikitech.wikimedia.org/view/Https#Protocol-relative_URLs . The logos are largely monkey work (regex them, then get a team to verify 100s of hopefully-not-broken logos), but other things are much more sensitive. It'd suck to break the setting that determines where the uploaded files on Commons live (http://upload.wikimedia.org/site/lang/etc) or where our static assets and RL things are served from (bits), especially if broken URLs get cached in Squid and/or memcached. This is why we're scheduling this thing this far out when Ryan and I will both be available to very carefully go about this.
Roan Kattouw (Catrope)
On Fri, Jun 17, 2011 at 3:42 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Sat, Jun 18, 2011 at 12:32 AM, Brion Vibber brion@wikimedia.org wrote:
Is there a handy config snippet people can use to test things against their local installs in the meantime?
Afraid not. We have to change hardcoded http:// prefixes in a million different places, see http://wikitech.wikimedia.org/view/Https#Protocol-relative_URLs . The logos are largely monkey work (regex them, then get a team to verify 100s of hopefully-not-broken logos), but other things are much more sensitive. It'd suck to break the setting that determines where the uploaded files on Commons live (http://upload.wikimedia.org/site/lang/etc) or where our static assets and RL things are served from (bits), especially if broken URLs get cached in Squid and/or memcached. This is why we're scheduling this thing this far out when Ryan and I will both be available to very carefully go about this.
Great that we have a list! :D
Do make sure that all of those individual settings get tested before touching the production cluster; I'd be particularly worried about the possibility of exposing '//domain/something' style URLs into output that requires a fully-clickable link.
It looks like wfExpandUrl() *does* correctly expand protocol-relative URLs against the current $wgServer setting, so a lot of things that already expect to handle path-relative URLs should already be fine.
But some, like the interwikis, may be assuming fully-qualified URLs; for instance, API results should probably return fully-qualified URLs, but they're probably not running through wfExpandUrl.
ApiQueryIWLinks, for instance, returns interwiki links retrieved via $title->getFullURL(); for interwiki cases this takes whatever was returned by $interwiki->getURL() (which would in this case presumably be the '//fr.wikipedia.org/wiki/$1' form) and passes it back, without making any attempt to expand it further.
This'll probably need refactoring so that getLocalURL() handles the regular interwiki URLs, then in getFullURL() we go ahead and call that and pass the result through wfExpandUrl().
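[Editorial sketch, not from the original mail: a simplified model of the expansion wfExpandUrl() needs to perform. This is illustrative Python, not MediaWiki's actual implementation; the server default stands in for $wgServer.]

```python
def expand_url(url, server="http://en.wikipedia.org"):
    """Expand a protocol-relative or path-relative URL to a full one.

    Simplified model of MediaWiki's wfExpandUrl(): protocol-relative
    URLs take the server's scheme; path-relative URLs take the whole
    server prefix; fully-qualified URLs pass through untouched.
    """
    if url.startswith("//"):
        scheme = server.split("://", 1)[0]
        return scheme + ":" + url
    if url.startswith("/"):
        return server + url
    return url  # already fully qualified

print(expand_url("//fr.wikipedia.org/wiki/$1"))
# http://fr.wikipedia.org/wiki/$1
print(expand_url("/wiki/Main_Page"))
# http://en.wikipedia.org/wiki/Main_Page
```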
-- brion
On Sat, Jun 18, 2011 at 12:56 AM, Brion Vibber brion@pobox.com wrote:
Great that we have a list! :D
Do make sure that all of those individual settings get tested before touching the production cluster;
Well that's what the testwiki thing is for :)
I'd be particularly worried about the possibility of exposing '//domain/something' style URLs into output that requires a fully-clickable link.
I don't quite get this -- the example below refers to API output, and you're right we need to really output absolute URLs there (good catch!), but I don't see any other scenario where outputting protocol-relative URLs would be bad. Maybe RSS feeds, too, and non-UI things in general, but really everything in the UI ought to be fair game, right?
It looks like wfExpandUrl() *does* correctly expand protocol-relative URLs against the current $wgServer setting, so a lot of things that already expect to handle path-relative URLs should already be fine.
But some, like the interwikis, may be assuming fully-qualified URLs; for instance, API results should probably return fully-qualified URLs, but they're probably not running through wfExpandUrl.
Good point, but I'm more worried about the reverse TBH: what if there's some code that runs stuff through wfExpandUrl() even though it could, and maybe even should, be protocol-relative?
Roan Kattouw (Catrope)
On Fri, Jun 17, 2011 at 4:02 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Sat, Jun 18, 2011 at 12:56 AM, Brion Vibber brion@pobox.com wrote:
Great that we have a list! :D
Do make sure that all of those individual settings get tested before touching the production cluster;
Well that's what the testwiki thing is for :)
IIRC testwiki shares a lot of infrastructure and common configuration with the rest of the sites, so unless it's been given an isolated set of config files & interwiki database files, still be careful. :)
I'd be particularly worried about the possibility of exposing '//domain/something' style URLs into output that requires a fully-clickable link.
I don't quite get this -- the example below refers to API output, and you're right we need to really output absolute URLs there (good catch!), but I don't see any other scenario where outputting protocol-relative URLs would be bad. Maybe RSS feeds, too, and non-UI things in general, but really everything in the UI ought to be fair game, right?
Almost everything in web UI should be pretty much fine yes -- I really mean things where the URL ends up someplace *outside* a browser, directly or indirectly, and then needs to be resolved and used without the browser's context. Places you definitely want some full URLs include:
* api
* additional specialty APIs (say, movie embedding links)
* feeds (in at least some places)
* email
* export XML that contains URLs for files or pages
There may be other things that we just haven't thought of, which is why I wanted to raise it.
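[Editorial illustration, not from the original mail: why those non-browser contexts need full URLs. Outside a browser there is no base document to resolve against, so a protocol-relative URL parses with an empty scheme and cannot be fetched as-is:]

```python
from urllib.parse import urlparse

# A mail client or feed reader has no containing page whose scheme
# it can inherit, so this URL is unusable without expansion first.
parts = urlparse("//en.wikipedia.org/wiki/Main_Page")
print(parts.scheme)  # '' -- empty, nothing to connect with
print(parts.netloc)  # 'en.wikipedia.org'
print(parts.path)    # '/wiki/Main_Page'
```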
But some, like the interwikis, may be assuming fully-qualified URLs; for instance, API results should probably return fully-qualified URLs, but they're probably not running through wfExpandUrl.
Good point, but I'm more worried about the reverse TBH: what if there's some code that runs stuff through wfExpandUrl() even though it could, and maybe even should, be protocol-relative?
Also worth checking for! I'm less worried about those though since the worst case result is "status quo". :)
-- brion
On Sat, Jun 18, 2011 at 3:11 AM, Brion Vibber brion@pobox.com wrote:
What about user preference for cases like this? With four options: "Prefer HTTP", "Prefer HTTPS", "Force HTTP", "Force HTTPS".
--vvv
On Fri, Jun 17, 2011 at 4:15 PM, Victor Vasiliev vasilvv@gmail.com wrote:
On Sat, Jun 18, 2011 at 3:11 AM, Brion Vibber brion@pobox.com wrote:
What about user preference for cases like this? With four options: "Prefer HTTP", "Prefer HTTPS", "Force HTTP", "Force HTTPS".
The current case is that you get whatever the current setting on the web server was at the time the email got formatted (which usually means either HTTP or HTTPS, depending on *someone else's usage*) or it defaults to HTTP for something formatted from default.
Currently MediaWiki doesn't have a native notion of multiple hostname/protocol availability and it's done by lightly patching configuration at runtime when loaded over HTTPS. Unless something is added, that'll continue in the same way.
My personal preference would be to run *all* logged-in activity over HTTPS, so every mail link etc should be on SSL. But I think that's still a ways out yet and will need better SSL acceleration; poor Ryan Lane will kill me if I keep pushing on that too soon! ;)
-- brion
On Fri, Jun 17, 2011 at 4:20 PM, Brion Vibber brion@pobox.com wrote:
[snip] or it defaults to HTTP for something formatted from default.
s/default/background job running on CLI/
:P
-- brion
My personal preference would be to run *all* logged-in activity over HTTPS, so every mail link etc should be on SSL. But I think that's still a ways out yet and will need better SSL acceleration; poor Ryan Lane will kill me if I keep pushing on that too soon! ;)
Actually, this is exactly what I want. I think we can do it fairly cheaply, but before I commit to that I'd like to test the cluster thoroughly.
One thing to note about this cluster is that it is an SSL termination cluster, and as such, MediaWiki will have no idea that the user is coming via HTTPS in the normal way. The SSL termination cluster will set a header to indicate the user is coming via HTTPS, so we'll need to deal with that on the MediaWiki side so that we send secure cookies.
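[Editorial sketch, not from the original mail: a minimal model of that detection. The X-Forwarded-Proto header name is an assumption here; whatever header the termination layer actually sets would be checked the same way.]

```python
def request_is_https(headers):
    """Behind an SSL-terminating proxy the app never sees TLS itself,
    so it must trust a header set by the proxy layer."""
    return headers.get("X-Forwarded-Proto", "").lower() == "https"

def session_cookie_flags(headers):
    # Mark the session cookie Secure only for HTTPS requests, so the
    # browser never sends it back over plain HTTP.
    return {"secure": request_is_https(headers), "httponly": True}

print(session_cookie_flags({"X-Forwarded-Proto": "https"}))
# {'secure': True, 'httponly': True}
```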
There's a bunch of things that we should likely do in the future as well. We should likely set a non-secure cookie for HTTPS logged in users that indicates the user requests HTTPS only (via a preference, enabled by default), that will redirect them to HTTPS if they somehow arrive at an HTTP page. Strict Transport Security (STS) should also be a consideration at some point in time, at least for users that have already logged in. This doesn't protect the user from initial site spoofing attacks, but could protect against later spoofing attacks (thanks Aryeh for this idea).
I don't think we'll ever get to a point where we can/should use HTTPS for all anon users, but SPDY could be a consideration in the future for anons. After I finish HTTPS I may look at setting up SPDY for testing.
- Ryan
Ryan Lane wrote:
There's a bunch of things that we should likely do in the future as well. We should likely set a non-secure cookie for HTTPS logged in users that indicates the user requests HTTPS only (via a preference, enabled by default), that will redirect them to HTTPS if they somehow arrive at an HTTP page. Strict Transport Security (STS) should also be a consideration at some point in time, at least for users that have already logged in. This doesn't protect the user from initial site spoofing attacks, but could protect against later spoofing attacks (thanks Aryeh for this idea).
I don't think we'll ever get to a point where we can/should use HTTPS for all anon users, but SPDY could be a consideration in the future for anons. After I finish HTTPS I may look at setting up SPDY for testing.
These all sound like good ideas to investigate. Just make sure they're in Bugzilla at some point so they don't get lost in a mailman archive. :-) I think there's a tracking bug for https or secure login somewhere.
MZMcBride
On Sun, Jun 19, 2011 at 2:35 AM, MZMcBride z@mzmcbride.com wrote:
These all sound like good ideas to investigate. Just make sure they're in Bugzilla at some point so they don't get lost in a mailman archive. :-) I think there's a tracking bug for https or secure login somewhere.
MZMcBride
We even have a component for it now ;p