Query - Is there a secure (https) URL / access path in to Wiki Commons?
If not, is there a reason not, and can we fix that?
Thanks!
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert george.herbert@gmail.com wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
Excellent. Someone with the appropriate permissions might want to add that to the Commons Special:UserLogin page, the same way the other wikis have such a link...
Thanks!
On Sat, Sep 6, 2008 at 3:52 PM, Casey Brown cbrown1023.ml@gmail.com wrote:
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert george.herbert@gmail.com wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
-- Casey Brown Cbrown1023
Note: This e-mail address is used for mailing lists. Personal emails sent to this address will probably get lost.
On Sat, Sep 6, 2008 at 7:29 PM, George Herbert george.herbert@gmail.com wrote:
Excellent. Someone with the appropriate permissions might want to add that to the Commons Special:UserLogin page, the same way the other wikis have such a link...
Thanks!
Done. http://commons.wikimedia.org/wiki/MediaWiki:Loginend
On 07.09.2008, 4:00 Casey wrote:
On Sat, Sep 6, 2008 at 7:29 PM, George Herbert george.herbert@gmail.com wrote:
Excellent. Someone with the appropriate permissions might want to add that to the Commons Special:UserLogin page, the same way the other wikis have such a link...
Thanks!
Are you sure that poor single secure server will handle that load?
2008/9/7 Max Semenik maxsem.wiki@gmail.com:
On 07.09.2008, 4:00 Casey wrote:
On Sat, Sep 6, 2008 at 7:29 PM, George Herbert george.herbert@gmail.com wrote:
Excellent. Someone with the appropriate permissions might want to add that to the Commons Special:UserLogin page, the same way the other wikis have such a link...
Thanks!
Are you sure that poor single secure server will handle that load?
If not, the devs will soon tell us.
2008/9/6 Casey Brown cbrown1023.ml@gmail.com:
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert george.herbert@gmail.com wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
Could https://commons.wikimedia.org/ be made to redirect to that URL? Is there a good reason not to do this? I imagine some users test for secure access by typing https://commons.wikimedia.org/ into their browser. The same goes for the other Wikimedia projects.
Oldak Quill wrote:
2008/9/6 Casey Brown :
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
Could https://commons.wikimedia.org/ be made to redirect to that URL? Is there a good reason not to do this? I imagine some users test for secure access by typing https://commons.wikimedia.org/ into their browser. The same goes for the other Wikimedia projects.
For that you could just serve the pages directly on https://commons.wikimedia.org/. One of the main reasons for having just one secure server is having just one SSL cert; to serve each project securely on its own domain you'd need at least one cert per domain. The other problem, and the reason access through SSL is a bit flaky, is how to load-balance encrypted connections. That's why there's only one server (bart) handling secure.wikimedia.org (hits are then proxied to the internal load balancer).
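As a rough illustration of that single-front-end pattern, here is a toy Python sketch that terminates TLS with one certificate and relays plain HTTP to an internal load balancer. The hostnames and file names are made up, and this is not the production Apache setup, just the shape of it:

    # Toy TLS-terminating proxy: decrypt HTTPS with a single certificate and
    # relay the plain HTTP bytes to an internal load balancer.
    # Hostnames and file names below are hypothetical; error handling is omitted.
    import socket
    import ssl
    import threading

    BACKEND = ("internal-lb.example", 80)    # hypothetical internal load balancer
    CERT, KEY = "secure.pem", "secure.key"   # the one certificate for the front end

    def pump(src, dst):
        # Copy bytes one way until either side closes.
        try:
            while (chunk := src.recv(8192)):
                dst.sendall(chunk)
        except OSError:
            pass
        finally:
            dst.close()

    def handle(client):
        backend = socket.create_connection(BACKEND)
        threading.Thread(target=pump, args=(client, backend), daemon=True).start()
        pump(backend, client)

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(CERT, KEY)

    with socket.create_server(("", 443)) as listener, \
         ctx.wrap_socket(listener, server_side=True) as tls_listener:
        while True:
            conn, _ = tls_listener.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

All the expensive crypto happens in this one process, which is exactly why the single certificate and the session-cache questions come up later in the thread.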
On Sat, Sep 6, 2008 at 6:52 PM, Casey Brown cbrown1023.ml@gmail.com wrote:
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert george.herbert@gmail.com wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
Might be worth noting, especially for Commons, that it's not all that secure: the images are sent over insecure HTTP, so anyone watching your traffic can tell what you're looking at.
2008/9/7 Gregory Maxwell gmaxwell@gmail.com:
On Sat, Sep 6, 2008 at 6:52 PM, Casey Brown cbrown1023.ml@gmail.com wrote:
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert george.herbert@gmail.com wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
Might be worth noting, especially for Commons, that it's not all that secure: the images are sent over insecure HTTP, so anyone watching your traffic can tell what you're looking at.
Perhaps we should add a red banner to the top of every page accessed through the secure gateway warning that images are not secure.
On Sat, Sep 6, 2008 at 10:02 PM, Oldak Quill oldakquill@gmail.com wrote:
Perhaps we should add a red banner to the top of every page accessed through the secure gateway warning that images are not secure.
Browsers will typically inform the user that some parts of the page are not secured, and should include visual cues too (like not presenting a padlock icon in the URL bar).
Aryeh Gregor wrote:
On Sat, Sep 6, 2008 at 10:02 PM, Oldak Quill oldakquill@gmail.com wrote:
Perhaps we should add a red banner to the top of every page accessed through the secure gateway warning that images are not secure.
Browsers will typically inform the user that some parts of the page are not secured, and should include visual cues too (like not presenting a padlock icon in the URL bar).
Interestingly, Firefox at least doesn't seem to care about the images being loaded from an insecure server.
It *will* whinge about JavaScript being loaded that way, however.
Note that while loading of images over HTTP may reveal viewed pages (via referers, just like clicking on an external link will) it won't reveal passwords or session cookies.
-- brion
On Mon, Sep 8, 2008 at 2:33 PM, Brion Vibber brion@wikimedia.org wrote:
Interestingly, Firefox at least doesn't seem to care about the images being loaded from an insecure server.
It *will* whinge about JavaScript being loaded that way, however.
Note that while loading of images over HTTP may reveal viewed pages (via referers, just like clicking on an external link will) it won't reveal passwords or session cookies.
On this subject, as part of the IPv6 testing I've run a JS tester on ENWP for a couple of months now which has determined that for hosts able to run the JS tester, protocol relative urls (i.e. <img src="//upload.wikimedia.org/foo.jpg"/>) work for all clients.
If protocol relatives turn out to be universally supported they would remove one problem from doing a native SSL deployment.
I can't comment on compatibility with clients that do not support javascript / don't execute the v6 test for some other reason.
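For readers unfamiliar with the construct: a protocol-relative (scheme-relative) reference simply inherits the protocol of the page it appears on; RFC 3986 calls it a network-path reference. A quick sketch with Python's standard urljoin, which follows the same resolution rules, shows the effect (the URLs are just examples):

    # A protocol-relative reference picks up the scheme of the base (page) URL.
    from urllib.parse import urljoin

    img = "//upload.wikimedia.org/foo.jpg"

    print(urljoin("http://en.wikipedia.org/wiki/Example", img))
    # http://upload.wikimedia.org/foo.jpg

    print(urljoin("https://en.wikipedia.org/wiki/Example", img))
    # https://upload.wikimedia.org/foo.jpg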
Gregory Maxwell wrote:
On this subject, as part of the IPv6 testing I've run a JS tester on ENWP for a couple of months now which has determined that for hosts able to run the JS tester, protocol relative urls (i.e. <img src="//upload.wikimedia.org/foo.jpg"/>) work for all clients.
If protocol relatives turn out to be universally supported they would remove one problem from doing a native SSL deployment.
I can't comment on compatibility with clients that do not support javascript / don't execute the v6 test for some other reason.
So, could we/you just add a couple of small images to some unobtrusive place in the MonoBook skin on enwiki, with one using a protocol-relative URL and the other not, and see what happens?
On Tue, Sep 9, 2008 at 3:24 PM, Ilmari Karonen nospam@vyznev.net wrote:
Gregory Maxwell wrote:
On this subject, as part of the IPv6 testing I've run a JS tester on ENWP for a couple of months now which has determined that for hosts able to run the JS tester, protocol relative urls (i.e. <img src="//upload.wikimedia.org/foo.jpg"/>) work for all clients.
If protocol relatives turn out to be universally supported they would remove one problem from doing a native SSL deployment.
I can't comment on compatibility with clients that do not support javascript / don't execute the v6 test for some other reason.
So, could we/you just add a couple of small images to some unobtrusive place in the MonoBook skin on enwiki, with one using a protocol-relative URL and the other not, and see what happens?
I've basically done this. Not in the monobook skin, but injected via JS for 1:100 requests along with the IPv6 test. The unambiguous result is that it works.
The open question is: "does it work for more primitive clients which do not support JS?" We could use non-JS images, but we'd have no way to get error reports.
The only further test I could really see would be something like using them for all images and seeing if people start reporting "Wikipedia has no images anymore on my cell phone!", but caching means that if it does cause problems we can't instantly revert.
Good ideas welcome.
Gregory Maxwell wrote:
On Tue, Sep 9, 2008 at 3:24 PM, Ilmari Karonen nospam@vyznev.net wrote:
So, could we/you just add a couple of small images to some unobtrusive place in the MonoBook skin on enwiki, with one using a protocol-relative URL and the other not, and see what happens?
I've basically done this. Not in the monobook skin, but injected via JS for 1:100 requests along with the IPv6 test. The unambiguous result is that it works.
The open question is: "does it work for more primitive clients which do not support JS?" We could use non-JS images, but we'd have no way to get error reports.
I was thinking of having two images, one using a protocol-relative URL and the other not. Log the requests for each image and compare the logs: any client that consistently requests only the image with the explicit protocol probably doesn't support protocol-relative URLs.
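A minimal sketch of that comparison, assuming a preprocessed log of "client-id <tab> requested-path" lines (the file name, paths, and format are placeholders, not the real request logs):

    # Flag clients that fetched the explicit-protocol test image but never the
    # protocol-relative one; those probably don't resolve protocol-relative URLs.
    from collections import defaultdict

    EXPLICIT = "/test/explicit.png"          # referenced with http:// in the page
    PROTO_RELATIVE = "/test/relative.png"    # referenced with // in the page

    seen = defaultdict(set)                  # client id -> test images it fetched
    with open("image-test.log") as log:
        for line in log:
            client, path = line.rstrip("\n").split("\t", 1)
            if path in (EXPLICIT, PROTO_RELATIVE):
                seen[client].add(path)

    suspects = [c for c, paths in seen.items()
                if EXPLICIT in paths and PROTO_RELATIVE not in paths]
    print(len(suspects), "clients fetched only the explicit-protocol image")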
On Tue, Sep 9, 2008 at 6:17 PM, Ilmari Karonen nospam@vyznev.net wrote:
I was thinking of having two images, one using a protocol-relative URL and the other not. Log the requests for each image and compare the logs: any client that consistently requests only the image with the explicit protocol probably doesn't support protocol-relative URLs.
Oops, my suggestion was two hours late. Maybe I rely too much on Gmail's auto-threading (which didn't catch this, since it changed the subject).
Gregory Maxwell wrote:
I've basically done this. Not in the monobook skin, but injected via JS for 1:100 requests along with the IPv6 test. The unambiguous result is that it works.
The open question is: "does it work for more primitive clients which do not support JS?" We could use non-JS images, but we'd have no way to get error reports.
The only further test I could really see would be something like using them for all images and seeing if people start reporting "Wikipedia has no images anymore on my cell phone!", but caching means that if it does cause problems we can't instantly revert.
Good ideas welcome.
You could use some CSS like:

    cursor: url(buggy-css-support.cur);
    cursor: url(//supported-browser.cur) url(http://broken-browser.cur) default;
Sadly, cursor seems to be the only CSS property allowing fallback. It is also quite an advanced (useless) and annoying property, so old browsers are also likely to just skip it.
It would be nice if we were able to do that with an <img> tag. We can emulate it by having two elements at the same position with a higher z-index for the protocol-relative one, but user agents will load both, so you would still need to rely on users reporting that they saw an image saying "Your browser doesn't support protocol-relative URIs, tell us at X!"
On Tue, Sep 9, 2008 at 3:52 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
The open question is: "does it work for more primitive clients which do not support JS?" We could use non-JS images, but we'd have no way to get error reports.
How about using a few images, some with protocol-relative URLs and some without, and checking the request logs for unpaired requests? You'd get some false positives, but if you provided, say, two with protocol-relative URLs and two without, or five with and five without, you'd be pretty unlikely to get anything that happens to get the requests through for *exactly* the non-protocol-relative ones but none of the others. How to get this to work with the caching layer, since it requires serving different HTML to a small sample, is probably something you have better ideas for than I do. Send a different version depending on which Squid is requesting it?
Alternatively, if it's rolled out for actual content (and screw anyone who doesn't support it), that could be done incrementally, with the percentage of protocol-relative links starting out extremely small and slowly increasing over time if no complaints come in. Then the damage should be as minimal as possible. Or the protocol-relative links could be deployed deterministically based on page name, and then if there's a problem all the affected pages could be purged (but you'd still want to start slow to avoid having to do huge purges).
Aryeh Gregor wrote:
Alternatively, if it's rolled out for actual content (and screw anyone who doesn't support it), that could be done incrementally, with the percentage of protocol-relative links starting out extremely small and slowly increasing over time if no complaints come in. Then the damage should be as minimal as possible. Or the protocol-relative links could be deployed deterministically based on page name, and then if there's a problem all the affected pages could be purged (but you'd still want to start slow to avoid having to do huge purges).
I think the next step, after the log comparison test we both suggested, would be to set $wgLogo to a protocol-relative URL. A missing logo wouldn't actually break anything, but you _bet_ people would notice it.
On Tue, Sep 9, 2008 at 8:27 PM, Ilmari Karonen nospam@vyznev.net wrote:
I think the next step, after the log comparison test we both suggested, would be to set $wgLogo to a protocol-relative URL. A missing logo wouldn't actually break anything, but you _bet_ people would notice it.
Now that's a simple, elegant, effective idea. It would require almost no effort, hurt no one, and give immediate feedback. The only catch with this, as with other image-based proposals, is that a client that doesn't support images (as well as, in the case of the logo, CSS) won't be picked up. But I don't see much help for that. There are few enough of those anyway. lynx does support them, I just checked.
One interesting catch, though, that I just noticed when testing. What happens when someone downloads the HTML to their hard drive and views it locally? lynx assumes FTP as the protocol, which is completely wrong. Firefox, Opera, Konqueror, and Chrome all try to use the *file://* protocol -- which is absolutely reasonable but absolutely terrible for us.
So this means that anyone who tries to save a Wikipedia page using protocol-relative URLs to their hard drive will find that all the relevant links are broken. This is, obviously, not a good thing. I can't see any conceivable workaround, and if there is none I don't see any way we (or anyone) can use protocol-relative URLs. Being able to save web pages locally is pretty basic and important functionality that a lot of people must be relying on.
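The behaviour described for raw, un-rewritten HTML can be reproduced outside a browser; the same urljoin resolution applied to a file: base gives (the local path is made up):

    # A scheme-relative reference resolved against a file: base inherits file:,
    # pointing at a "host" the file scheme cannot reach.
    from urllib.parse import urljoin

    print(urljoin("file:///home/me/Example.html", "//upload.wikimedia.org/foo.jpg"))
    # file://upload.wikimedia.org/foo.jpg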
On Tue, Sep 9, 2008 at 8:58 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Tue, Sep 9, 2008 at 8:27 PM, Ilmari Karonen nospam@vyznev.net wrote:
I think the next step, after the log comparison test we both suggested, would be to set $wgLogo to a protocol-relative URL. A missing logo wouldn't actually break anything, but you _bet_ people would notice it.
Now that's a simple, elegant, effective idea. It would require almost no effort, hurt no one, and give immediate feedback. The only catch with this, as with other image-based proposals, is that a client that doesn't support images (as well as, in the case of the logo, CSS) won't be picked up. But I don't see much help for that. There are few enough of those anyway. lynx does support them, I just checked.
Probably every browser you can think to name supports them. I suspect that if I added up the user agents known to support it we'd be well above 99% of all traffic.
My bigger concerns are with things like content-munging proxies, anti-porno filters, SSL VPN appliances, and the like breaking things for clients, along with truly oddball clients (lynx is not oddball) and mobile devices. Testing with the logo won't exercise those cases well, but it wouldn't be a bad start.
Some of these may be obscure enough to ignore, some may not. But they are going to be rare enough that further 'light' pre-deployment testing isn't likely to find them.
One interesting catch, though, that I just noticed when testing. What happens when someone downloads the HTML to their hard drive and views it locally? lynx assumes FTP as the protocol, which is completely wrong. Firefox, Opera, Konqueror, and Chrome all try to use the *file://* protocol -- which is absolutely reasonable but absolutely terrible for us.
So this means that anyone who tries to save a Wikipedia page using protocol-relative URLs to their hard drive will find that all the relevant links are broken. This is, obviously, not a good thing. I can't see any conceivable workaround, and if there is none I don't see any way we (or anyone) can use protocol-relative URLs. Being able to save web pages locally is pretty basic and important functionality that a lot of people must be relying on.
Huh? Did you actually try saving a page?
A protocol-relative URL is a relative URL. We would use them where sites normally use relative URLs, but where we currently use fully qualified URLs because our 'site' spans many domain names (i.e. en.wikipedia.org and upload.wikimedia.org).
If you take some relative URLs from a website and merely write them into a file, of course it isn't going to work. Which is why browsers do not do anything like that when you save a page.
Browsers support relative URLs in saved copies by rewriting them at save time. Otherwise all relative URLs would break in saved documents, and the overwhelming majority of anchors and images on sites outside of Wikipedia are fully relative paths.
I just tested several browsers and they rewrote protocol-relative URLs just as they do for any other kind of relative URL. Images get saved and work fine. Links get fully qualified just fine. It just works(tm).
I also tried this on a captured copy of the Wikipedia HTML. It works just as I'd expect.
As far as I can tell there is no problem here, but perhaps I'm missing something.
On Tue, Sep 9, 2008 at 11:15 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Huh? Did you actually try saving a page?
A protocol-relative URL is a relative URL. We would use them where sites normally use relative URLs, but where we currently use fully qualified URLs because our 'site' spans many domain names (i.e. en.wikipedia.org and upload.wikimedia.org).
If you take some relative URLs from a website and merely write them into a file, of course it isn't going to work. Which is why browsers do not do anything like that when you save a page.
Browsers support relative URLs in saved copies by rewriting them at save time. Otherwise all relative URLs would break in saved documents, and the overwhelming majority of anchors and images on sites outside of Wikipedia are fully relative paths.
I just tested several browsers and they rewrote protocol-relative URLs just as they do for any other kind of relative URL. Images get saved and work fine. Links get fully qualified just fine. It just works(tm).
I also tried this on a captured copy of the Wikipedia HTML. It works just as I'd expect.
As far as I can tell there is no problem here, but perhaps I'm missing something.
As usual, I spoke before I tested, or thought. If you save it as a simple HTML file it breaks, but as you point out, so does every other link on the page. This is a non-issue, sorry for bringing it up. Or at least for bringing it up with such unwarranted alarm.
On Mon, Sep 8, 2008 at 2:33 PM, Brion Vibber brion@wikimedia.org wrote:
Note that while loading of images over HTTP may reveal viewed pages (via referers, just like clicking on an external link will) it won't reveal passwords or session cookies.
According to RFC 2616 (section 15.1.3), it SHOULD NOT reveal Referers either, and AFAIK browsers do implement that. However, you could still probably work out what pages the person is viewing by just looking at which images are being loaded, in many cases.
On Mon, Sep 8, 2008 at 3:04 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
On this subject, as part of the IPv6 testing I've run a JS tester on ENWP for a couple of months now which has determined that for hosts able to run the JS tester, protocol relative urls (i.e. <img src="//upload.wikimedia.org/foo.jpg"/>) work for all clients.
If protocol relatives turn out to be universally supported they would remove one problem from doing a native SSL deployment.
Why would one suspect that they're not universally supported?
Aryeh Gregor wrote:
On Mon, Sep 8, 2008 at 3:04 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
On this subject, as part of the IPv6 testing I've run a JS tester on ENWP for a couple of months now which has determined that for hosts able to run the JS tester, protocol relative urls (i.e. <img src="//upload.wikimedia.org/foo.jpg"/>) work for all clients.
If protocol relatives turn out to be universally supported they would remove one problem from doing a native SSL deployment.
Why would one suspect that they're not universally supported?
It's the kind of weird, uncommon thing that just screams "I'm a corner case that lots of people probably didn't bother to implement because I'm not in common use in the wild". :)
Even if browsers support it I would expect to see a lot of bots and spiders choke on it -- it's bad enough a lot don't understand that "&amp;" in an <a href="..."> needs to be decoded as "&"... :)
-- brion
On Mon, Sep 08, 2008 at 03:35:10PM -0700, Brion Vibber wrote:
Even if browsers support it I would expect to see a lot of bots and spiders choke on it -- it's bad enough a lot don't understand that "&amp;" in an <a href="..."> needs to be decoded as "&"... :)
Something I have always thought was breakage in the spec. *It's inside quotes*, people; it is outside your domain.
Cheers, -- jra
On Tue, Sep 9, 2008 at 8:30 AM, Jay R. Ashworth jra@baylink.com wrote:
On Mon, Sep 08, 2008 at 03:35:10PM -0700, Brion Vibber wrote:
Even if browsers support it I would expect to see a lot of bots and spiders choke on it -- it's bad enough a lot don't understand that "&amp;" in an <a href="..."> needs to be decoded as "&"... :)
Something I have always thought was breakage in the spec. *It's inside quotes*, people; it is outside your domain.
Not a tenable argument, because HTML entities are needed as much inside quotes as anywhere. &quot; or &#39; is needed to escape those characters. Moreover, in a non-Unicode character set, you'll typically need to use entities to get most Unicode characters. As soon as entities are needed, you need &amp; to specify a literal ampersand. It would be impossible to say that &amp; doesn't decode to & in quotes -- you wouldn't be able to specify a literal string like "&quot;".
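As a quick check of that decoding rule, Python's html module applies the same entity handling (the example strings are arbitrary):

    # Entity references decode inside attribute-style text too:
    # &amp; -> &, &quot; -> ", &#39; -> '
    from html import unescape

    print(unescape("index.php?title=Foo&amp;action=edit"))
    # index.php?title=Foo&action=edit

    print(unescape("say &quot;hello&quot; and &#39;hi&#39;"))
    # say "hello" and 'hi'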
On Mon, Sep 8, 2008 at 6:17 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Mon, Sep 8, 2008 at 2:33 PM, Brion Vibber brion@wikimedia.org wrote:
Note that while loading of images over HTTP may reveal viewed pages (via referers, just like clicking on an external link will) it won't reveal passwords or session cookies.
According to RFC 2616 (section 15.1.3), it SHOULD NOT reveal Referers either, and AFAIK browsers do implement that. However, you could still probably work out what pages the person is viewing by just looking at which images are being loaded, in many cases.
I suspect that there are fairly few pages with images that can't be identified that way… and the discussion started with commons, so…
On Mon, Sep 8, 2008 at 3:04 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
On this subject, as part of the IPv6 testing I've run a JS tester on ENWP for a couple of months now which has determined that for hosts able to run the JS tester, protocol relative urls (i.e. <img src="//upload.wikimedia.org/foo.jpg"/>) work for all clients.
If protocol relatives turn out to be universally supported they would remove one problem from doing a native SSL deployment.
Why would one suspect that they're not universally supported?
Because it's exceptionally rare, and most people whom I would otherwise expect to know about it are surprised that it both works and is part of the relevant standards.
On Mon, Sep 8, 2008 at 6:35 PM, Brion Vibber brion@wikimedia.org wrote:
It's the kind of weird, uncommon thing that just screams "I'm a corner case that lots of people probably didn't bother to implement because I'm not in common use in the wild". :)
Even if browsers support it I would expect to see a lot of bots and spiders choke on it -- it's bad enough a lot don't understand that "&amp;" in an <a href="..."> needs to be decoded as "&"... :)
Right. Exactly. Although I'd only expect Wikimedia to need to use it for the few things that demand full URLs these days: images and cross-project links. Breaking some spiders with respect to those would probably not be the end of the world.
I do not know, however, if the damage would be limited to spiders. The modern JS-able browsers all appear fine, and the usual non-JS subjects that I was able to test were fine. But there are a lot of clients out there which I am not able to test.
In any case... The protocol-relatives are not necessary for HTTPS on the regular domains, but they would remove the need for making separately parsed copies of pages.
With protocol relatives, native HTTPS support requires solving:
1) Wildcard SSL certificates
2) Dumb SSL front-ending proxy to do crypto
3) Either making the load balancer highly IP-sticky *or* setting up software for distributing the SSL session cache (i.e. http://distcache.sourceforge.net/).
Without protocol relatives, you have the above issues plus:
4) Running an additional set of backends that parse pages in a way that uses the HTTPS form for all wikimedia links.
Gregory Maxwell wrote:
With protocol relatives, native HTTPS support requires solving:
- Wildcard SSL certificates
- Dumb SSL front-ending proxy to do crypto
- Either making the load balancer highly IP-sticky *or* setting up software for distributing the SSL session cache (i.e. http://distcache.sourceforge.net/).
Doesn't a new HTTPS connection have to create a new SSL session? I'd think you'd only get away with using the same session when reusing the connection on keepalive, in which case it should just be staying open.
Or is the world of SSL far more strange and wonderful than I've imagined... ;)
(Currently the SSL is done on a proxy in front of the regular web servers; this is an Apache 2.2 proxy, rather than Squid, but it could be any SSL-enabled proxy.)
-- brion
On Mon, Sep 8, 2008 at 8:46 PM, Brion Vibber brion@wikimedia.org wrote:
Gregory Maxwell wrote:
With protocol relatives, native HTTPS support requires solving:
- Wildcard SSL certificates
- Dumb SSL front-ending proxy to do crypto
- Either making the load balancer highly IP-sticky *or* setting up software for distributing the SSL session cache (i.e. http://distcache.sourceforge.net/).
Doesn't a new HTTPS connection have to create a new SSL session? I'd think you'd only get away with using the same session when reusing the connection on keepalive, in which case it should just be staying open.
Or is the world of SSL far more strange and wonderful than I've imagined... ;)
"There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy."
The whole SSL RSA/DH keying setup is embarrassingly computationally expensive, so much so that there is still a market for dedicated accelerator chips that do nothing else. (Though perhaps other processor makers will add modular arithmetic units, as the Sun T2 has, and make them pointless sometime in the next few generations.)
Because of this, SSL supports session caches for hot-starting new connections to the same server. You don't have to preserve the cache, but not doing so wastes a lot of CPU, since clients love to fetch images in parallel to hide TCP latency.
It seems like a lot of people have attacked the session-sharing problem: there are Apache modules using memcached, ones based on libspread, and filesystem-based ones (I know you're dying for an excuse to roll out NFS on all the frontends). In any case, this is a well-known and well-understood task, and it's pretty much a solved problem.
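As a rough client-side way to see whether a front end actually resumes sessions, Python's ssl module (3.6+) exposes the session object and a session_reused flag. The hostname below is a placeholder, and with TLS 1.3 (where tickets arrive after the handshake) the result can vary:

    # Connect, save the TLS session, reconnect with it, and check whether the
    # server resumed it (skipping the expensive key exchange).
    import socket
    import ssl

    HOST = "secure.example.org"   # placeholder for the TLS front end
    ctx = ssl.create_default_context()

    def connect(session=None):
        sock = socket.create_connection((HOST, 443))
        return ctx.wrap_socket(sock, server_hostname=HOST, session=session)

    first = connect()
    saved = first.session
    first.close()

    second = connect(session=saved)
    print("session resumed:", second.session_reused)
    second.close()

If the second connection reports no resumption when it lands on a different front end, that is exactly the shared-session-cache (or IP-stickiness) problem described above.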
(Currently the SSL is done on a proxy in front of the regular web servers; this is an Apache 2.2 proxy, rather than Squid, but it could be any SSL-enabled proxy.)
Apache might not be a bad choice, but there are other, more targeted options. Even something as simple as running stunnel on the existing squid front ends would work (although I don't know if there is session cache distribution support for stunnel).
I guess I missed a relevant point in my bullets: using protocol-relative URLs would allow the secure front end to use the same squid infrastructure, and leveraging those many, many gigabytes of cache will be important for providing comparable performance. Without protocol relatives, images could use the existing cache infrastructure, but the wikitext could not (because the wikitext would need to be parsed differently).
Shouldn't Commons be located at secure.wikimedia.org/wikimedia/commons and Meta at /wikimedia/meta rather than both being located at /wikipedia/? They are not on the wikipedia domain since they are not Wikipedias.
Mike
-----Original Message----- From: Casey Brown [mailto:cbrown1023.ml@gmail.com] Sent: September 6, 2008 7:53 PM To: Wikimedia developers Subject: Re: [Wikitech-l] Secure access to Commons?
On Sat, Sep 6, 2008 at 6:33 PM, George Herbert george.herbert@gmail.com wrote:
Query - Is there a secure (https) URL / access path in to Wiki Commons?
Yeah, and it follows the same format as the other ".wikimedia.org" sites (wikipedia/projectname). Here's the Commons gateway: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page.
On Sun, Sep 7, 2008 at 11:04 PM, mike.lifeguard mike.lifeguard@gmail.com wrote:
Shouldn't Commons be located at secure.wikimedia.org/wikimedia/commons and Meta at /wikimedia/meta rather than both being located at /wikipedia/? They are not on the wikipedia domain since they are not Wikipedias.
Mike
They're there for historical reasons.
On Sun, Sep 7, 2008 at 10:59 PM, Andrew Garrett andrew@epstone.net wrote:
On Sun, Sep 7, 2008 at 11:04 PM, mike.lifeguard mike.lifeguard@gmail.com wrote:
Shouldn't Commons be located at secure.wikimedia.org/wikimedia/commons and Meta at /wikimedia/meta rather than both being located at /wikipedia/? They are not on the wikipedia domain since they are not Wikipedias.
They're there for historical reasons.
Indeed. Meta started as a fork off Wikipedia and was originally at http://meta.wikipedia.com (yes, .com, it's that old - in fact when it started I believe there was only one Wikipedia, enwiki, and it was only after most enwiki policy debate moved off meta onto wikien-l and enwiki itself that meta became what it is today, a cross-project coordination wiki).
Commons was always "Wikimedia Commons", and always at its current address. However, I believe it was the first wiki under the wikimedia.org domain; meta was still under the wikipedia domain at that point. It is probably for this reason that it was lumped under /wikipedia/, though I'm just guessing.
Andrew Garrett wrote:
On Sun, Sep 7, 2008 at 11:04 PM, mike.lifeguard mike.lifeguard@gmail.com wrote:
Shouldn't Commons be located at secure.wikimedia.org/wikimedia/commons and Meta at /wikimedia/meta rather than both being located at /wikipedia/? They are not on the wikipedia domain since they are not Wikipedias.
Mike
They're there for historical reasons.
The base reason for this is the internal naming conventions. We originally created various miscellaneous wikis as special cases within the core Wikipedia group:
Prefix: 'meta', 'commons', 'foundation', etc.
Database suffix: 'wiki'
Directory: 'wikipedia'
...while other multi-language projects got their own suffixes:
Prefix: 'en', 'fr', etc.
Database suffix: 'wiktionary'
Directory: 'wiktionary'
Eventually we started using the 'wikimedia' suffix for some Wikimedia chapters:
Prefix: 'nl', 'se', etc.
Database suffix: 'wikimedia'
Directory: 'wikimedia'
In theory we could migrate other wikis to that system, but it may not be worth the trouble. :P
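Purely as an illustration of how those directory names map onto the secure gateway paths discussed earlier: the Commons URL below matches the gateway link quoted at the start of the thread, while the other entries are extrapolations from the convention described here, not verified URLs.

    # Build the old-style secure gateway URL from the internal "directory" name.
    # The mapping only contains the examples given in this message.
    SECURE = "https://secure.wikimedia.org"

    DIRECTORY = {
        "commons": "wikipedia",       # special-cased into the core Wikipedia group
        "meta": "wikipedia",
        "en.wiktionary": "wiktionary",
        "nl.wikimedia": "wikimedia",  # chapter wiki
    }

    def secure_url(wiki, page="Main_Page"):
        directory = DIRECTORY[wiki]
        name = wiki.split(".")[0]     # 'en.wiktionary' -> 'en'
        return f"{SECURE}/{directory}/{name}/wiki/{page}"

    print(secure_url("commons"))
    # https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page
    print(secure_url("en.wiktionary"))
    # https://secure.wikimedia.org/wiktionary/en/wiki/Main_Page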
As for the secure.wikimedia.org oddities in general: we'd like to move the HTTPS system over to the primary domains for better usability (e.g. https://commons.wikimedia.org/), but this has been a low priority for a while; I hope we'll be able to get it done over the coming months as we're able to ramp up with more available sysadmin time.
It requires some fun setup with proxying (all our proxy sites would have to proxy HTTPS as well as HTTP) and appropriate domain certificates to make sure everything works properly.
-- brion
On Mon, Sep 8, 2008 at 2:39 PM, Brion Vibber brion@wikimedia.org wrote:
As for the secure.wikimedia.org oddities in general: we'd like to move the HTTPS system over to the primary domains for better usability (e.g. https://commons.wikimedia.org/), but this has been a low priority for a while; I hope we'll be able to get it done over the coming months as we're able to ramp up with more available sysadmin time.
It'd be incredibly helpful if you could at least have a page at https://secure.wikimedia.org/ which links to the secure websites for the various projects. I've often found myself spending a few minutes trying to find the right url for a particular project. I figured it was just me, but apparently not.
Or maybe someone who has an easier time with Bugzilla than I do can enter this there.
2008/9/8 Anthony wikimail@inbox.org:
It'd be incredibly helpful if you could at least have a page at https://secure.wikimedia.org/ which links to the secure websites for the various projects. I've often found myself spending a few minutes trying to find the right url for a particular project. I figured it was just me, but apparently not.
+1
- d.
David Gerard wrote:
2008/9/8 Anthony wikimail@inbox.org:
It'd be incredibly helpful if you could at least have a page at https://secure.wikimedia.org/ which links to the secure websites for the various projects. I've often found myself spending a few minutes trying to find the right url for a particular project. I figured it was just me, but apparently not.
+1
Mailing lists are not a place to vote.
On Sat, Sep 13, 2008 at 1:35 AM, Ashar Voultoiz hashar@free.fr wrote:
David Gerard wrote:
2008/9/8 Anthony wikimail@inbox.org:
It'd be incredibly helpful if you could at least have a page at https://secure.wikimedia.org/ which links to the secure websites for the various projects. I've often found myself spending a few minutes trying to find the right url for a particular project. I figured it was just me, but apparently not.
+1
Mailing lists are not a place to vote.
+1