Lots of monitoring going into place:
https://en.wikipedia.org/wiki/Wikipedia:List_of_articles_censored_in_Saudi_A... http://www.bbc.co.uk/news/uk-politics-17576745
What are the current technical barriers to redirection to https by default?
- d.
I see no point in doing that. HTTPS doesn't support caching well and is generally slower. There is no use in it for readers.
On 1 April 2012 11:55, Petr Bena benapetr@gmail.com wrote:
I see no point in doing that. HTTPS doesn't support caching well and is generally slower. There is no use in it for readers.
The use is that the requests themselves are encrypted, so that the only thing logged is that they went to Wikimedia. You did read the linked articles, right?
- d.
On 1 April 2012 13:01, David Gerard dgerard@gmail.com wrote:
On 1 April 2012 11:55, Petr Bena benapetr@gmail.com wrote:
I see no point in doing that. HTTPS doesn't support caching well and is generally slower. There is no use in it for readers.
The use is that the requests themselves are encrypted, so that the only thing logged is that they went to Wikimedia. You did read the linked articles, right?
Obviously, I cannot confirm whether Mr Bena read the linked articles or not, but he did provide an answer regarding the technical restrictions.
Wikimedia already spends an incredible amount of time caching its content, because *so many* users use Wikipedia and its sister projects daily.
And since most of the content is fairly static, caching makes a lot of sense.
However, HTTPS does not support caching (at least not well), which means each page would suddenly have to be generated on *each* request. It's true that MediaWiki itself supports caching, but its own caching is nowhere near as fast as a dedicated caching server like Varnish (although I believe a less powerful caching server is used on Wikimedia's servers).
The trade-off is that the service would be slower for everyone, or we would need more servers. And I am not sure Wikimedia has that kind of money.
Those are the *technical* limitations to defaulting to HTTPS.
Le 01/04/12 12:55, Petr Bena wrote:
I see no point in doing that. HTTPS doesn't support caching well and is generally slower. There is no use in it for readers.
HTTPS has nothing to do with caching; it just transports information between the client and the server, which then handle caching themselves.
HTTPS supports caching just as well as HTTP, since they are exactly the same protocol, the first just being encrypted.
You are right, though, in the sense that most web browsers will BY DEFAULT not save a copy of content received over HTTPS. The reason is that HTTPS pages are (or were) usually used to serve private content. Caching can be explicitly enabled by marking the content as public: send "Cache-Control: public" and that should work.
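For instance, one can check what a Wikimedia page currently sends (a minimal Python 3 sketch using only the standard library; the exact header value depends on the server configuration):

    import urllib.request

    # Fetch a page over HTTPS and show the caching policy the server declares.
    resp = urllib.request.urlopen("https://en.wikipedia.org/wiki/Main_Page")
    print(resp.headers.get("Cache-Control"))

A response marked "private" (or lacking "public") is what keeps shared caches from storing it.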
I do agree there is probably no use for readers to have HTTPS enabled. If the purpose is to bypass a country's firewall, such as in China (or, I think, Thailand), they will just intercept the HTTPS connection from the server on their hardware, decipher it for analysis and re-sign the content with their own certificate before sending it on to clients.
That is exactly what you do in a big company when you want to make sure (as an example) that your employees do not use the chat function on Facebook.
The only thing HTTPS is going to prevent is having your password stolen when logging in, or getting your session cookie hijacked by someone sniffing the local network. The WMF has already moved its private wikis to HTTPS just for that :-]
cheers,
On 01/04/12 18:43, Antoine Musso wrote:
Le 01/04/12 12:55, Petr Bena wrote:
I see no point in doing that. HTTPS doesn't support caching well and is generally slower. There is no use in it for readers.
HTTPS has nothing to do with caching; it just transports information between the client and the server, which then handle caching themselves.
HTTPS supports caching just as well as HTTP, since they are exactly the same protocol, the first just being encrypted.
There would be a small difference if you're behind a caching proxy, but that's unlikely to make a difference for pretty much anyone.
I do agree there is probably no use for readers to have HTTPS enabled. If the purpose is to bypass a country's firewall, such as in China (or, I think, Thailand), they will just intercept the HTTPS connection from the server on their hardware, decipher it for analysis and re-sign the content with their own certificate before sending it on to clients.
Note that such an approach would yield a certificate which, if stored during the attack and later published, is proof of their evil-doing. Any CA willingly doing that (even if "forced by the government") would (should) be immediately removed from the browsers' certificate bundles.
(I believe such interception has been done in the past, though.)
That is exactly what you do in a big company when you want to make sure (as an example) that your employees do not use the chat function on Facebook.
A company can install its own CA certificate on its own computers and have a policy of "we will sniff everything" (note that if the employees are not properly informed of that, the wiretapping could well be illegal). I wonder how they handle self-signed certificates.
That's not what I wanted to say. I meant that HTTPS may cause trouble with caching: in fact, some caching servers have problems with HTTPS, since the headers are encrypted as well, so they usually just forward the encrypted traffic to the server. I don't say it's impossible to cache this, but it's very complicated.
On 2012-04-02 09:20, Petr Bena wrote:
That's not what I wanted to say. I meant that HTTPS may cause trouble with caching: in fact, some caching servers have problems with HTTPS, since the headers are encrypted as well, so they usually just forward the encrypted traffic to the server. I don't say it's impossible to cache this, but it's very complicated.
That might indeed be an issue.
That is why you want to use an HTTPS offloader at the edge of your cluster; it will handle decryption and then serve the result as unencrypted traffic again :-]
I believe that is what the WMF is doing by using nginx as an HTTPS proxy. Someone with better knowledge will confirm.
Perhaps have a blacklist of countries that are known to break the privacy of communications, then make HTTPS the default for logged-in users in those countries.
This may help because:
- It only affects a subgroup of users (the ones from these countries)
- It only affects a subgroup of that subgroup, the logged-in users (not all of them)
- It creates a blacklist of "bad countries" whose citizens are under surveillance by their government
This perhaps is not feasible if there's no easy way to detect the country based on the IP.
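(As an aside on that last point: country lookup from an IP address is fairly routine with an offline GeoIP database. A minimal sketch, assuming the third-party geoip2 package and a downloaded MaxMind GeoLite2 database file; the blacklist contents here are purely hypothetical:)

    import geoip2.database  # pip install geoip2
    import geoip2.errors

    SURVEILLANCE_BLACKLIST = {"CN", "IR"}  # hypothetical example entries

    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

    def should_default_to_https(ip):
        # True when the request's source country is on the blacklist.
        try:
            country = reader.country(ip).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            return False
        return country in SURVEILLANCE_BLACKLIST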
I believe it would be best if the login form were served over HTTP, with a "Disable SSL" checkbox that is unchecked by default. The target page of the form would be an SSL page unless the user checked the box, so that in countries where SSL is a problem they could just check it and proceed over an unencrypted connection.
Serving the login page over HTTP opens login up to MITM attacks, by injecting scripts to swipe passwords or modifying the form to only use HTTP. So you've already eliminated half the reason we introduced HTTPS. Additionally, you cannot control the action="" using a checkbox unless you use JS to do it (and we strive to make sure our login form works for those without JS). So in order to make a disable-SSL checkbox work, you have to make the action="" an HTTP page that does redirection. However, doing that means that the password is now posted over HTTP, and a MITM attacker can snoop passwords. Worse, this eliminates most of the rest of the advantage of HTTPS, because now MITM also means we're all the way back to making it possible to snoop user passwords on open Wi-Fi.
On Mon, Apr 2, 2012 at 6:34 PM, Tei oscar.vives@gmail.com wrote:
Perhaps have a blacklist of countries that are known to break the privacy of communications, then make HTTPS the default for logged-in users in those countries.
This may help because:
- It only affects a subgroup of users (the ones from these countries)
- It only affects a subgroup of that subgroup, the logged-in users (not all of them)
- It creates a blacklist of "bad countries" whose citizens are under surveillance by their government
This perhaps is not feasible if there's no easy way to detect the country based on the IP.
I'd definitely not support doing something like this. It would complicate things incredibly.
- Ryan
Ryan Lane wrote:
I'd definitely not support doing something like this. It would complicate things incredibly.
Someone came into #wikimedia-tech a few days ago and asked about something similar to this. The idea was to use site-wide JavaScript to auto-redirect users to https on one of the Chinese Wikipedias. I believe this was in combination with geolocation functionality, but I'm not sure.
Do you have any thoughts on individual wikis doing this, assuming there's local community consensus?
MZMcBride
On Mon, Apr 2, 2012 at 4:20 PM, Petr Bena benapetr@gmail.com wrote:
That's not what I wanted to say. I meant that HTTPS may cause trouble with caching: in fact, some caching servers have problems with HTTPS, since the headers are encrypted as well, so they usually just forward the encrypted traffic to the server. I don't say it's impossible to cache this, but it's very complicated.
Using SSL by default means all transparent proxies in between aren't hit at all, since they'd be a MITM. I don't necessarily see this as a bad thing, as transparent proxies often break things.
Browsers cache things differently for HTTPS sites, but otherwise everything should work as normal. The SSL termination proxies transparently proxy to our frontend caches after termination. Links are also sent as protocol-relative so that we don't split our cache (see the sketch below).
- Ryan
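(To make the protocol-relative point concrete: the cached HTML carries scheme-less URLs, so the same cached copy serves both HTTP and HTTPS readers. A toy Python sketch of the rewrite, not the actual MediaWiki code:)

    import re

    html = '<a href="http://en.wikipedia.org/wiki/HTTPS">HTTPS</a>'

    # Drop the scheme; the browser then reuses whichever protocol loaded the page.
    print(re.sub(r'\bhttps?:(//)', r'\1', html))
    # -> <a href="//en.wikipedia.org/wiki/HTTPS">HTTPS</a>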
On 1 April 2012 12:06, David Gerard dgerard@gmail.com wrote:
Also, this article was written on 1 April and goes far beyond any monitoring scheme ever suggested in the Western world. And I am sure we would have heard about it before now if it were real.
So I would take that article with a grain of salt, particularly the statement about 'real time'. That's not even feasible.
On 1 April 2012 12:23, Svip svippy@gmail.com wrote:
On 1 April 2012 12:06, David Gerard dgerard@gmail.com wrote:
Also, this article was written on 1 April and goes far beyond any monitoring scheme ever suggested in the Western world. And I am sure we would have heard about it before now if it were real.
It would be nice, but if it's a prank then (a) lots of other newspapers are in on it (b) ORG flagged the programme described several weeks in advance:
http://wiki.openrightsgroup.org/wiki/Communications_Capabilities_Development... http://www.openrightsgroup.org/issues/ccdp
So no, it's in no way a joke. This is absolutely real.
So I would take that article with a grain of salt, particularly the statement about 'real time'. That's not even feasible.
That a desired monitoring regime would require a violation of physics has *never* stopped a legislative push for such.
- d.
I said there is little benefit for most users; of course there would be some who could find it useful, but that's no reason to redirect all users. I use Wikipedia a lot, and I don't care if someone sees which pages I open. If someone does care, they should switch to HTTPS themselves.
On 1 April 2012 13:59, David Gerard dgerard@gmail.com wrote:
On 1 April 2012 12:23, Svip svippy@gmail.com wrote:
On 1 April 2012 12:06, David Gerard dgerard@gmail.com wrote:
Also, this article was written on 1 April and goes far beyond any monitoring scheme ever suggested in the Western world. And I am sure we would have heard about it before now if it were real.
It would be nice, but if it's a prank then (a) lots of other newspapers are in on it (b) ORG flagged the programme described several weeks in advance:
http://wiki.openrightsgroup.org/wiki/Communications_Capabilities_Development... http://www.openrightsgroup.org/issues/ccdp
So no, it's in no way a joke. This is absolutely real.
Still *kind of* a joke.
So I would take that article with a grain of salt, particularly the statement about 'real time'. That's not even feasible.
That a desired monitoring regime would require a violation of physics has *never* stopped a legislative push for such.
But it has always stopped it from being implemented or executed in practice. While the development is terrifying, it is also important to note the lack of actual consequences it will have, other than being a huge embarrassment.
But then, I was always under the impression that the UK didn't really care about free speech and privacy.
Hello,
I'm trying to import the categorylinks.sql dump into my MySQL database. I'm able to import it and query for articles in specific categories as long as the category name contains only English-language characters. I don't get any results if I try to query for a non-English category name. My understanding is that the dump is in UTF-8 format, so I tried the following:
create the database using the following command: CREATE DATABASE wiki CHARACTER SET utf8 COLLATE utf8_general_ci;
import the dump using the following command: mysql --user root --password=root wiki < C:\Path\plwiki-20111227-categorylinks.sql --default-character-set=utf8
set my data source URL to the following in my Java code: jdbc:mysql://localhost/plwiki?useUnicode=true&characterEncoding=UTF-8
It still doesn't work. What am I missing? Are there instructions anywhere on how to correctly import the dump?
Thanks, Piotr
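(One way to sanity-check the import outside of Java: a sketch assuming the third-party mysql-connector-python package, using the database and credentials from the commands above and a category name that appears in the plwiki dump:)

    import mysql.connector  # pip install mysql-connector-python

    conn = mysql.connector.connect(user="root", password="root",
                                   database="wiki", charset="utf8")
    cur = conn.cursor()
    # A name with Polish diacritics; the count is non-zero only if the
    # import preserved the UTF-8 bytes intact.
    cur.execute("SELECT COUNT(*) FROM categorylinks WHERE cl_to = %s",
                ("Języki_skryptowe",))
    print(cur.fetchone()[0])
    conn.close()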
On 1 April 2012 16:04, Piotr Jagielski piotr.jagielski@op.pl wrote:
mysql --user root --password=root wiki < C:\Path\plwiki-20111227-categorylinks.sql --default-character-set=utf8
It's -p, not --password=root and it will prompt you for the password.
These options should be equivalent. It does load the data using that command; it just handles non-English characters incorrectly.
Regards, Piotr
On 01/04/12 17:05, Piotr Jagielski wrote:
These options should be equivalent. It does load the data using that command; it just handles non-English characters incorrectly.
Regards, Piotr
Do you have $wgDBmysql5 set in your LocalSettings.php?
I don't have MediaWiki installed. I'm just trying to import the dump into a standalone database so I can do some batch processing on the data.
Regards, Piotr
On 01/04/12 17:37, Piotr Jagielski wrote:
I don't have MediaWiki installed. I'm just trying to import the dump into a standalone database so I can do some batch processing on the data.
Regards, Piotr
It inserts the data fine for me. I suspect your Java code is failing to read it appropriately. Try reading the table with a different tool, such as phpMyAdmin.
mysql> select * from categorylinks limit 20;
+---------+---------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
| cl_from | cl_to                                 | cl_sortkey                          | cl_timestamp        | cl_sortkey_prefix | cl_collation | cl_type |
+---------+---------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
|       0 | Ekspresowe_kasowanko                  | Golembiovski Andzey                 | 2009-07-09 21:01:30 |                   |              | page    |
|       2 | Języki_skryptowe                      | AWK AWK                             | 2011-01-18 01:11:23 | Awk               | uppercase    | page    |
|       4 | Specjalności_lekarskie                | ALERGOLOGIA                         | 2008-04-25 10:31:22 |                   | uppercase    | page    |
|       6 | Formaty_plików_komputerowych          | ASCII                               | 2011-09-23 11:01:05 |                   | uppercase    | page    |
|       6 | Kodowania_znaków                      | ASCII                               | 2011-09-23 11:01:05 |                   | uppercase    | page    |
|       7 | Artykuły_na_medal                     | ATOM                                | 2010-12-01 16:40:37 |                   | uppercase    | page    |
|       7 | Artykuły_wymagające_dopracowania      | ATOM                                | 2011-08-16 15:53:43 |                   | uppercase    | page    |
|       7 | Atomy                                 | ATOM                                | 2011-08-09 00:56:39 |                   | uppercase    | page    |
|       8 | Logika_matematyczna                   | AKSJOMAT                            | 2007-11-10 08:18:06 |                   | uppercase    | page    |
|      10 | Arytmetyka                            | ARYTMETYKA                          | 2011-10-17 02:36:39 |                   | uppercase    | page    |
|      11 | Artykuły_pod_opieką_Projektu_Chemia   | AMINOKWASY                          | 2011-08-19 02:48:21 |                   | uppercase    | page    |
|      12 | Alkeny                                | * ALKENY                            | 2006-08-07 17:23:22 | *                 | uppercase    | page    |
|      13 | Multimedia                            | ACTIVEX                             | 2007-05-24 20:20:15 |                   | uppercase    | page    |
|      13 | Windows                               | ACTIVEX                             | 2007-05-24 20:20:15 |                   | uppercase    | page    |
|      14 | Interfejsy_programistyczne            | ! APPLICATION PROGRAMMING INTERFACE | 2011-04-27 11:33:17 | !                 | uppercase    | page    |
|      15 | Amiga                                 | AMIGAOS                             | 2007-09-09 17:19:11 |                   | uppercase    | page    |
|      15 | Systemy_operacyjne                    | AMIGAOS                             | 2007-09-09 17:19:11 |                   | uppercase    | page    |
|      16 | Organizacje_międzynarodowe            | ASSOCIATION FOR COMPUTING MACHINERY | 2011-10-19 15:52:28 |                   | uppercase    | page    |
|      18 | Funkcje_boolowskie                    | ALTERNATYWA                         | 2007-03-23 17:43:05 |                   | uppercase    | page    |
|      19 | Logika_matematyczna                   | AKSJOMAT INDUKCJI                   | 2007-08-31 22:54:55 |                   | uppercase    | page    |
+---------+---------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
20 rows in set (0.00 sec)
Piotr Jagielski piotr.jagielski@op.pl wrote:
Hello,
set my data source URL to the following in my Java code: jdbc:mysql://localhost/plwiki?useUnicode=true&characterEncoding=UTF-8
Please note you have "plwiki" here but you imported into "wiki". Assuming your .my.cnf is not making things difficult, I ran a small Jython script to test:
$ jython
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0
Type "help", "copyright", "credits" or "license" for more information.
>>> from com.ziclix.python.sql import zxJDBC
>>> d, u, p, v = "jdbc:mysql://localhost/wiki", "root", None, "org.gjt.mm.mysql.Driver"
>>> db = zxJDBC.connect(d, u, p, v, CHARSET="utf8")
>>> c = db.cursor()
>>> c.execute("select cl_from, cl_to from categorylinks where cl_from=61 limit 10")
>>> c.fetchone()
(61, array('b', [65, 110, 100, 111, 114, 97]))
>>> (a, b) = c.fetchone()
>>> print b
array('b', [67, 122, -59, -126, 111, 110, 107, 111, 119, 105, 101, 95, 79, 114, 103, 97, 110, 105, 122, 97, 99, 106, 105, 95, 78, 97, 114, 111, 100, -61, -77, 119, 95, 90, 106, 101, 100, 110, 111, 99, 122, 111, 110, 121, 99, 104])
>>> for x in b:
...     try:
...         print chr(x),
...     except ValueError:
...         print "%02x" % x,
...
C z -3b -7e o n k o w i e _ O r g a n i z a c j i _ N a r o d -3d -4d w _ Z j e d n o c z o n y c h
array('b', [ ... ]) in Jython means that the SQL driver returns an array of bytes.
It seems to me that the array of bytes contains raw UTF-8, so you need to decode it into the proper Unicode that Java uses in strings.
I think this behaviour is described in
http://bugs.mysql.com/bug.php?id=25528
Probably you need to play with getBytes() on the result object to get what you want.
//Saper
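(Following that point: the signed-byte array from the session above can be turned back into a Unicode string by masking each byte and decoding as UTF-8. A sketch in the same Jython 2.5 session, where b is the array('b', ...) fetched earlier:)

    raw = ''.join(chr(x & 0xFF) for x in b)  # signed bytes -> raw octet string
    print raw.decode('utf-8')                # Członkowie_Organizacji_Narodów_Zjednoczonych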
Sorry, I made a mistake in the e-mail. I actually have the database set to the same name in both places.
My problem is in fact the opposite: I don't get any results when I use a UTF-8 string as input in the query. But I verified that I don't get correct results when using the query you provided either. The link to the MySQL bug report might be helpful in resolving the problem, so thanks for providing it.
Piotr
On 1 April 2012 14:53, Svip wrote:
On 1 April 2012 13:59, David Gerard dgerard@gmail.com wrote:
On 1 April 2012 12:23, Svip svippy@gmail.com wrote:
So I would take that article with a grain of salt, particularly the statement about 'real time'. That's not even feasible.
That a desired monitoring regime would require a violation of physics has *never* stopped a legislative push for such.
But it has always stopped it from being implemented or executed in practice. While the development is terrifying, it is also important to note the lack of actual consequences it will have, other than being a huge embarrassment.
I don't see why it *couldn't* be implemented. Note that the 'real time' claim is no different from how they can snoop your phone calls in real time. Sure, the storage requirements would be crazy, but I don't see specific details on what is to be stored, so it may well be implementable given enough funding.
2012/4/1 David Gerard dgerard@gmail.com
http://www.bbc.co.uk/news/uk-politics-17576745
This one may be an April 1 joke; let's wait one day. :-)
On 1 April 2012 17:00, Bináris wikiposta@gmail.com wrote:
2012/4/1 David Gerard dgerard@gmail.com
This one may be an April 1 joke; let's wait one day. :-)
No, it really isn't, sadly.
- d.
TL;DR: we have no plans for anonymous HTTPS by default, but will eventually default to HTTPS for logged-in users.
1. It would require an SSL terminator on every frontend cache. The SSL terminators eat memory, which is also what the frontend caches do.
2. HTTPS dramatically increases latency, which would be kind of painful for mobile.
3. Some countries may completely block HTTPS but allow HTTP to our sites, so that they can track users. Is it better for us to provide them content, or to protect their privacy?
4. It's still possible for governments to see that people are going to Wikimedia sites when using HTTPS, so it's still possible to oppress people for trying to visit sites that are disallowed.
On Sun, Apr 1, 2012 at 1:14 PM, Ryan Lane rlane32@gmail.com wrote:
TL;DR: we have no plans for anonymous HTTPS by default, but will eventually default to HTTPS for logged-in users.
1. It would require an SSL terminator on every frontend cache. The SSL terminators eat memory, which is also what the frontend caches do.
2. HTTPS dramatically increases latency, which would be kind of painful for mobile.
Without getting into how other countries censor data (boo!), I agree with the first two points. SSL terminators are much more memory- and CPU-intensive, which would require many more machines. Also, more RTTs are required for HTTPS/SSL, and our ping latency is not very good since we do not have a very geographically diverse infrastructure.
The two solutions for this are #1 more and beefier machines, and #2 caching centers in various locations physically closer to users (which also requires a lot of #1). Sadly, the biggest drawback is that both cost a lot of money, and that would mean a lot more pop-up banners of Jimmy asking for cash :(
Leslie
P.S. I personally like the idea of a cookie, set via a checkbox at the top of the page (shown one time only, perhaps?), that would send users to HTTPS by default on request. However, I don't think we can do this with our current infrastructure, due to the above issues.
On 02/04/12 06:14, Ryan Lane wrote:
TL;DR: we have no plans for anonymous HTTPS by default, but will eventually default to HTTPS for logged-in users.
1. It would require an SSL terminator on every frontend cache. The SSL terminators eat memory, which is also what the frontend caches do.
Once we enable it by default for logged-in users, we will care a lot more if someone tries to take it down with a DoS attack. Unless the redirection can be disabled without actually logging in, a DoS attack on the HTTPS frontend would prevent any authenticated activity.
It suggests a need for a robust, overprovisioned service, with tools and procedures in place for identifying and blocking or throttling malicious traffic.
[...]
3. Some countries may completely block HTTPS but allow HTTP to our sites, so that they can track users. Is it better for us to provide them content, or to protect their privacy?
4. It's still possible for governments to see that people are going to Wikimedia sites when using HTTPS, so it's still possible to oppress people for trying to visit sites that are disallowed.
It's also possible for governments to snoop on HTTPS communications, by using a private key from a trusted CA to perform a man-in-the-middle attack. Apparently the government of Iran has done this.
If we really want to protect the privacy of our users then we should shut down the regular website and serve our content only via a Tor hidden service ;)
-- Tim Starling
On Mon, Apr 2, 2012 at 12:33 PM, Tim Starling tstarling@wikimedia.org wrote:
On 02/04/12 06:14, Ryan Lane wrote:
TL;DR: we have no plans for anonymous HTTPS by default, but will eventually default to HTTPS for logged-in users.
1. It would require an SSL terminator on every frontend cache. The SSL terminators eat memory, which is also what the frontend caches do.
Once we enable it by default for logged-in users, we will care a lot more if someone tries to take it down with a DoS attack. Unless the redirection can be disabled without actually logging in, a DoS attack on the HTTPS frontend would prevent any authenticated activity.
It suggests a need for a robust, overprovisioned service, with tools and procedures in place for identifying and blocking or throttling malicious traffic.
Indeed. We're already pretty overprovisioned. We have 4 servers per datacenter, each of which is very bored. All they are doing is acting as a transparent proxy after SSL termination. We're using RC4 by default (due to BEAST), and AES is also available (the processors we are using have AES support).
Ideally we'll be using STS for logged-in users. This will mean it's impossible to turn off the redirection for users who have already logged in, for whatever period we set in the STS headers. We need to consider blocking a DoS from the SSL proxies, the LVS servers, or the routers.
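(For readers unfamiliar with STS: the server sends a Strict-Transport-Security header, and a conforming browser then refuses to speak plain HTTP to that host until max-age expires. A minimal WSGI sketch of setting the header, not WMF's actual implementation:)

    def add_hsts(app, max_age=31536000):
        # Wrap a WSGI app so every response carries an HSTS header (one year here).
        def wrapped(environ, start_response):
            def hsts_start_response(status, headers, exc_info=None):
                headers.append(("Strict-Transport-Security", "max-age=%d" % max_age))
                return start_response(status, headers, exc_info)
            return app(environ, hsts_start_response)
        return wrapped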
3. Some countries may completely block HTTPS but allow HTTP to our sites, so that they can track users. Is it better for us to provide them content, or to protect their privacy?
4. It's still possible for governments to see that people are going to Wikimedia sites when using HTTPS, so it's still possible to oppress people for trying to visit sites that are disallowed.
It's also possible for governments to snoop on HTTPS communications, by using a private key from a trusted CA to perform a man-in-the-middle attack. Apparently the government of Iran has done this.
We really should publish our certificate fingerprints. An attack like this can be detected: an end-user being attacked can see whether the certificate they are being handed is different from the one we advertise. We could also provide a Convergence notary service (or one of the other things like Convergence).
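(Checking a fingerprint from the client side is straightforward; a Python 3 sketch using only the standard library. The output would be compared against a fingerprint published out of band:)

    import hashlib
    import socket
    import ssl

    def https_cert_sha256(host, port=443):
        # Fetch the server's leaf certificate and return its SHA-256 fingerprint.
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        return hashlib.sha256(der).hexdigest()

    print(https_cert_sha256("en.wikipedia.org"))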
If we really want to protect the privacy of our users then we should shut down the regular website and serve our content only via a Tor hidden service ;)
I agree that it's impossible to provide total protection of a user's privacy. We could provide a number of services that would help users, though. That said, I don't feel this should be at the top of our priority list.
- Ryan
On 02/04/12 20:34, Ryan Lane wrote:
It's also possible for governments to snoop on HTTPS communications, by using a private key from a trusted CA to perform a man-in-the-middle attack. Apparently the government of Iran has done this.
We really should publish our certificate fingerprints. An attack like this can be detected: an end-user being attacked can see whether the certificate they are being handed is different from the one we advertise. We could also provide a Convergence notary service (or one of the other things like Convergence).
Indeed. Detecting a potential MITM is useless if you can't determine whether it's real or not. For instance, the switch from a RapidSSL to a DigiCert certificate was quite suspicious.
I don't know how best to publicise them, though. I suppose we would list them somewhere like https://secure.wikimedia.org/servers.html but if nobody knows it's there...
What's https://secure.wikimedia.org?
- Ryan
On April 2nd, 2012 at 23:35, Ryan Lane wrote:
Some old experiment. Nothing to see here :-)
Ryan Lane wrote:
What's https://secure.wikimedia.org?
- Ryan
The server which contains https://secure.wikimedia.org/keys.html
On Tue, Apr 3, 2012 at 12:26, Platonides Platonides@gmail.com wrote:
The server which contains https://secure.wikimedia.org/keys.html
When I access that page, Google Chrome gives this error message:
Failed to load resource: the server responded with a status of 404 (Not Found)
GET http://en.wikipedia.org/skins-1.5/monobook/headbg.jpg 404 (Not Found)
Best regards, Helder
Can we get back to the initial discussion regarding the HTTP-to-HTTPS redirect, please :-) That page doesn't contain anything interesting anyway... (Now, after saying this, I guess it's going to get way more visitors than ever, hehe)
Now there are users reporting in the village pump that http://bits.wikimedia.org/ is blocked in mainland China. Users visiting http://zh.wikipedia.org/ see unstyled pages without scripts, while https://zh.wikipedia.org/ works fine.
-Liangent
Tell them to use HTTPS then? Have them petition their government to stop censoring them?
bits seems like a very strange service to block, rather than commons or wikipedia. Are they *sure* it's being blocked by the government?
- Ryan
Indeed, I live in Europe and bits seems to be very blocked for me too :) I guess there is a problem with connectivity, because I am having this problem as well most of the time: the service is up, but slow, which makes loading one page take minutes.
Actually, bits is a great target for a country that wants to prevent people from having access: it makes it look as though the problem is on the side of the "capitalist Wikipedia, which is broken most of the time", and thus people should use the proper encyclopedias of China, which are correct and look better than Wikipedia's crappy CSS.
The original reporter says things are back to normal now (no block on bits anymore). However, there are other people saying they see intermittent issues when connecting to upload, bits and commons.
-Liangent