On Fri, Aug 16, 2013 at 8:04 PM, Zack Weinberg zackw@cmu.edu wrote:
Wikipedia user handle. I realize how disruptive this would be, but I think we need to consider changing the canonical Wikipedia URL format to https://wikipedia.org/LANGUAGE/PAGENAME .
Note that LANGUAGE.wikipedia.org/VARIANT/PAGENAME is already in use for wikis which use the language variant conversion code, such as zhwiki. Usually LANGUAGE is a prefix of VARIANT, for example zh-hans, zh-hant, en-us, en-gb, sr, sr-ec.
If we wanted to approach this goal, we could start by creating a proxy service at https://secure.wikipedia.org/LANGUAGE-VARIANT/PAGENAME that did an internal proxy of pages from https://LANGUAGE.wikipedia.org/LANGUAGE-VARIANT/PAGENAME. That would allow some low-risk being-bold exploration of the different implications.
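To make the proxy idea concrete, here is a minimal sketch of the URL mapping such a service would perform. This is purely illustrative: the function name and the assumption that the variant prefix (everything before the first hyphen) identifies the backend wiki are mine, not anything specified in this thread.

```python
from urllib.parse import quote

def map_to_backend(path):
    """Hypothetical mapping from a secure.wikipedia.org path of the form
    /LANGUAGE-VARIANT/PAGENAME (or plain /LANGUAGE/PAGENAME) to the
    corresponding per-language backend URL.

    Assumes the language code is the prefix of the variant, e.g.
    zh-hans -> zhwiki, as described above."""
    parts = path.lstrip("/").split("/", 1)
    if len(parts) != 2:
        raise ValueError("expected /LANGUAGE/PAGENAME")
    variant, pagename = parts
    language = variant.split("-", 1)[0]  # zh-hans -> zh, en -> en
    return "https://%s.wikipedia.org/%s/%s" % (language, variant, quote(pagename))
```

For example, `map_to_backend("/zh-hans/Foo")` would proxy from https://zh.wikipedia.org/zh-hans/Foo. A real implementation would also have to handle paths that don't fit this pattern (API endpoints, static assets, and so on).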
This last article raises a critical point. To render Wikipedia genuinely secure against traffic analysis
Whenever someone seems to veer into discussion of absolute security, I get nervous. It would be best to begin by asking "how can we make attacks more expensive?"
Given the contents of the most recent NSA document leaks, it also seems worthwhile to attempt to confound the "are we at least 51% certain that this user is not an American" question. Combining wikis does seem like a worthwhile step here. I wonder whether any arbitrary user of zhwiki (for example) would automatically be assumed to have a >51% chance of being non-American.
Random padding, in fact, is no good at all. The adversary can simply
average over many pageloads and extract the true length.
Again, "no good at all" slides into this "absolute security" fallacy. *How much more difficult* does padding make things? *How many* more pageloads? An adversary with infinite resources can also legally compel the sysop to compromise the server. But can we improve the situation for medium-sized state actors, or raise the bar so that only targeted users can be compromised (instead of passively collecting information on all users)?
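To put rough numbers on "how many more pageloads": here is a toy simulation of the averaging attack, assuming uniform random padding of up to pad_max bytes per response (the parameter names and the uniform-padding model are my assumptions for illustration). The attack does work, as the quoted text says, but the number of observations the adversary needs to pin down the true length grows roughly quadratically with the padding range.

```python
import random
import statistics

def samples_needed(pad_max, tolerance):
    """Approximate number of pageloads needed for the standard error of
    the mean to fall below `tolerance` bytes, given uniform padding in
    [0, pad_max). Stdev of that uniform distribution is pad_max/sqrt(12)."""
    sigma = pad_max / 12 ** 0.5
    return int((sigma / tolerance) ** 2) + 1

def estimate_length(true_len, pad_max, n, rng):
    """The averaging attack: observe n padded pageloads, then subtract
    the (known) mean padding to recover an estimate of the true length."""
    observations = [true_len + rng.randrange(pad_max) for _ in range(n)]
    return statistics.mean(observations) - (pad_max - 1) / 2

rng = random.Random(1)
n = samples_needed(1024, 2.0)          # tens of thousands of pageloads
estimate = estimate_length(40000, 1024, n, rng)
```

Doubling pad_max roughly quadruples the required pageloads, so padding doesn't defeat the attack, but it does set a concrete price for it, which is exactly the "how much more expensive" framing above.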
As a start on constructing a better threat model, let me offer two scenarios:
a) NSA passive collection of all traffic to/from Wikipedia (XKEYSCORE). It would be nice to frustrate this so that (as a start) only traffic from targeted users could be effectively collected -- for example, by requiring an active man-in-the-middle (MITM) attack instead of a passive tap.
b) Great Firewall monitoring of specific pages (Tiananmen Square, Falun Gong). Can we better protect the identities of readers of these pages? Can we protect the identities of editors? Can we frustrate attempts to block specific pages?
Real-world issues should also be taken into account. Methods that prevent the Great Firewall from blocking specific pages might provoke a site-wide block. Efforts to force use of the latest browsers (which support some new protocol) might disenfranchise mobile users, or users for whom poverty and resource limitations are a bigger threat than a coercive government. Etc... --scott