SSL makes it more difficult; some private wikis are already restricted to
SSL. We also have to consider that
has a recent changes
At minimum, the transit links should be encrypted if feasible. A good
reason not to encrypt is that it's extra performance overhead.
On Sun, Dec 29, 2013 at 11:10 PM, John Vandenberg <jayvdb(a)gmail.com> wrote:
We know NSA wants Wikipedia data, as Wikipedia is
listed in one of the
That slide is about HTTP, and the tech staff are moving the
user/reader base to HTTPS.
As we learn more about the NSA programs, we need to consider vectors
other than HTTP for the NSA to obtain the data they want. And the
userbase needs to be aware of the current risks.
One question from the "Dells are backdored"[sic] thread that is worth
separate consideration is:
Are the Wikimedia transit links encrypted, especially for database
MySQL has replication over SSL, so I assume the answer is Yes.
If not, is this necessary or useful, and feasible ?
However we also need to consider that SSL and other encryption may be
useless against NSA/etc, which means replicating non-public data
should be avoided wherever possible, as it becomes a single point of
Given how public our system is, we don't have a lot of non-public
data, so we might be able to design the architecture so that
information isnt replicated, and also ensure it isnt accessed over
insecure links. I think the only parts of the dataset that are
private & valuable are
* passwords/login cookies,
* checkuser info - IPs and useragents,
* WMF analytics, which includes readers iirc, and
* hidden/deleted edits
* private wikis and mailing lists
Have I missed any?
Are passwords and/or checkuser info replicated?
Is there a data policy on WMF analytics data which prevents it flowing
over insecure links, and limits what is collected and ensures
destruction of the data within reasonable timeframes? i.e. how about
not using cookies to track analytics of readers who are on HTTP
instead of HTTPS?
The private wikis can be restricted to https, depending on the value
of the data on those wikis in the wrong hands. The private mailing
lists will be harder to secure, and at least the English Wikipedia
arbcom list contain a lot of valuable data about contributors.
Regarding hidden/deleted edits, the replication isnt the only source
of this data. All edits are also exposed via Recent Changes
(https/api/etc) as they occur, and the value of these edits is
determined by the fact they are hidden afterwards (e.g. don't appear
in dumps). Is there any way to control who is effectively capturing
all edits via Recent Changes?
Wikimedia-l mailing list