[Wikimedia-l] Data privacy, encrypted links and recent change captures

Jasper Deng jasper at jasperswebsite.com
Mon Dec 30 07:19:04 UTC 2013


SSL makes interception more difficult; some private wikis are already restricted to
SSL. We also have to consider that irc.wikimedia.org has a recent changes
feed.
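
To illustrate the recent changes exposure: the same stream of edits can also
be polled from the public MediaWiki API, independent of the IRC feed. A
minimal sketch in Python, assuming the standard `list=recentchanges` query
module; the host, the limit, and the helper name `recent_changes_url` are
illustrative choices of mine, and the snippet only builds the URL rather than
fetching anything:

```python
from urllib.parse import urlencode


def recent_changes_url(host="en.wikipedia.org", limit=50, use_https=True):
    """Build a MediaWiki API URL that lists recent changes.

    The host and limit defaults are illustrative assumptions,
    not values taken from this thread.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        # Fields returned for each change; any client may request them.
        "rcprop": "title|ids|timestamp|user|comment",
        "rclimit": limit,
        "format": "json",
    }
    # Plain HTTP sends the request and response in the clear;
    # HTTPS encrypts them in transit.
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    print(recent_changes_url(limit=10))
```

Anyone polling a URL like this often enough sees each edit before it can be
hidden or deleted, whether or not the link itself is encrypted.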

At minimum, the transit links should be encrypted if feasible. One
reason not to encrypt is the extra performance overhead.


On Sun, Dec 29, 2013 at 11:10 PM, John Vandenberg <jayvdb at gmail.com> wrote:

> We know NSA wants Wikipedia data, as Wikipedia is listed in one of the
> NSA slides:
>
> https://commons.wikimedia.org/wiki/File:KS8-001.jpg
>
> That slide is about HTTP, and the tech staff are moving the
> user/reader base to HTTPS.
>
> As we learn more about the NSA programs, we need to consider vectors
> other than HTTP for the NSA to obtain the data they want.  And the
> userbase needs to be aware of the current risks.
>
> One question from the "Dells are backdored"[sic] thread that is worth
> separate consideration is:
>
> Are the Wikimedia transit links encrypted, especially for database
> replication?
> MySQL has replication over SSL, so I assume the answer is Yes.
>
> If not, is this necessary or useful, and feasible?
>
> However we also need to consider that SSL and other encryption may be
> useless against NSA/etc, which means replicating non-public data
> should be avoided wherever possible, as it becomes a single point of
> failure.
>
> Given how public our system is, we don't have a lot of non-public
> data, so we might be able to design the architecture so that
> information isn't replicated, and also ensure it isn't accessed over
> insecure links.  I think the only parts of the dataset that are
> private & valuable are
> * passwords/login cookies,
> * checkuser info - IPs and useragents,
> * WMF analytics, which includes readers iirc,
> * hidden/deleted edits, and
> * private wikis and mailing lists
>
> Have I missed any?
>
> Are passwords and/or checkuser info replicated?
>
> Is there a data policy on WMF analytics data which prevents it flowing
> over insecure links, and limits what is collected and ensures
> destruction of the data within reasonable timeframes?  i.e. how about
> not using cookies to track analytics of readers who are on HTTP
> instead of HTTPS?
>
> The private wikis can be restricted to https, depending on the value
> of the data on those wikis in the wrong hands.  The private mailing
> lists will be harder to secure, and at least the English Wikipedia
> arbcom list contains a lot of valuable data about contributors.
>
> Regarding hidden/deleted edits, the replication isn't the only source
> of this data.  All edits are also exposed via Recent Changes
> (https/api/etc) as they occur, and the value of these edits is
> determined by the fact they are hidden afterwards (e.g. don't appear
> in dumps).  Is there any way to control who is effectively capturing
> all edits via Recent Changes?
>
> --
> John Vandenberg