On Tue, Mar 1, 2016 at 10:49 PM, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
> This change should fix this, while preserving the privacy of our readers
> browsing content over HTTPS.
That depends greatly on what you mean by reader privacy. By
definition referrers violate the privacy of reading the web, so great
care should be taken when talking about privacy at the same time as
referrers to avoid giving the wrong impression. I guess you mean
HTTPS provides a lot of privacy, and referrers remove only a small
amount of privacy. While these referrers were 'normal' in HTTP days,
they are now explicitly a Wikimedia choice to send user navigation
information to non-Wikimedia servers, and often users have no idea
about it. Before it was HTTP to blame; now Wikimedia is responsible.
If a link appears only once on Polish Wikipedia, then a referrer of
'pl.wikipedia.org' (i.e. just the hostname) is sufficient for the
external webserver to know exactly which page the user was reading.
I suspect the median external link appears on only 1 or 2 pages
per wiki, which means referrers actually remove a lot of reader
privacy. If I recall correctly, over 50% of webpages viewed have
Google Analytics on them, which means Google could identify maybe 33%
of the Wikipedia pages you click a link on. That is a lot of possible
reader profiling.
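To make the hostname-only argument concrete, here is a rough sketch of how
one might measure it. It assumes a list of (wiki hostname, page title,
external URL) tuples, for example derived from an external links table;
the function name and the sample data are made up for illustration and are
not real figures.

```python
from collections import defaultdict
from statistics import median


def link_identifiability(links):
    """Return (median pages per link, share of links found on exactly one page).

    `links` is an iterable of (wiki_host, page_title, external_url) tuples.
    This input shape is an assumption made for the sketch.
    """
    pages_per_link = defaultdict(set)
    for host, page, url in links:
        # A hostname-only referrer reveals the wiki, so group by (host, url).
        pages_per_link[(host, url)].add(page)

    counts = [len(pages) for pages in pages_per_link.values()]
    if not counts:
        return 0, 0.0
    unique = sum(1 for c in counts if c == 1)
    return median(counts), unique / len(counts)


# Invented sample data, for illustration only.
sample = [
    ("pl.wikipedia.org", "Kraków", "http://example.org/a"),
    ("pl.wikipedia.org", "Warszawa", "http://example.org/b"),
    ("pl.wikipedia.org", "Gdańsk", "http://example.org/b"),
    ("pl.wikipedia.org", "Wrocław", "http://example.org/c"),
]
med, share = link_identifiability(sample)
```

Any link counted in `share` is one where the external server, seeing only
the wiki hostname in the referrer, can still tell which page the reader
came from.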
It would be really interesting to see some statistics of the
percentage of clicks on Wikimedia-hosted pages where the click
identifies what page the reader was on.
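A click-weighted version of that statistic could look roughly like the
sketch below. It assumes two hypothetical inputs: a mapping from
(wiki hostname, external URL) to the number of pages carrying that link,
and a log of outbound clicks as (wiki hostname, external URL) pairs. Both
are illustrative, not an existing Wikimedia dataset.

```python
def identifying_click_share(clicks, pages_per_link):
    """Share of clicks whose (host, url) pair appears on exactly one page.

    `clicks` is a list of (wiki_host, external_url) pairs, one per click.
    `pages_per_link` maps (wiki_host, external_url) to a page count.
    Both shapes are assumptions made for this sketch.
    """
    if not clicks:
        return 0.0
    identifying = sum(1 for click in clicks if pages_per_link.get(click) == 1)
    return identifying / len(clicks)


# Invented sample data, for illustration only.
host = "pl.wikipedia.org"
page_counts = {
    (host, "http://example.org/a"): 1,
    (host, "http://example.org/b"): 2,
}
click_log = [
    (host, "http://example.org/a"),
    (host, "http://example.org/a"),
    (host, "http://example.org/b"),
    (host, "http://example.org/a"),
]
click_share = identifying_click_share(click_log, page_counts)
```

Weighting by clicks rather than by links matters because a few popular
links may account for most of the traffic.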
Also worth exploring is how many externally linked pages have Google
Analytics (or similar) on them.
--
John Vandenberg