On Tue, Mar 1, 2016 at 10:49 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
> This change should fix this, while preserving the privacy of our readers browsing content over HTTPS.
That depends greatly on what you mean by readers' privacy. By definition, referrers compromise the privacy of reading the web, so great care should be taken when discussing privacy and referrers together, to avoid giving the wrong impression. I guess you mean that HTTPS provides a lot of privacy, and referrers remove only a small amount of it. While these referrers were 'normal' in the HTTP days, sending them is now an explicit Wikimedia choice to pass user navigation information to non-Wikimedia servers, and users often have no idea it is happening. Before, HTTP was to blame; now Wikimedia is responsible.
If a link appears only once on Polish Wikipedia, then a referrer of 'pl.wikipedia.org' (i.e. just the hostname) is sufficient for the external webserver to know exactly which page the user was reading.
I suspect the median number of pages on which an external link appears is 1 or 2 per wiki, which means referrers actually remove a lot of reader privacy. If I recall correctly, over 50% of web pages viewed have Google Analytics on them, which means Google could identify maybe 33% of the Wikipedia pages you click a link on. That is a lot of possible reader profiling.
It would be really interesting to see some statistics on the percentage of clicks on Wikimedia-hosted pages where the click identifies which page the reader was on. Also worth exploring is how many externally linked pages have Google Analytics (or similar) on them.
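
As a rough illustration of the first statistic, here is a minimal Python sketch. It assumes that, for one wiki, we can get a list of (page title, external URL) pairs, something like what MediaWiki's externallinks table holds. The function names and the sample data are made up for illustration. It counts links rather than clicks, so a real answer would also need click data to weight each link.

```python
from collections import defaultdict
from statistics import median


def pages_per_link(link_pairs):
    """Map each external URL to the set of wiki pages that link to it."""
    pages_by_url = defaultdict(set)
    for page, url in link_pairs:
        pages_by_url[url].add(page)
    return pages_by_url


def identifying_stats(link_pairs):
    """Summarise how often a hostname-only referrer pins down the source page.

    A URL that appears on exactly one page of the wiki identifies that page
    to the external server, even if the referrer is only the hostname.
    """
    counts = [len(pages) for pages in pages_per_link(link_pairs).values()]
    unique = sum(1 for c in counts if c == 1)
    return {
        "median_pages_per_link": median(counts),
        "share_uniquely_identifying": unique / len(counts),
    }


# Hypothetical sample: (page title, external URL) pairs for a single wiki.
sample = [
    ("Kraków", "http://example.org/a"),
    ("Warszawa", "http://example.org/b"),
    ("Gdańsk", "http://example.org/b"),
    ("Kraków", "http://example.com/c"),
]
stats = identifying_stats(sample)
print(stats)
```

In this made-up sample, two of the three external URLs appear on a single page, so a hostname-only referrer would still reveal the source page for those two.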