On 11/07/2017 01:44 AM, Mikhail Popov wrote:
> By the way, the referer header would only have the search query if the user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers using HTTPS, we'd only see they came from "https://www.google.com/", due to Google's "origin" meta referer setting (https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin).
> Since Google & Bing force you into HTTPS, we actually only end up with search queries from a few people who use very out of date browsers that don't support meta referer or HTTPS, since the latest versions of major browsers now do (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_co...). So keep in mind that any retrieved data would be unrepresentative of overall population, but it doesn't look like Lars is planning to do any statistical analysis.
Correct, though I would be open to someone with the skill and interest helping, I don't plan to do any statistical analysis myself.
About the Referer header: from what I read, the only case in which the header is not sent is when "an unsecured HTTP request is used and the referring page was received with a secure protocol (HTTPS)" [1]. That should be rare, since the search engines now redirect to HTTPS right away and people would be entering their search terms in a form submitted over HTTPS.
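To make the rule quoted from [1] concrete, here is a minimal Python sketch. The function name is hypothetical, and it models only the HTTPS-to-HTTP downgrade case described above, not any referrer policy that the source site may set itself.

```python
from urllib.parse import urlsplit


def referer_sent_by_default(referring_url, target_url):
    """Return True if the Referer header would be sent under the downgrade rule.

    Assumption: only the rule quoted from [1] is modelled here; a policy
    set by the referring site (e.g. "origin") is ignored.
    """
    source_scheme = urlsplit(referring_url).scheme.lower()
    target_scheme = urlsplit(target_url).scheme.lower()
    # The header is withheld only when going from HTTPS to plain HTTP.
    return not (source_scheme == "https" and target_scheme == "http")
```

With a search engine on HTTPS, the header would be dropped only for a target site still served over plain HTTP.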
A spot check of a few browsers shows the Referer header in use when using HTTPS, for these at least:
Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) QupZilla/1.8.9 Safari/538.1
Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Lynx/2.8.9dev.11 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.5.6
The Referrer Policy is in the draft stage [2] and might or might not affect source web sites in the future. If it does, it looks like it is a very long way from becoming widely deployed: years, if ever. So it is unlikely to be a factor in Q1 2018 or Q2 2018.
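For reference, pulling the search terms out of a Referer value could look roughly like the sketch below. The function name is hypothetical, and the `q` parameter is an assumption that holds for Google and Bing but may differ for other search engines. When only the origin is sent, the result is simply empty.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms_from_referer(referer, param="q"):
    """Return the search terms found in a Referer URL, or [] if there are none."""
    query = urlsplit(referer).query
    # parse_qs decodes percent-escapes and turns "+" into spaces.
    values = parse_qs(query).get(param, [])
    return values[0].split() if values else []
```

A full URL such as "https://www.google.com/search?q=referer+header" yields two terms, while an origin-only value such as "https://www.google.com/" yields none.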
> Another thing I'd note is that a search term may still contain sensitive information even outside the context of the rest of the search query. A phone number or an email address might show up as a single search term, and that's still PII.
It may be possible. Any suggestions on work-arounds other than manual intervention on the database results?
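One automated starting point I could imagine, purely as a sketch, is to drop terms that look like an e-mail address or a phone number before they are stored. The patterns and function names below are assumptions, they are deliberately crude, and they would miss many other kinds of PII.

```python
import re

# Crude pattern: something@something.something with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Crude pattern: optional "+", then digits mixed with common separators.
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def looks_like_pii(term):
    """Return True if a single search term resembles an e-mail or phone number."""
    term = term.strip()
    if EMAIL_RE.match(term):
        return True
    if PHONE_RE.match(term):
        # Require at least seven actual digits, so "..-..-.." is not flagged.
        return sum(ch.isdigit() for ch in term) >= 7
    return False


def filter_terms(terms):
    """Keep only the terms that do not look like PII."""
    return [t for t in terms if not looks_like_pii(t)]
```

For example, filter_terms(["wikipedia", "alice@example.org", "+1 555-123-4567", "2017"]) keeps only "wikipedia" and "2017". Names, street addresses and the like would still get through, so this would reduce the manual work rather than replace it.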
/Lars
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
[2] https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin