On 11/07/2017 01:44 AM, Mikhail Popov wrote:
By the way, the referer header would only have the search query if the
user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers
using
HTTPS, we'd only see they came from "https://www.google.com/", due to
Google's "origin" meta referrer setting
(https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin).
Since Google & Bing force you into HTTPS, we actually only end up with
search queries from a few people who use very out of date browsers that
don't support meta referrer or HTTPS, since the latest versions of major
browsers now do
(https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_c…).
So keep in mind that any retrieved data would be unrepresentative of the
overall population, but it doesn't look like Lars is planning to do any
statistical analysis.
Correct. I don't plan to do any statistical analysis myself, though I
would be open to help from someone with the skill and interest.
About the Referer header: from what I read, the header is omitted only
if "an unsecured HTTP request is used and the referring page was
received with a secure protocol (HTTPS)" [1]. That should be rare, since
the search engines now redirect to HTTPS right away and people would be
entering their search terms in a form submitted over HTTPS.
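
To illustrate what would or would not be recoverable, here is a minimal
Python sketch of pulling a search query out of a logged Referer value.
The function name is hypothetical, and it assumes the query sits in a
"q" URL parameter, as it does in Google and Bing search URLs; it is not
taken from any existing tooling.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms_from_referer(referer, param="q"):
    """Return the search query carried in a Referer URL, or None.

    `param` is an assumption: the name of the query-string parameter
    that holds the search terms.
    """
    parts = urlsplit(referer)
    # parse_qs decodes percent-escapes and turns "+" into spaces.
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None
```

With a full referrer such as "https://www.google.com/search?q=free+software"
this yields the query text; with an origin-only referrer such as
"https://www.google.com/" there is nothing to extract and it returns None.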
A spot check of a few browsers shows the Referer header in use for
these, at least when using HTTPS:
  Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
  Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
  Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko)
    QupZilla/1.8.9 Safari/538.1
  Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36
    (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36
  AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75
    Safari/537.36
  Lynx/2.8.9dev.11 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.5.6
The Referrer Policy is in the draft stage [2] and might or might not
affect source web sites in the future. If it does, it looks like it is
a very long way from being widely deployed: years, if ever. So it is
unlikely to be a factor in Q1 or Q2 2018.
Another thing I'd note is that a search term may still contain sensitive
information even outside the context of the rest of the search query. A
phone number or an email address might show up as a single search term, and
that's still PII.
It may be possible. Any suggestions on work-arounds other than manual
intervention on the database results?
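
One possible starting point, offered only as a rough sketch, would be to
drop terms that look like an email address or a phone number before they
are stored. The patterns and function names below are illustrative
assumptions, they are deliberately loose, and they would miss other kinds
of PII such as names or street addresses.

```python
import re

# Assumed, simplified patterns; not exhaustive validators.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least six digits/separators, then a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def looks_like_pii(term):
    """Heuristic check: does the term contain an email or phone number?"""
    return bool(EMAIL_RE.search(term) or PHONE_RE.search(term))


def filter_terms(terms):
    """Return only the terms that do not trip the PII heuristics."""
    return [t for t in terms if not looks_like_pii(t)]
```

Something like this would reduce, but not replace, manual review of the
database results.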
/Lars
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
[2] https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin