By the way, the Referer header would only contain the search query if the user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers using HTTPS, we'd only see that they came from "https://www.google.com/", due to Google's "origin" meta referrer setting (https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin).
Since Google & Bing force you into HTTPS, we actually only end up with search queries from the few people who use very out-of-date browsers that don't support meta referrer or HTTPS; the latest versions of the major browsers support both (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_co...). So keep in mind that any retrieved data would be unrepresentative of the overall population, though it doesn't look like Lars is planning to do any statistical analysis.
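To make the difference concrete, here is a minimal sketch of what can and cannot be recovered from a Referer value. The function name and the host-to-parameter table are my own assumptions for illustration, not anything in our pipeline, and the table would need checking against what each search engine really sends.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: hosts and query-parameter names chosen for illustration only.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
}


def search_terms_from_referer(referer):
    """Return the search query carried by a Referer value, or None.

    A full-URL referer still carries the query string, while an
    origin-only referer such as "https://www.google.com/" has nothing
    left to extract.
    """
    parts = urlsplit(referer)
    param = SEARCH_PARAMS.get(parts.hostname or "")
    if param is None:
        return None  # not a search engine we know about
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None


# Full URL, as an old browser over HTTP might send it.
print(search_terms_from_referer("http://www.google.com/search?q=wikibooks+cookbook"))
# Origin only, which is what we get from HTTPS searchers.
print(search_terms_from_referer("https://www.google.com/"))
```

The second call returns None, which is the case for nearly all of our search-engine traffic.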
Another thing I'd note is that a search term may still contain sensitive information even outside the context of the rest of the search query. A phone number or an email address might show up as a single search term, and that's still PII.
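As a rough illustration of that point, the sketch below flags individual terms that look like an email address or a phone number. The patterns are deliberately loose and are my own assumptions; they will miss many real-world formats, and a real review would need far more than this.

```python
import re

# Assumption: simplistic patterns for illustration, not a complete PII detector.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d[\d\-.]{6,}\d$")


def looks_like_pii(term):
    """Return True if a single search term resembles an email or phone number."""
    return bool(EMAIL_RE.match(term) or PHONE_RE.match(term))


def flag_terms(query):
    """Split a query on whitespace and pair each term with its PII flag."""
    return [(term, looks_like_pii(term)) for term in query.split()]


for term, flagged in flag_terms("cookbook someone@example.org 555-123-4567"):
    print(term, "PII?" if flagged else "ok")
```

Even with the rest of the query dropped, the second and third terms here would still identify a person.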
- Mikhail
On Fri, Nov 3, 2017 at 10:02 AM, Lars Noodén lars.nooden@gmail.com wrote:
On 11/03/2017 04:12 PM, Leila Zia wrote: [snip]
I assume by establishing a project you mean finding a way to get access to the data that your research proposal is going to use. If that is correct:
Yes.
I now have a preliminary draft of a proposal:
https://meta.wikimedia.org/wiki/Research:Finding_Search_Engine_Terms_Used_to_Retrieve_Wikibooks
I will review this page and get back to you next week. To set expectations: all I can promise is that we will review the page and discuss if we can find a light-weight format to help you with it. I can't promise that we can actually make it happen as the resources are very tight on our end. We will do our best.
Thanks. I appreciate it.
The ticket for tracking this task is https://phabricator.wikimedia.org/T179693.
Excellent.
/Lars
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics