By the way, the Referer header would only contain the search query if the user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers using HTTPS, we'd only see that they came from "https://www.google.com/", due to Google's "origin" meta referrer setting (https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin).

Since Google and Bing force you into HTTPS, we actually only end up with search queries from the few people who use very out-of-date browsers that don't support meta referrer or HTTPS; the latest versions of the major browsers all do (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_compatibility). So keep in mind that any retrieved data would be unrepresentative of the overall population, though it doesn't look like Lars is planning to do any statistical analysis.
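
To illustrate the difference, here is a minimal Python sketch of pulling a search query out of a Referer value. The function name, the host check, and the assumption that the query lives in a "q" parameter are mine for illustration; this is not how any Wikimedia pipeline actually does it.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: a referer is treated as a search engine if one of its host
# labels is "google" or "bing". A real pipeline would need a vetted list.
SEARCH_ENGINE_LABELS = {"google", "bing"}


def search_query_from_referer(referer):
    """Return the search query carried by a Referer URL, or None."""
    parts = urlsplit(referer)
    host_labels = set((parts.hostname or "").split("."))
    if not SEARCH_ENGINE_LABELS & host_labels:
        return None
    # Assumption: the query is in the "q" parameter.
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


# Old-style HTTP referer: the full search URL is present.
print(search_query_from_referer(
    "http://www.google.com/search?q=wikibooks+haskell&ie=utf-8"))

# Referer under the "origin" policy: only the origin is left, so no query.
print(search_query_from_referer("https://www.google.com/"))
```

The second call returns None, which is the case we see for nearly all current Google traffic.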

Another thing I'd note is that a search term may still contain sensitive information even outside the context of the rest of the search query. A phone number or an email address might show up as a single search term, and that's still PII.
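
As a rough sketch of what screening individual terms could look like, the Python below flags terms that resemble an email address or a phone number. The patterns and the function name are illustrative assumptions only; they will miss plenty of PII (names, addresses, ID numbers) and are no substitute for a proper privacy review.

```python
import re

# Assumption: deliberately loose patterns, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def looks_like_pii(term):
    """Return True if a single search term resembles an email or phone number."""
    term = term.strip()
    if EMAIL_RE.match(term):
        return True
    # Count only the digits so that runs of punctuation are not flagged.
    digit_count = len(re.sub(r"\D", "", term))
    return bool(PHONE_RE.match(term)) and 7 <= digit_count <= 15


for term in ["haskell", "someone@example.org", "+1-555-010-9999", "2017"]:
    print(term, looks_like_pii(term))
```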

- Mikhail

On Fri, Nov 3, 2017 at 10:02 AM, Lars Noodén <lars.nooden@gmail.com> wrote:
On 11/03/2017 04:12 PM, Leila Zia wrote:
[snip]
> I assume by establishing a project you mean finding a way to get access to
> the data that your research proposal is going to use. If that is correct:

Yes.

>> I now have a preliminary draft of a proposal:
>>
>> https://meta.wikimedia.org/wiki/Research:Finding_Search_Engine_Terms_Used_to_Retrieve_Wikibooks
>
>
> I will review this page and get back to you next week. To set
> expectations: all I can promise is that we will review the page and discuss
> if we can find a light-weight format to help you with it. I can't promise
> that we can actually make it happen as the resources are very tight on our
> end. We will do our best.

Thanks.  I appreciate it.

> The ticket for tracking this task is
> https://phabricator.wikimedia.org/T179693 .

Excellent.

/Lars

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics