By the way, the referer header would only contain the search query if the
user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers
using HTTPS, we'd only see that they came from "https://www.google.com/",
due to Google's "origin" meta referrer setting (
https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin).
Since Google & Bing force you into HTTPS, we only end up with search
queries from the few people who use very out-of-date browsers that don't
support meta referrer or HTTPS; the latest versions of the major browsers
all do (
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_c…).
So keep in mind that any retrieved data would be unrepresentative of the
overall population, though it doesn't look like Lars is planning to do any
statistical analysis.
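
To illustrate the difference, here is a minimal Python sketch of pulling a
query out of a referer value. The function name, the "q" parameter, and the
example URLs are my own assumptions for illustration, not anything taken
from our logs or pipeline.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referer(referer, param="q"):
    """Return the search query carried in a referer URL, or None if absent.

    `param` is an assumption: "q" is a common query parameter name,
    but other search engines may use a different one.
    """
    parts = urlsplit(referer)
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None

# Hypothetical full referer, as a search page served over HTTP could send it.
full = "http://www.google.com/search?q=wikibooks+cookbook"
# Origin-only referer, which is all we see under the "origin" policy.
origin_only = "https://www.google.com/"

print(search_terms_from_referer(full))
print(search_terms_from_referer(origin_only))
```

With the origin-only value there is no query string left to parse, so the
function has nothing to return.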
Another thing I'd note is that a search term may still contain sensitive
information even outside the context of the rest of the search query. A
phone number or an email address might show up as a single search term, and
that's still PII.
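
As a rough sketch of what I mean, a single term can be checked on its own
for an email-like or phone-like shape. The two patterns below are
deliberately crude assumptions of mine; they would miss plenty of real PII
and flag some harmless terms, so treat this as an illustration rather than
a usable filter.

```python
import re

# Crude, hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# At least seven characters: digits plus common phone punctuation.
PHONE_RE = re.compile(r"^\+?\d[\d\-.()]{5,}\d$")

def looks_like_pii(term):
    """Return True if a single search term looks like an email or phone number."""
    term = term.strip()
    return bool(EMAIL_RE.match(term) or PHONE_RE.match(term))

for term in ["wikibooks", "someone@example.org", "555-867-5309"]:
    print(term, looks_like_pii(term))
```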
- Mikhail
On Fri, Nov 3, 2017 at 10:02 AM, Lars Noodén <lars.nooden(a)gmail.com> wrote:
On 11/03/2017 04:12 PM, Leila Zia wrote:
[snip]
I assume by establishing a project you mean finding a way to get access
to the data that your research proposal is going to use. If that is
correct:
Yes.
I now have a preliminary draft of a proposal:
https://meta.wikimedia.org/wiki/Research:Finding_Search_Engine_Terms_Used_to_Retrieve_Wikibooks
I will review this page and get back to you next week. To set
expectations: all I can promise is that we will review the page and
discuss if we can find a light-weight format to help you with it. I can't
promise that we can actually make it happen as the resources are very
tight on our end. We will do our best.
Thanks. I appreciate it.
Excellent.
/Lars
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics