Thanks everyone! Separate from Sam's mapping out of the frontend
instrumentation work at https://phabricator.wikimedia.org/T184793, I have
created a task for the backend work at
https://phabricator.wikimedia.org/T186728, based on this thread.

Regarding the last few posts about the geolocation information, from the
data analysis perspective, there is indeed another, more serious concern
about using the GeoIP cookie: it will create significant discrepancies with
the existing geolocation data we record for pageviews, where we have chosen
to derive this information from the IP instead. (Remember the overarching
goal here of measuring page previews the same way we currently measure
pageviews; the basic principle is that if a reader visits a page and then
uses the page preview feature on that page to read preview cards, all the
metadata that is recorded should have identical values for the preview and
the pageview.) Therefore, we should go with the kind of solution Andrew
outlined above (adapting/reusing GetGeoDataUDF or such).
On Thu, Feb 1, 2018 at 7:36 AM, Andrew Otto <otto(a)wikimedia.org> wrote:

Wow Sam, yeah, if this cookie works for you, it will make many things much
easier for us. Check it out and let us know. If it doesn’t work for some
reason, we can figure out the backend geocoding part.
On Thu, Feb 1, 2018 at 2:43 AM, Sam Smith <samsmith(a)wikimedia.org> wrote:

On Tue, Jan 30, 2018 at 8:02 AM, Andrew Otto <otto(a)wikimedia.org> wrote:

Using the GeoIP cookie will require reconfiguring the EventLogging
varnishkafka instance [0]
I’m not familiar with this cookie, but, if we used it, I thought it
would be sent back by the client in the event. E.g. event.country =
response.headers.country; EventLogging.emit(event);
That way, there’s no additional special logic needed on the server side
to geocode or populate the country in the event.
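
A minimal sketch of that client-side idea, for illustration only. It assumes
the cookie is named "GeoIP" and that its value begins with the country code
followed by ":"-separated fields; both the helper names and the `emit`
callback are hypothetical stand-ins, not the actual EventLogging API.

```javascript
// Hypothetical helper: pull the country code out of a cookie string such as
// document.cookie. Assumes a cookie named "GeoIP" whose value starts with
// the country code, e.g. "GeoIP=US:CA:San_Francisco:37.77:-122.41:v4".
function countryFromCookies(cookieString) {
  for (const part of cookieString.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === 'GeoIP') {
      // Re-join in case the value itself contained "=".
      const value = rest.join('=');
      const country = value.split(':')[0];
      return country || null;
    }
  }
  return null; // no GeoIP cookie present
}

// Hypothetical wrapper: copy the country into the event, then hand it to
// `emit`, which stands in for whatever function actually sends the event.
function emitWithCountry(event, cookieString, emit) {
  const country = countryFromCookies(cookieString);
  const enriched = country ? { ...event, country } : { ...event };
  emit(enriched);
  return enriched;
}
```

In a browser this might be called as emitWithCountry(event, document.cookie,
send), where `send` is whichever emitter the instrumentation already uses.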
Hah! I didn't think about accessing the GeoIP cookie on the client. As
you say, the implementation is quite easy.
My only concern with this approach is the duplication of the value
between the cookie, which is sent in every HTTP request to the
/beacon/event endpoint, and the event itself. This duplication seems
reasonable when balanced against capturing either: the client IP and then
doing similar geocoding further along in the pipeline; or the cookie for
all requests to that endpoint and then discarding them further along in the
pipeline. It also reflects a seemingly core principle of the EventLogging
system: that it doesn't capture potential PII by default.
-Sam
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB