[Engineering] [Analytics] [Eventlogging] Dropping Client IPs from EventCapsule

Madhumitha Viswanathan mviswanathan at wikimedia.org
Wed Mar 2 18:18:58 UTC 2016


Hi all,

The analytics team, in an effort to collect less sensitive data, plans to
drop the clientIP field from the EventCapsule
(https://meta.wikimedia.org/wiki/Schema:EventCapsule), which is the wrapper
for all events flowing into Eventlogging. (Currently, IPs and User Agents
are purged after the 90-day mark.) The field was originally meant only for
debugging, but has served some research use cases, most of which have been
wrapped up at this point. It has also been used as a proxy for counting the
number of devices visiting sites like our blog. Since IPs are not a good
measure of that anyway, we plan to move such cases to Piwik.

The rollout will happen in stages: client IPs will be dropped first on the
EL end, then from the EventCapsule on meta, and finally on the VarnishKafka
end. It should be a clean deployment with no scheduled downtime; EL will
keep working as is. What does change? The clientIP values will start being
set to NULL in your MySQL tables. If you update an Eventlogging schema you
maintain, causing new tables to be created, the new tables will not have
the clientIp field in them. The change is planned to roll out the week of
11 or 18 March 2016, pending the completion of data collection for the
ongoing QuickSurveys-based research work.
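
For anyone whose analysis code reads this field, here is a minimal sketch of
coping with both outcomes described above (a NULL value in existing tables,
and no column at all in newly created ones). The row dictionaries, the
schema name and the get_client_ip helper are hypothetical and only for
illustration; they are not part of Eventlogging. The field spelling is also
an assumption, so adjust it to match your table.

```python
from typing import Any, Mapping, Optional

# Assumed column name; check the spelling used in your own table.
CLIENT_IP_FIELD = "clientIp"


def get_client_ip(row: Mapping[str, Any]) -> Optional[str]:
    """Return the client IP value, or None if it is NULL, empty or absent."""
    value = row.get(CLIENT_IP_FIELD)
    return value if value else None


# Hypothetical rows as a consumer might see them.
old_row = {"schema": "ExampleSchema", "clientIp": "192.0.2.1"}  # before the change
nulled_row = {"schema": "ExampleSchema", "clientIp": None}      # existing table, value NULL
new_table_row = {"schema": "ExampleSchema"}                     # new table, no such column

for row in (old_row, nulled_row, new_table_row):
    print(get_client_ip(row))
```

Code written this way treats a NULL value and a missing column the same, so
it should not need to change between the rollout stages.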

Let us know if you have any questions or concerns, on the list or on
#wikimedia-analytics. The related Phabricator ticket is
https://phabricator.wikimedia.org/T128407.

Thanks,
Madhu Viswanathan
Software Engineer, Analytics