Hi all,
The Analytics team, in an effort to reduce the amount of sensitive data we collect, plans to drop the clientIp field from the EventCapsule (https://meta.wikimedia.org/wiki/Schema:EventCapsule), which is the wrapper for all events flowing into EventLogging. (Currently, IPs and user agents are purged after the 90-day mark.) The field was originally meant only for debugging, but it has served some research use cases, most of which have been wrapped up at this point. It has also been used as a proxy for counting the number of devices visiting sites like our blog; since IPs are not a good measure of that anyway, we plan to move such cases to Piwik.
The rollout will happen in stages: client IPs will be dropped first on the EL end, then from the EventCapsule on meta, and finally on the VarnishKafka end. It should be a clean deployment with no scheduled downtime; EL will keep working as is. What does change? Client IPs will start being set to NULL in your MySQL tables. If you update an EventLogging schema you maintain, causing new tables to be created, the new tables will not have the clientIp field. The change is planned to roll out the week of 11 or 18 March 2016, pending the completion of data collection for the ongoing QuickSurveys-based research work.
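For anyone with scripts that read these tables, here is a minimal sketch (not the team's code) of treating both situations above the same way: existing tables where the column is still present but NULL, and newly created tables where the column is missing. The helper name `read_client_ip` and the example rows are hypothetical; only the `clientIp` field name comes from this announcement, and rows are assumed to arrive as dict-like mappings.

```python
from typing import Any, Mapping, Optional

# Field name as used in the announcement; everything else here is illustrative.
CLIENT_IP_FIELD = "clientIp"


def read_client_ip(row: Mapping[str, Any]) -> Optional[str]:
    """Return the client IP from a result row, or None if it is NULL or absent.

    Existing tables keep the column but store NULL; tables created after a
    schema update have no such column. Both cases collapse to None here.
    """
    return row.get(CLIENT_IP_FIELD)


# Hypothetical rows: one from an existing table, one from a newly created table.
old_table_row = {"uuid": "a1", "wiki": "enwiki", "clientIp": None}
new_table_row = {"uuid": "b2", "wiki": "enwiki"}

for row in (old_table_row, new_table_row):
    print(row["uuid"], read_client_ip(row))
```

Code that indexes the column directly (for example `row["clientIp"]`) would raise a `KeyError` on rows from newly created tables, so a lookup with a default is the safer pattern under these assumptions.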
Let us know on the list or in #wikimedia-analytics if you have any questions or concerns. The related Phabricator ticket is https://phabricator.wikimedia.org/T128407.
Thanks,
Madhu Viswanathan
Software Engineer, Analytics
Thanks Madhu -- it's great to see the analytics team working proactively on things like this.
-Toby
On Wed, Mar 2, 2016 at 10:18 AM, Madhumitha Viswanathan <mviswanathan@wikimedia.org> wrote:
[...]
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks Toby!
This change will be deployed today.
On Wed, Mar 2, 2016 at 11:53 AM, Toby Negrin <tnegrin@wikimedia.org> wrote:
[...]
All done. Hashed client IPs are no longer being collected in EventLogging: Varnishkafka is not picking them up from Varnish, and no IPs make it through to the MySQL/Hadoop end of things.
Thanks all.
On Tue, Mar 8, 2016 at 9:01 AM, Madhumitha Viswanathan <mviswanathan@wikimedia.org> wrote:
[...]
--Madhu :)
<3 thank you to everyone involved in this :)