More. Ideas? I'll email wikitech-l separately.
---------- Forwarded message ----------
From: Adam Baso abaso@wikimedia.org
Date: Wed, Apr 16, 2014 at 2:20 PM
Subject: Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation
To: Wikimedia Mailing List wikimedia-l@lists.wikimedia.org
Great idea!
Anyone on the list know if there's a way to make the debug log facilities do the YYYYMMDD timestamp instead of the longer one?
If not, I suppose we could work to update the core MediaWiki code. [1]
-Adam
[1] For those with PHP skills or equivalent, I'm referring to https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba6461.... Scroll to the bottom of the function definition to see the datetimestamp approach.
On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
Hi Adam,
One thought: you don't really need the date/time data at any detailed resolution, do you? If what you're wanting it for is to track major changes ("last month it all switched to this IP") and to purge old data ("delete anything older than 10 March"), you could simply log day rather than datetime.
enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
enwiki / 127.0.0.1 / 123.45 / 2014-04-16
- the latter gives you the data you need while making it a lot harder to do any kind of close user-identification.
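To make the day-only suggestion concrete, here is a minimal sketch in Python (not the MediaWiki PHP code itself). The function names `day_stamp` and `format_record` are hypothetical, and the " / "-separated layout simply mirrors the example lines above; it is an assumption about how the record might be written, not the actual log format.

```python
from datetime import datetime


def day_stamp(ts: datetime) -> str:
    """Return only the calendar day (YYYY-MM-DD), discarding the time of day."""
    return ts.strftime("%Y-%m-%d")


def format_record(wiki: str, exit_ip: str, mcc_mnc: str, ts: datetime) -> str:
    """Build one line in the 'wiki / IP / MCC-MNC / day' layout (hypothetical)."""
    return " / ".join([wiki, exit_ip, mcc_mnc, day_stamp(ts)])


# The time of day (12:45:45) is dropped before anything is written out.
print(format_record("enwiki", "127.0.0.1", "123.45", datetime(2014, 4, 16, 12, 45, 45)))
```

Truncating at write time, rather than at read time, means the finer-grained value never reaches the log at all.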
Andrew.
On 16 Apr 2014 19:17, "Adam Baso" abaso@wikimedia.org wrote:
Inline.
Thanks for starting this thread.
Sorry if I've overlooked this, but who/what will have access to this data? Only members of the mobile team? Local project CheckUsers? Wikimedia Foundation-approved researchers? Wikimedia shell users? AbuseFilter filters?
It's a good question. The thought is to put it in the customary wfDebugLog location (with, for example, filename "mccmnc.log") on fluorine.
It just occurred to me that the wiki name (e.g., "enwiki"), but not the full URL, gets logged additionally as part of the wfDebugLog call; to make the implicit explicit, wfDebugLog adds a datetime stamp as well, and that's useful for purging old records. I'll forward this email to mobile-l and wikitech-l to underscore this.
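As a rough illustration of purging by timestamp, the Python sketch below keeps only lines whose leading stamp falls on or after a cutoff day. It assumes each log line begins with a "YYYY-MM-DD" date; that layout, the sample lines, and the names `keep_line` and `purge` are all hypothetical and would need checking against the real wfDebugLog output.

```python
from datetime import date, datetime


def keep_line(line: str, cutoff: date) -> bool:
    """Keep a line if its leading day stamp is on or after the cutoff.

    Lines without a parseable stamp are kept rather than silently dropped.
    """
    try:
        stamp = datetime.strptime(line[:10], "%Y-%m-%d").date()
    except ValueError:
        return True
    return stamp >= cutoff


def purge(lines, cutoff: date):
    """Return the lines that survive a purge of everything older than cutoff."""
    return [ln for ln in lines if keep_line(ln, cutoff)]


# Hypothetical sample lines; only the first 10 characters are inspected.
sample = [
    "2014-03-09 18:02:11 enwiki: 127.0.0.1 123-45",
    "2014-03-10 00:00:01 enwiki: 127.0.0.1 123-45",
]
print(purge(sample, date(2014, 3, 10)))
```

Only the day portion is parsed, so the same purge would work unchanged if the stamp were shortened to a date alone.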
And this may be a silly question, but is there a reasonable means of approximating how identifying these two data points alone are? That is, using a mobile country code and exit IP address, is it possible to identify a particular editor or reader? Or perhaps rephrased, is this data considered anonymized?
Not a silly question. My approximation is that these tuples (datetime, now that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not perfectly anonymized, are low-identifying (that is, indirect inferences on the data in isolation are unlikely, but technically possible, through examination of short-tail outliers in a cluster analysis where such readers/editors exist in the short-tail outlier sets), in contrast to regular web access logs (where direct inferences are easy).
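One way to get a feel for the outlier concern is a plain frequency count, which is a much simpler stand-in for a real cluster analysis: tuples that appear only a handful of times are the ones most likely to correspond to a single reader or editor. The Python sketch below is only illustrative; the record layout, the threshold `k`, the sample values, and the name `rare_tuples` are assumptions, not anything from the actual logging code.

```python
from collections import Counter


def rare_tuples(records, k=2):
    """Return the distinct tuples seen fewer than k times (the sparse tail)."""
    counts = Counter(records)
    return sorted(t for t, n in counts.items() if n < k)


# Hypothetical (day, wiki, exit IP, MCC-MNC) records.
common = ("2014-04-16", "enwiki", "127.0.0.1", "123-45")
lonely = ("2014-04-16", "xywiki", "127.0.0.2", "123-45")
records = [common, common, common, lonely]

# Only the tuple that occurs once falls below the k=2 threshold.
print(rare_tuples(records, k=2))
```

Coarser timestamps push more records into the same tuple, which tends to shrink this sparse tail.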
Thanks. I'll forward this along now.
-Adam
_______________________________________________
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe