After examining this, it looks like EventLogging is better suited to the
logging task than debug logging, and it avoids the need to alter debug
logging in the core MediaWiki software.
EventLogging logs at the resolution of a second (instead of a day), but
has inbuilt support for record removal after 90 days.
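
As a rough illustration of what a 90-day retention window means for records with second-resolution timestamps, here is a minimal Python sketch. It is not EventLogging's actual purge mechanism; the function name `purge_old_records` and the record layout are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window, matching the 90 days mentioned above.
RETENTION = timedelta(days=90)

def purge_old_records(records, now):
    """Return only the records whose timestamp falls within the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["timestamp"] >= cutoff]

# Hypothetical sample records; the field names are illustrative only.
now = datetime(2014, 4, 16, tzinfo=timezone.utc)
records = [
    {"wiki": "enwiki", "timestamp": datetime(2014, 4, 1, 12, 45, 45, tzinfo=timezone.utc)},
    {"wiki": "enwiki", "timestamp": datetime(2014, 1, 1, 8, 0, 0, tzinfo=timezone.utc)},
]
kept = purge_old_records(records, now)
print(len(kept))
```

Because the cutoff is computed from the timestamp alone, the purge works the same way whether records are stored at second or day resolution.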
Please do let us know in case of further questions. Here's the logging
schema for those with an interest:
-Adam
On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
Great idea!
Anyone on the list know if there's a way to make the debug log
facilities do the YYYYMMDD timestamp instead of the longer one?
If not, I suppose we could work to update the core MediaWiki code. [1]
-Adam
1. For those with PHP skills or equivalent, I'm referring to
https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba646….
Scroll to the bottom of the function definition to see the datetimestamp
approach.
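
To make the YYYYMMDD idea concrete, here is a minimal Python sketch of a day-resolution log line. This is not the MediaWiki PHP code referenced above; the function name `format_log_line` and the field order are assumptions, loosely following the example lines in the quoted message below.

```python
from datetime import datetime, timezone

def format_log_line(wiki, exit_ip, mcc_mnc, when):
    """Build a log line that records only the day (YYYYMMDD), dropping the time of day."""
    day = when.strftime("%Y%m%d")
    return " / ".join([wiki, exit_ip, mcc_mnc, day])

# Hypothetical values, mirroring the example in the quoted message.
line = format_log_line(
    "enwiki", "127.0.0.1", "123.45",
    datetime(2014, 4, 16, 12, 45, 45, tzinfo=timezone.utc),
)
print(line)
```

Dropping the time component at write time means the finer-grained value never reaches the log, so nothing has to be scrubbed later.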
On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
> Hi Adam,
>
> One thought: you don't really need the date/time data at any detailed
> resolution, do you? If what you're wanting it for is to track major
> changes ("last month it all switched to this IP") and to purge old
> data ("delete anything older than 10 March"), you could simply log day
> rather than datetime.
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>
> - the latter gives you the data you need while making it a lot harder
> to do any kind of close user-identification.
>
> Andrew.
> On 16 Apr 2014 19:17, "Adam Baso" <abaso(a)wikimedia.org> wrote:
>
> > Inline.
> >
> > Thanks for starting this thread.
> > >
> > > Sorry if I've overlooked this, but who/what will have access to this
> > > data? Only members of the mobile team? Local project CheckUsers?
> > > Wikimedia Foundation-approved researchers? Wikimedia shell users?
> > > AbuseFilter filters?
> > >
> >
> > It's a good question. The thought is to put it in the customary
> > wfDebugLog location (with, for example, filename "mccmnc.log") on
> > fluorine.
> >
> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
> > full URL, gets logged additionally as part of the wfDebugLog call; to
> > make the implicit explicit, wfDebugLog adds a datetime stamp as well,
> > and that's useful for purging old records. I'll forward this email to
> > mobile-l and wikitech-l to underscore this.
> >
> >
> > > And this may be a silly question, but is there a reasonable means of
> > > approximating how identifying these two data points alone are? That
> > > is, Using a mobile country code and exit IP address, is it possible to
> > > identify a particular editor or reader? Or perhaps rephrased, is this
> > > data considered anonymized?
> > >
> >
> > Not a silly question. My approximation is these tuples (datetime, now
> > that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
> > perfectly anonymized, are low identifying (that is, indirect inferences
> > on the data in isolation are unlikely, but technically possible, through
> > examination of short tail outliers in a cluster analysis where such
> > readers/editors exist in the short tail outliers sets), in contrast to
> > regular web access logs (where direct inferences are easy).
> >
> > Thanks. I'll forward this along now.
> >
> > -Adam
> > _______________________________________________
> > Wikimedia-l mailing list
> > Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>