FYI, added sampling
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Fri, May 2, 2014 at 1:16 PM
Subject: Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Federico asked if sampling might make sense here. I think it will work, so
I've updated the patchset.
From a patchset comment I provided:
"It's possible we may have situations where operators have not lots of
users on them accessing Wiki(m|p)edia properties, so we do run some risk of
actually missing IPs, even if exit IPs are concentrators of typically large
sets of users. That said, let's try a 2% sample ratio; and if we find out
it's insufficient, then we'll sample more, if it's oversampling, then we
can adjust the other way, too. New patchset arriving shortly."
(I've since submitted the updated code for review.)
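To make the trade-off in that comment concrete, here is a minimal Python sketch (not the code in the patchset) of per-request sampling at a 2% ratio, plus the chance that an exit IP is never sampled. The names SAMPLE_RATIO, should_log, sample_requests, and miss_probability are hypothetical, and requests are assumed to be sampled independently.

```python
import random

# The 2% sample ratio proposed in the patchset comment.
SAMPLE_RATIO = 0.02


def should_log(rng: random.Random, ratio: float = SAMPLE_RATIO) -> bool:
    """Return True for roughly `ratio` of calls (independent per request)."""
    return rng.random() < ratio


def sample_requests(requests, ratio: float = SAMPLE_RATIO, seed=None):
    """Keep each request with probability `ratio` (hypothetical helper)."""
    rng = random.Random(seed)
    return [r for r in requests if should_log(rng, ratio)]


def miss_probability(n_requests: int, ratio: float = SAMPLE_RATIO) -> float:
    """Chance that an exit IP seen in `n_requests` requests is never sampled."""
    return (1.0 - ratio) ** n_requests


if __name__ == "__main__":
    kept = sample_requests(range(100_000), seed=1)
    print(len(kept), "of 100000 requests sampled")
    for n in (10, 100, 1000):
        print(n, "requests -> miss probability", miss_probability(n))
```

Under these assumptions an exit IP seen in only 100 requests is missed about 13% of the time, which is the risk for small operators noted above; at 1,000 requests the chance is negligible.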
-Adam
On Thu, May 1, 2014 at 7:52 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
> After examining this, it looks like EventLogging is more suited to the
> logging task than debug logging and the trappings of needing to alter debug
> logging in the core MediaWiki software.
>
> EventLogging logs at the resolution of a second (instead of a day), but
> has inbuilt support for record removal after 90 days.
>
> Please do let us know in case of further questions. Here's the logging
> schema for those with an interest:
>
> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>
> Here's the relevant server code:
>
> https://gerrit.wikimedia.org/r/#/c/130991/
>
> -Adam
>
>
>
>
> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
>
>> Great idea!
>>
>> Anyone on the list know if there's a way to make the debug log facilities
>> do the YYYYMMDD timestamp instead of the longer one?
>>
>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>
>> -Adam
>>
>> 1. For those with PHP skills or equivalent, I'm referring to
>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba646….
>> Scroll to the bottom of the function definition to see the datetimestamp
>> approach.
>>
>>
>> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
>>
>>> Hi Adam,
>>>
>>> One thought: you don't really need the date/time data at any detailed
>>> resolution, do you? If what you're wanting it for is to track major
>>> changes ("last month it all switched to this IP") and to purge old
>>> data ("delete anything older than 10 March"), you could simply log day
>>> rather than datetime.
>>>
>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>>>
>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>>>
>>> - the latter gives you the data you need while making it a lot harder
>>> to do any kind of close user-identification.
>>>
>>> Andrew.
>>> On 16 Apr 2014 19:17, "Adam Baso" <abaso(a)wikimedia.org> wrote:
>>>
>>> > Inline.
>>> >
>>> > Thanks for starting this thread.
>>> > >
>>> > > Sorry if I've overlooked this, but who/what will have access to this
>>> > > data?
>>> > > Only members of the mobile team? Local project CheckUsers? Wikimedia
>>> > > Foundation-approved researchers? Wikimedia shell users? AbuseFilter
>>> > > filters?
>>> > >
>>> >
>>> > It's a good question. The thought is to put it in the customary
>>> > wfDebugLog location (with, for example, filename "mccmnc.log") on
>>> > fluorine.
>>> >
>>> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
>>> > full URL, gets logged additionally as part of the wfDebugLog call; to
>>> > make the implicit explicit, wfDebugLog adds a datetime stamp as well, and
>>> > that's useful for purging old records. I'll forward this email to
>>> > mobile-l and wikitech-l to underscore this.
>>> >
>>> >
>>> > > And this may be a silly question, but is there a reasonable means of
>>> > > approximating how identifying these two data points alone are? That
>>> > > is, using a mobile country code and exit IP address, is it possible to
>>> > > identify a particular editor or reader? Or perhaps rephrased, is this
>>> > > data considered anonymized?
>>> > >
>>> >
>>> > Not a silly question. My approximation is these tuples (datetime, now
>>> > that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
>>> > perfectly anonymized, are low identifying (that is, indirect inferences
>>> > on the data in isolation are unlikely, but technically possible, through
>>> > examination of short tail outliers in a cluster analysis where such
>>> > readers/editors exist in the short tail outliers sets), in contrast to
>>> > regular web access logs (where direct inferences are easy).
>>> >
>>> > Thanks. I'll forward this along now.
>>> >
>>> > -Adam
>>> > _______________________________________________
>>> > Wikimedia-l mailing list
>>> > Wikimedia-l(a)lists.wikimedia.org
>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>>> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
>>>
>>
>>
>
Update.
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Thu, May 1, 2014 at 7:52 PM
Subject: Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
After examining this, it looks like EventLogging is more suited to the
logging task than debug logging and the trappings of needing to alter debug
logging in the core MediaWiki software.
EventLogging logs at the resolution of a second (instead of a day), but has
inbuilt support for record removal after 90 days.
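For illustration only, the two ideas in play in this thread (day-level timestamps, as Andrew suggested, and record removal after 90 days) can be sketched in Python as below. This is not EventLogging's implementation; the record layout and the names RETENTION, to_day, and purge_old are hypothetical, and the sample values reuse the placeholder log line from the thread.

```python
from datetime import datetime, timedelta, timezone

# Retention window mentioned above; the constant name is hypothetical.
RETENTION = timedelta(days=90)


def to_day(ts: datetime) -> str:
    """Reduce a second-resolution timestamp to day resolution (YYYY-MM-DD)."""
    return ts.strftime("%Y-%m-%d")


def purge_old(records, now: datetime, retention: timedelta = RETENTION):
    """Return only the records whose timestamp is within the retention window."""
    cutoff = now - retention
    return [r for r in records if r["timestamp"] >= cutoff]


if __name__ == "__main__":
    # Hypothetical records shaped like the example line in the thread.
    records = [
        {"wiki": "enwiki", "exit_ip": "127.0.0.1", "mcc_mnc": "123.45",
         "timestamp": datetime(2014, 4, 16, 12, 45, 45, tzinfo=timezone.utc)},
        {"wiki": "enwiki", "exit_ip": "127.0.0.1", "mcc_mnc": "123.45",
         "timestamp": datetime(2014, 1, 1, tzinfo=timezone.utc)},
    ]
    now = datetime(2014, 5, 1, tzinfo=timezone.utc)
    for r in purge_old(records, now):
        print(r["wiki"], r["exit_ip"], r["mcc_mnc"], to_day(r["timestamp"]))
```

Storing only the day would make close user identification harder, while a per-second timestamp makes the 90-day cutoff exact; either way the purge step is the same comparison against a cutoff.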
Please do let us know in case of further questions. Here's the logging
schema for those with an interest:
https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
Here's the relevant server code:
https://gerrit.wikimedia.org/r/#/c/130991/
-Adam
On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
> Great idea!
>
> Anyone on the list know if there's a way to make the debug log facilities
> do the YYYYMMDD timestamp instead of the longer one?
>
> If not, I suppose we could work to update the core MediaWiki code. [1]
>
> -Adam
>
> 1. For those with PHP skills or equivalent, I'm referring to
> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba646….
> Scroll to the bottom of the function definition to see the datetimestamp
> approach.
>
>
> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
>
>> Hi Adam,
>>
>> One thought: you don't really need the date/time data at any detailed
>> resolution, do you? If what you're wanting it for is to track major
>> changes ("last month it all switched to this IP") and to purge old
>> data ("delete anything older than 10 March"), you could simply log day
>> rather than datetime.
>>
>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>>
>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>>
>> - the latter gives you the data you need while making it a lot harder
>> to do any kind of close user-identification.
>>
>> Andrew.
>> On 16 Apr 2014 19:17, "Adam Baso" <abaso(a)wikimedia.org> wrote:
>>
>> > Inline.
>> >
>> > Thanks for starting this thread.
>> > >
>> > > Sorry if I've overlooked this, but who/what will have access to this
>> > > data?
>> > > Only members of the mobile team? Local project CheckUsers? Wikimedia
>> > > Foundation-approved researchers? Wikimedia shell users? AbuseFilter
>> > > filters?
>> > >
>> >
>> > It's a good question. The thought is to put it in the customary
>> > wfDebugLog location (with, for example, filename "mccmnc.log") on
>> > fluorine.
>> >
>> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
>> > full URL, gets logged additionally as part of the wfDebugLog call; to
>> > make the implicit explicit, wfDebugLog adds a datetime stamp as well, and
>> > that's useful for purging old records. I'll forward this email to
>> > mobile-l and wikitech-l to underscore this.
>> >
>> >
>> > > And this may be a silly question, but is there a reasonable means of
>> > > approximating how identifying these two data points alone are? That
>> > > is, using a mobile country code and exit IP address, is it possible to
>> > > identify a particular editor or reader? Or perhaps rephrased, is this
>> > > data considered anonymized?
>> > >
>> >
>> > Not a silly question. My approximation is these tuples (datetime, now
>> > that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
>> > perfectly anonymized, are low identifying (that is, indirect inferences
>> > on the data in isolation are unlikely, but technically possible, through
>> > examination of short tail outliers in a cluster analysis where such
>> > readers/editors exist in the short tail outliers sets), in contrast to
>> > regular web access logs (where direct inferences are easy).
>> >
>> > Thanks. I'll forward this along now.
>> >
>> > -Adam
>>
>
>
hi,
is there a possibility of getting a banner on enwp for Ghana for Wiki Loves
Earth, as this is this year's main contest there?
https://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Earth_2014_in_Ghana
rupert
---------- Forwarded message ----------
From: Enock Seth Nyamador <kwadzo459(a)gmail.com>
Date: Thu, May 1, 2014 at 11:37 AM
Subject: Re: [Wikimedia-GH] Wiki Loves Earth Begins!
To: Planning Wikimedia Ghana Chapter <wikimedia-gh(a)lists.wikimedia.org>
Here is our poster:
Regards,
Enock
enwp.org/User:Enock4seth
On Thu, May 1, 2014 at 1:47 AM, Enock Seth Nyamador <kwadzo459(a)gmail.com> wrote:
>
> Hi All,
>
> Wiki Loves Earth has started; you can now upload your photos here.
>
> FYI, anyone reading Wikipedia and Wikimedia Commons from Ghana (specifically, from Ghanaian IPs), whether logged in or not, will see the image below:
>
>
> Regards,
>
> Enock
> enwp.org/User:Enock4seth
_______________________________________________
Wikimedia-GH mailing list
Wikimedia-GH(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-gh
Dear all,
sorry for cross-posting, but I have just sent all participants of the
Hackathon a PDF with travel information and a personalized public
transport ticket.
Should you plan to attend the Hackathon but have not received the Travel
Information mail, then please contact me.
Thanks and regards,
Manuel
--
Wikimedia CH - Verein zur Förderung Freien Wissens
Lausanne, +41 (21) 34066-22 - www.wikimedia.ch
Hi,
Thanks to all who have submitted their comments on the proposed page
namespace association handling and the necessary database schemas:
https://www.mediawiki.org/wiki/Requests_for_comment/Associated_namespaces
https://www.mediawiki.org/wiki/Requests_for_comment/Associated_namespaces/D…
As you can see, it is a long, long-standing issue, with a trail of open bug
reports and a pile of crushed hopes... let's change that.
If you spot any potential problem with the idea in general, or with any of
your uses/extensions in particular, now is the moment to speak out and
post on the talk page.
In order to keep moving things forward, could anyone help me draft the
page namespace registration API for extensions? Or if you want to step
forward and provide a proposal, even better.
Thanks,
Micru