Hello -

I work on the mobile partner engineering team, probably better known as the Wikipedia Zero people.

If our servers see that a request's IP address matches a given partner mobile network operator (MNO), the pertinent Wikipedia Zero text banner is shown to the mobile web user. Additionally, offsite links are rewritten to warn users when they may be entering a zone that incurs data access charges.
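
For anyone who wants the matching step spelled out, here is a minimal sketch in Python. It is not our production code: the operator names, the table layout, and the function name are made up for illustration, and the addresses come from the reserved documentation ranges rather than from any real operator.

```python
import ipaddress

# Hypothetical partner table: operator name -> exit IP ranges in CIDR form.
# These are documentation-only ranges, not real operator addresses.
PARTNER_RANGES = {
    "example-operator-a": ["192.0.2.0/28", "198.51.100.16/29"],
    "example-operator-b": ["203.0.113.0/26"],
}


def find_partner(client_ip, partner_ranges=PARTNER_RANGES):
    """Return the partner whose configured ranges contain client_ip, else None."""
    addr = ipaddress.ip_address(client_ip)
    for partner, cidrs in partner_ranges.items():
        for cidr in cidrs:
            if addr in ipaddress.ip_network(cidr):
                return partner
    return None
```

A request from an address outside every configured range returns None, which is exactly the case where the banner and link rewriting silently stop working.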

This is all well and good, but partner operators sometimes change IP addresses and our systems get out of sync. Our partners are busy, and although our partner management team strives to work with them to proactively manage IP and other technical updates, information inevitably falls through the cracks. Consequently, when IPs drift, some users on partner networks no longer get the banner or the rewritten URLs.

To catch some of this drift in operator exit IP addresses sooner, we're interested in logging two pieces of information server-side via the forthcoming rewrites of the Wikipedia for Android and Wikipedia for iOS apps, and eventually a rewritten Firefox OS app:

(1) MCC-MNC identification code of operator if present (e.g., 123-45 if the connection is on a particular operator - the MCC is three digits and the MNC is two or three digits, so the format is ###-## or ###-###)
(2) Exit IP address (typically, gateway/proxy in MNO infrastructure)

The MCC-MNC identification code is embedded on SIM cards and accessible by routine app APIs.

We would not want to log this information alongside the other Apache webserver-style elements. Instead, we would keep just these two columns in a separate nonpublic file location and purge records older than 90 days.
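
As a rough sketch of the retention rule, assuming the two-column records are grouped by the day they were logged (for example one file per day, which is my assumption and not a settled design), the purge could look like this in Python. The function and variable names are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention window from the proposal


def purge_old(records_by_day, today):
    """Drop every day's records that are more than RETENTION_DAYS old.

    records_by_day maps a datetime.date to a list of (mcc_mnc, exit_ip) rows.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return {day: rows for day, rows in records_by_day.items() if day >= cutoff}
```

Grouping by day means the rows themselves still carry only the two columns, and the age of a record comes from where it is stored.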

The thought is to have the app code add the MCC-MNC value to an HTTP header once per app session, and only when the connection is one that carries an MCC-MNC (cellular data). The server would detect the IP as per normal server operation.
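
To make the "once per session" behavior concrete, here is a small sketch of the client-side logic. The real apps would be written for Android and iOS, so the Python below is only illustrative, and the header name X-MCC-MNC and the class name are placeholders I made up, not an agreed name.

```python
HEADER_NAME = "X-MCC-MNC"  # placeholder header name, not an agreed-upon one


class AppSession:
    """Tracks whether the MCC-MNC header was already sent in this app session."""

    def __init__(self):
        self._mcc_mnc_sent = False

    def extra_headers(self, on_cellular, mcc_mnc):
        """Return the extra headers to attach to the next request."""
        headers = {}
        # Send only on cellular data, only if the SIM reports a code,
        # and only the first time in this session.
        if on_cellular and mcc_mnc and not self._mcc_mnc_sent:
            headers[HEADER_NAME] = mcc_mnc
            self._mcc_mnc_sent = True
        return headers
```

Requests made over Wi-Fi never carry the header, and a later cellular request in the same session still sends it exactly once.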

In a nutshell, after normalizing the MCC/MNC codes (some are likely to be malformed) and cross-checking against our own MCC/MNC database, we'd be able to see if the IPs are askew, and then reach out to operators to ask if they have any updated IPs since the last time we received an official update.
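
Here is a minimal sketch of that normalize-and-cross-check step, assuming a simple in-memory lookup. The operator table, the IP sets, the accepted separators, and the function names are all hypothetical. Our actual MCC/MNC database and IP lists would take their place.

```python
import re

# Hypothetical stand-ins for our MCC/MNC database and official partner IP lists.
KNOWN_OPERATORS = {"123-45": "example-operator-a"}
KNOWN_IPS = {"example-operator-a": {"192.0.2.1", "192.0.2.2"}}

# Three-digit MCC, optional separator, two- or three-digit MNC.
_MCC_MNC = re.compile(r"\s*(\d{3})\s*[-/,]?\s*(\d{2,3})\s*")


def normalize_mcc_mnc(raw):
    """Return the code as 'MCC-MNC', or None if the value is malformed."""
    match = _MCC_MNC.fullmatch(raw or "")
    if match is None:
        return None
    return "{}-{}".format(match.group(1), match.group(2))


def find_unlisted_ips(rows):
    """Map each known operator to the logged exit IPs missing from its list.

    rows is an iterable of (raw_mcc_mnc, exit_ip) pairs from the log.
    Malformed codes and codes for unknown operators are skipped.
    """
    unlisted = {}
    for raw_code, exit_ip in rows:
        operator = KNOWN_OPERATORS.get(normalize_mcc_mnc(raw_code))
        if operator is None:
            continue
        if exit_ip not in KNOWN_IPS.get(operator, set()):
            unlisted.setdefault(operator, set()).add(exit_ip)
    return {op: sorted(ips) for op, ips in unlisted.items()}
```

Any operator that shows up in the result is a candidate for a "do you have updated IPs?" email. The logged addresses would be treated only as a prompt to ask, not as the official list.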

We think this is a simple and fairly easy way to observe updated IP addresses for operator partners, and prompt the partner management team to reach out to operators for updated official source IP addresses.

The information could incidentally be useful in gauging rough demand for Wikipedia in markets germane to Wikipedia Zero (e.g., higher data costs, lower disposable income), although that is secondary to keeping the IPs accurate.

Internal review suggests this is in alignment with the privacy policy, and we wanted to see if there were other thoughts on this approach. We plan to move the discussion over to wikitech-l and later to a broader list, but to avoid cross-posting problems with people being subscribed to one list but not another, we wanted to start on mobile-l first.

One last thing - the set of IPs for a given MNO is relatively small. For example, an MNO may have 100 IP addresses representing 1 million cellular subscribers (on average, 10,000 subscribers per address). This has two practical consequences: (1) troublingly, even just a handful of missing IPs has an outsize impact, and (2) highly targeted geographic and behavioral inferences are unlikely from a data set composed of just these two data elements.