Hi,
You may have followed the discussion on Wikimedia-l (and enwiki-l).
Out of pure intellectual curiosity, I would like to know why hashing the IPs
with a varying salt won't work.
Wouldn't that provide a way to obfuscate IP addresses while maintaining
uniqueness (i.e. a given IP always gets hashed to the same value)?
Tim said in a message on enwiki-l that he has looked into the matter but
hasn't found any satisfying solution.
So what's the problem with salted hashes?
Note: I have read a bit about hashing but I am far from being an
expert, so please assume I am the classic layman.
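To be sure we are talking about the same scheme, here is a minimal
sketch of what I have in mind (a SHA-256 digest of salt + IP; the salt
value and the address are purely illustrative):

    import { createHash } from 'crypto';

    // Hypothetical server-side salt; in the "varying salt" idea it would
    // be rotated periodically, so hashes are only linkable within one
    // period.
    const salt = 'some-secret-salt-2015-04';

    function obfuscateIp(ip: string): string {
      // Same salt + same IP always yields the same digest, so uniqueness
      // (one hash per IP) is preserved while the salt stays fixed.
      return createHash('sha256').update(salt + ip).digest('hex');
    }

    console.log(obfuscateIp('198.51.100.7'));
    console.log(obfuscateIp('198.51.100.7')); // identical to the line above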
Thanks in advance to anyone who will take the time to explain.
C
---------- Forwarded message ----------
From: "Lila Tretikov" <lila(a)wikimedia.org>
Date: 05/Apr/2015 11:30
Subject: Re: [Wikimedia-l] Announcing: The Wikipedia Prize!
To: "Wikimedia Mailing List" <wikimedia-l(a)lists.wikimedia.org>
Cc:
All,
As Tim mentioned, we are seriously looking at
privacy/identity/security/anonymity issues, specifically as they
pertain to IP address exposure -- both from a legal and a technical
standpoint. This won't happen overnight as we need to get people to
work on this and there are a lot of asks, but it is on our radar.
On a related note, let's skip the sarcasm and treat each other with
straightforward honesty. And for non-English speakers -- who need this
just as much, if not more -- sarcasm can be very confusing.
Thanks,
Lila
On Fri, Apr 3, 2015 at 4:02 PM, Cristian Consonni <kikkocristian(a)gmail.com>
wrote:
> Hi Brian,
>
> 2015-03-30 0:25 GMT+02:00 Brian <reflection(a)gmail.com>:
> > Although the initial goal of the Netflix Prize was to design a
> > collaborative filtering algorithm, it became notorious when the data
> > was used to de-anonymize Netflix users. Researchers proved that given
> > just a user's movie ratings on one site, you can plug those ratings
> > into another site, such as the IMDB. You can then take that
> > information, and with some Google searches and optionally a bit of
> > cash (for websites that sell user information, including, in some
> > cases, their SSN) figure out who they are. You could even drive up to
> > their house and take a selfie with them, or follow them to work and
> > meet their boss and tell them about their views on the topics they
> > were editing.
>
> Somewhat tangentially, and to bring this topic back to a more
> scientific setting, I would like to point out that there has already
> been research on this in the past.
>
> I highly recommend reading the following paper:
>
> Lieberman, Michael D., and Jimmy Lin. "You Are Where You Edit:
> Locating Wikipedia Contributors through Edit Histories." ICWSM. 2009.
> (PDF <
>
http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/…
> >)
>
> For those of you who don't want to read the whole paper, you can find
> a recap of the most relevant findings in this presentation by Maurizio
> Napolitano:
> <
> http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew
> >
>
> The main idea is to associate spatial coordinates with Wikipedia
> articles when possible; these articles are called "geopages". Then you
> extract from the article histories the users who have edited a
> geopage. If you plot the geopages edited by a given contributor you
> can see that they tend to cluster, so you can define an "edit area".
> The study finds that 30-35% of contributors concentrate their edits in
> an edit area smaller than 1 deg^2 (~12,362 km^2, approximately the
> area of Connecticut or Northern Ireland[1] (thanks, Wikipedia!)).
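> To make that 1 deg^2 figure concrete, here is a rough sketch of the
> bounding-box idea (the coordinates are made up and this is not the
> paper's actual method):
>
>     type GeoEdit = { lat: number; lon: number };
>
>     // Bounding box of a contributor's geopage edits, in square degrees.
>     function editAreaSquareDegrees(edits: GeoEdit[]): number {
>       const lats = edits.map(e => e.lat);
>       const lons = edits.map(e => e.lon);
>       return (Math.max(...lats) - Math.min(...lats)) *
>              (Math.max(...lons) - Math.min(...lons));
>     }
>
>     // Three edits clustered around one city fall well under 1 deg^2.
>     const edits: GeoEdit[] = [
>       { lat: 45.46, lon: 9.19 },
>       { lat: 45.55, lon: 9.30 },
>       { lat: 45.40, lon: 9.10 },
>     ];
>     console.log(editAreaSquareDegrees(edits));     // ~0.03
>     console.log(editAreaSquareDegrees(edits) < 1); // true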
>
> For another free/libre project with a geographic focus, like
> OpenStreetMap, this is even more marked; check out, for example, the
> tool «"Your OSM Heat Map" (aka "Where did you contribute?")»[2] by
> Pascal Neis.
>
> This, of course, is not a straightforward de-anonymization, but these
> methods work in principle for every contributor even if you obfuscate
> their IP or username (provided that you can still assign all the edits
> from a given user to a unique, unambiguous identifier).
>
> C
> [1] https://en.wikipedia.org/wiki/Square_degree
> [2a] http://yosmhm.neis-one.org/
> [2b] http://neis-one.org/2011/08/yosmhm/
>
> _______________________________________________
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> Wikimedia-l(a)lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
>
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
A patch [0] was merged recently that moves files provisioned by some
roles from /srv/* (and /vagrant/*) to /vagrant/srv/*. This puts the
files on the host machine and reduces the number of random
subdirectories under /vagrant that are created. See T89919 [1] and the
patch for more details.
Most users shouldn't really notice any impact from this change, but
users of the iegreview, scholarships and wikimetrics roles will need
to either move their /vagrant/<rolename> checkouts to
/vagrant/srv/<rolename> or double-check that any work in progress they
have in those git clones is pushed up to Gerrit and pulled back down
to the new working directory.
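If you would rather script the move than do it by hand, something along
these lines should work (a plain mv is just as good; the role names are
the three mentioned above):

    import { existsSync, mkdirSync, renameSync } from 'fs';

    const roles = ['iegreview', 'scholarships', 'wikimetrics'];
    mkdirSync('/vagrant/srv', { recursive: true });

    for (const role of roles) {
      const oldPath = `/vagrant/${role}`;
      const newPath = `/vagrant/srv/${role}`;
      // Only move checkouts that exist and haven't been moved already.
      if (existsSync(oldPath) && !existsSync(newPath)) {
        renameSync(oldPath, newPath);
      }
    }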
[0]: https://gerrit.wikimedia.org/r/#/c/200624/
[1]: https://phabricator.wikimedia.org/T89919
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855
I've just branched REL1_25 for MediaWiki core from 6dae212. Master
has now been bumped to 1.26alpha.
Please don't break master immediately with all kinds of backwards-
incompatible changes... it makes backporting fixes a pain. Think of the
poor release manager ;-)
Also: extensions are in the process of being branched and will complete
over the next hour or two. Slow script is slow.
-Chad
Will anything bad happen if entries in the MediaWiki "logging" table are not inserted in chronological order?
Due to a bug, our logging table has incomplete data. I'd like to insert the missing data using a script.
However, the log_id column is auto-increment. This means that when the table is ordered by log_id,
the data will not be in chronological order by log_timestamp.
Is that bad in any way?
Or are all applications (like Special:Log) expected to "order by log_timestamp" rather than log_id?
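To make the question concrete, here is a toy sketch (column names as in
the logging table, rows invented) of how the two orderings diverge once
a row is backfilled:

    type LogRow = { log_id: number; log_timestamp: string; log_action: string };

    // log_id reflects insertion order (auto-increment), so the backfilled
    // row gets the highest id even though its timestamp is older.
    const rows: LogRow[] = [
      { log_id: 1, log_timestamp: '20150301000000', log_action: 'delete' },
      { log_id: 2, log_timestamp: '20150401000000', log_action: 'block' },
      { log_id: 3, log_timestamp: '20150315000000', log_action: 'move' }, // backfilled
    ];

    // Ordered by log_id, the backfilled entry appears last...
    console.log(rows.map(r => r.log_action)); // [ 'delete', 'block', 'move' ]

    // ...while ordering by log_timestamp restores chronology.
    const byTime = [...rows].sort((a, b) =>
      a.log_timestamp.localeCompare(b.log_timestamp));
    console.log(byTime.map(r => r.log_action)); // [ 'delete', 'move', 'block' ]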
Thanks,
Dan
TL;DR:
* Our QUnit jobs now use the latest Chromium instead of PhantomJS.
* You can run the test suite from the command line locally now.
Thanks to Tim Starling, Antoine "hashar" Musso, Kunal (legoktm),
Bryan Davis, S Page, and James Forrester for their help in different areas.
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing
Are you sitting comfortably?
The past few months can be summarised as a long journey through a forest of technical debt. It was also amazing to see just how many layers of infrastructure were able to block this task (and did). [1]
From a local development point of view it couldn't be simpler. In the Gruntfile: Remove grunt-contrib-qunit, add grunt-karma with karma-chrome-launcher. Done? Not quite.
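Roughly, the Gruntfile side of that change looks something like this
(task and file names are illustrative, not our exact setup):

    // Gruntfile sketch: swap grunt-contrib-qunit for grunt-karma.
    module.exports = function (grunt: any) {
      grunt.loadNpmTasks('grunt-karma');   // instead of grunt-contrib-qunit
      grunt.initConfig({
        karma: {
          main: {
            configFile: 'karma.conf.js',
            singleRun: true
          }
        }
      });
      grunt.registerTask('test', ['karma:main']);
    };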
For standalone front-end projects it actually was this simple. OOjs and VisualEditor have been enjoying this new stack since July 2014. This helped refine the stack and the underlying technologies over the past months. More about what this new stack provides in a minute.
== Journey ==
=== Export from Special:JavaScriptTest ===
For MediaWiki core and extensions, one must install MediaWiki before running QUnit. [2][3] That in itself isn't too complicated (set up a DB and run the installer; we have standard macros for that). But the tests don't just communicate with MediaWiki. The test suite is actually served *by* MediaWiki. Karma enforces the principle that unit tests run in pure JavaScript (blank page with sockets, load source files, load QUnit, run test suite).
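In karma.conf terms that principle looks roughly like this (the file
patterns are illustrative, not our actual layout):

    module.exports = function (config: any) {
      config.set({
        frameworks: ['qunit'],          // karma-qunit loads QUnit for us
        browsers: ['Chromium'],         // karma-chrome-launcher
        files: [
          'resources/src/**/*.js',      // source files, loaded as plain scripts
          'tests/qunit/**/*.test.js'    // the test suite itself
        ],
        singleRun: true
      });
    };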
Back in MediaWiki 1.19 our test suite abided by these best practices. The test suite was a static HTML file that loaded relevant scripts directly. [4] This file was later migrated to Special:JavaScriptTest, which opened the door for undeclared dependencies.
1. Source code is registered to ResourceLoader.
2. Extensions register tests via hooks in PHP.
3. OutputPage and Skin provide config vars and HTML that tests could depend on.
Point 3 was easily resolved by adding the appropriate mw.config or DOM fixture to the few tests that were missing it.
Points 1 and 2, though, weren't going anywhere. We needed to, once again, access our test suite as a pure JavaScript payload, with no HTML or script tags loading relevant resources.
Hence, the introduction of Special:JavaScriptTest/qunit/export in https://gerrit.wikimedia.org/r/178551.
=== Migrate all the things ===
Objective: Karma [5] on Chromium. Easy to run locally for developers.
To be re-usable, the logic must be generic and use composer/npm. In 2013, most jobs ran on Wikimedia production servers. In order to use npm we had to migrate to Wikimedia Labs first.
Back then, production had only just begun using Ubuntu Trusty.
The relevant modules require Node.js v0.10, but Ubuntu Precise has Node.js v0.8. Precise is also stuck with Chromium 37 (EOL). [7] To get an auto-updating, latest stable Chromium and Node.js v0.10, we needed to migrate our infrastructure to Ubuntu Trusty first.
Much of the process to install MediaWiki on a Jenkins slave also wasn't puppetised.
=== SQLite ===
This is its own story. See https://phabricator.wikimedia.org/T89180 for details.
The short of it is that we also migrated everything to MySQL, which we wanted to do anyway.
== New stack ==
1. Chromium
PhantomJS is a great application for many purposes. It was good for us while it lasted. But for JavaScript unit testing, PhantomJS just isn't the right fit. It's too distant from a "real" browser. When we introduced PhantomJS, it was a big step forward towards cross-browser testing. It "uses WebKit", which meant we were kind of covering Chrome and Safari (in 2012).
Safari has had several major releases since PhantomJS v1.9 came out. Chrome has had even more (and has since dropped WebKit in favour of Blink). And, in truth, PhantomJS wasn't that much like Safari and mainstream WebKit to begin with. [8]
Chromium is an actual browser. A browser we actually support. A browser that represents real users of our software. It's only one browser for now, but it's a start.
2. Karma
With Karma as a solid foundation, it's truly simple to add more browsers in the future. OOjs has already added Firefox alongside Chromium in Jenkins. Karma uses WebDriver, a standardised protocol for operating browsers (locally and remotely alike). It has an official plugin hooking it up to SauceLabs, which opens the door to any other browser and platform we want. It's as simple as adding one line to the config. [5][6]
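For example, assuming karma-firefox-launcher and karma-sauce-launcher
are installed, the config change is roughly this (launcher names
illustrative):

    module.exports = function (config: any) {
      config.set({
        // SauceLabs browsers are defined once as custom launchers...
        customLaunchers: {
          sauce_ie11: {
            base: 'SauceLabs',
            browserName: 'internet explorer',
            version: '11'
          }
        },
        // ...and then each extra browser really is one more line here.
        browsers: ['Chromium', 'Firefox', 'sauce_ie11']
      });
    };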
We haven't done this yet as there are scalability, performance and security considerations. Having said that, OOjs is currently using an experimental pipeline to test in six different browsers from Jenkins via SauceLabs, including IE 6 and IE 11 on Windows, and Safari 5 on Mac! [9]
3. Code coverage
One of the advantages of having a pure stack is the ability to run pre-processors on the code using Karma built-ins. [10] The first one I'm looking at is code coverage.
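A rough sketch of what that could look like with karma-coverage (paths
illustrative):

    module.exports = function (config: any) {
      config.set({
        preprocessors: {
          'resources/src/**/*.js': ['coverage']  // instrument source files only
        },
        reporters: ['progress', 'coverage'],
        coverageReporter: { type: 'html', dir: 'coverage/' }
      });
    };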
4. Local development and Jenkins equally
The new stack runs the full QUnit test suite in a real browser (several, even) by simply running "npm test". [11]
Aside from Node.js, it requires no pre-installed software and is as easy as "npm install" to set up.
— Krinkle
[1] https://phabricator.wikimedia.org/T74063 [epic] Adopt Karma with Chromium (tracking).
[2] https://phabricator.wikimedia.org/T89433 Make QUnit tests run without installing MediaWiki.
[3] https://phabricator.wikimedia.org/T89432 Make PHPUnit tests run without installing MediaWiki.
[4] https://github.com/wikimedia/mediawiki/blob/REL1_19/tests/qunit/index.html
[5] https://karma-runner.github.io/
[6] https://karma-runner.github.io/0.12/config/browsers.html
[7] http://packages.ubuntu.com/precise/chromium-browser
[8] http://codepen.io/Krinkle/blog/phantomjs-anno-2014
[9]
https://github.com/wikimedia/oojs/blob/v1.1.6/Gruntfile.js#L76-L107
https://integration.wikimedia.org/ci/job/npm/1680/console
[11]
https://karma-runner.github.io/0.12/config/preprocessors.html
https://github.com/karma-runner/karma-coverage
[12] https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing
I am pleased to announce that Stephane Bisson joins WMF this week as a
Software Engineer on the Collaboration Team!
Stephane is an avid learner and traveler. He is passionate about
history, cultures, and languages. He runs, cooks, and enjoys wine
tasting, and he is eager to travel to Napa with anyone who knows it well.
Professionally, Stephane spent 5 years writing software for the
manufacturing industry and another 5 years as a consultant for
ThoughtWorks. Ruby and JavaScript have a very special place in his
heart.
Stephane will join the Collaboration team focusing on front-end
development and will be in SF with the team this week.
Please welcome Stephane!
--tomasz
<quote name="Timo Tijhof" date="2015-04-04" time="04:40:33 +0100">
> TL;DR:
> * Our QUnit jobs now use the latest Chromium instead of PhantomJS.
> * You can run the test suite from the command line locally now.
>
> Thanks to Tim Starling, Antoine "hashar" Musso, Kunal (legoktm),
> Bryan Davis, S Page, and James Forrester for their help in different areas.
>
> https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing
Thanks a ton for this great work, Timo and company.
I really want to call out Timo and others here.
Our continuous integration setup is a good one, and is getting better. A
big part of why it continues to improve is the
cross-organizational support we (#releng) receive from people like Timo,
Bryan, Kunal, James, etc.
It's great seeing the collaboration happen on these projects.
</kumbaya_moment>
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |
The new PSR-3 debug logging system brought namespaced external code
(Psr\Log\LoggerInterface) into use in MediaWiki core. The classes I
built out to work with this system are using faux namespaces by virtue
of class names like "MWLoggerFactory", "MWLoggerLegacyLogger" and
"MWLoggerMonologSyslogHandler". Before 1.25 starts rolling out as a
tarball release I'd like to change these classes to use actual PHP
namespaces rather than this clunky collision avoidance mechanism. [0]
There is also a task [1] to backport minimal PSR-3 support to the 1.23 LTS
branch to simplify backports of code and extensions that adopt direct
use of PSR-3, and I'd like to only do that once if possible.
The color I have picked for this namespace bikeshed is
MediaWiki\Core\Logger. The MediaWiki root namespace is a pretty
obvious choice. "Core" is inserted to distinguish this fundamental
MediaWiki functionality from any existing or future extensions that
might use namespaces. I'm hoping "Logger" is sufficiently distinct
from other uses of the term "log" in MediaWiki which generally mean
"audit trail" rather than "debugging information". I'd be fine with
throwing Debug in between Core and Logger too if consensus forms around
that instead.
I'd also like to start organizing these files in a directory structure
that would be compatible with the PSR-4 autoloader standard. PSR-0
required all namespace elements to be in the file path as directories,
but PSR-4 allows a common prefix shared by all classes to be dropped. I
was thinking an includes/Core/ directory could be used as the common
base for these and future namespaced classes in MediaWiki core.
We had some discussion last summer [2] about namespace use in
extensions that seemed to end with "cool, do it when you want" and "we
don't really need any standard conventions". Since I'm suggesting
namespace usage in core I figured this was worth another (hopefully
short) round of discussion with a larger community than is likely to
see my patches when they land in Gerrit.
[0]: https://phabricator.wikimedia.org/T93406
[1]: https://phabricator.wikimedia.org/T91653
[2]: http://www.gossamer-threads.com/lists/wiki/wikitech/476296
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855