---------- Forwarded message ----------
From: Derk-Jan Hartman <d.j.hartman+wmf_ml@gmail.com>
Date: Thu, Apr 9, 2015 at 4:50 PM
Subject: Re: [WikimediaMobile] [Apps] Stripping content inside brackets from the first sentence of articles

On 9 apr. 2015, at 08:44, Dan Garry <dgarry@wikimedia.org> wrote:

Hey Derk-Jan,

Thanks for chiming in. It's great to have your input.

On 8 April 2015 at 23:22, Derk-Jan Hartman <d.j.hartman+wmf_ml@gmail.com> wrote:
This is me waving my red flag and telling you that you are walking into/picking a huge fight with ‘the community’. You can’t win anything here, there will only be losers.

As a long-time community member myself, I am aware of how contentious this change is. But I disagree that there will only be losers, as there are plenty of readers who will benefit from the change.

My experience with WMF launches shows me that there are a few golden rules:
1: Don’t make contentious changes without working on the root causes of why they are contentious.
2: Measure the shit out of it.
3: Be prepared for it to still be contentious, and be ready to roll back.
4: Analyze the data and find more solutions.

If you want to convince users that something needs to be done here, you should consider handing the community tools to detect that something is wrong, like https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive

I'd love to solve this more systematically. What are your suggestions for how we could do that? We've not had much luck thinking of any so far.


1: Measure user response to the bracketed text
2: Graph user response to the bracketed text
3: Add the graph to every single follow-up action that you do
4: First add all the code to parse what is in there.
5: Log what is in there (or do it based on a db dump)
6: Classify what is in there (yes, that means manual labour; make it a WikiGrok-like game?). Note: the long tail is probably more interesting here than the 80% that you already know about.
7: Ask for semantic classes and help rolling them out, so it is easier to strip this stuff.
8: Start by stripping things that are duplicates (if it is in the infobox then strip it, else don't)
9: Selectively strip something like IPA, because you have measured that people are not interested in it
10: Intelligently output what remains in there (if everything is hidden, skip the () altogether; take care of trailing commas, etc.).
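
To make steps 4 to 10 a bit more concrete, here is a rough Python sketch of what I mean. Everything in it is an assumption for illustration: the function names, the three classes ("ipa", "date", "other"), the regexes, and the idea that items inside the brackets are separated by semicolons. The real classes would have to come out of the logging and classification in steps 5 and 6, and this is not how the apps do it today.

```python
import re
from collections import Counter

# Hypothetical, very rough IPA detector: /.../ or [...] inside the brackets.
IPA_RE = re.compile(r"/[^/]+/|\[[^\]]+\]")
YEAR_RE = re.compile(r"\b\d{4}\b")


def classify(segment):
    """Step 6 stand-in: assign a coarse class to one bracketed item."""
    if IPA_RE.search(segment):
        return "ipa"
    if YEAR_RE.search(segment):
        return "date"
    return "other"


def split_parenthetical(sentence):
    """Step 4: return (before, inner, after) for the first (...) group, or None."""
    match = re.search(r"\s*\(([^()]*)\)", sentence)
    if match is None:
        return None
    return sentence[:match.start()], match.group(1), sentence[match.end():]


def segments_of(inner):
    """Assumption: items inside the brackets are separated by semicolons."""
    return [s.strip() for s in inner.split(";") if s.strip()]


def log_classes(sentences):
    """Step 5: count which classes occur, e.g. over sentences from a db dump."""
    counts = Counter()
    for sentence in sentences:
        parts = split_parenthetical(sentence)
        if parts is not None:
            counts.update(classify(s) for s in segments_of(parts[1]))
    return counts


def strip_first_sentence(sentence, hidden_classes, infobox_values=()):
    """Steps 8 to 10: drop duplicates and hidden classes, then re-emit cleanly."""
    parts = split_parenthetical(sentence)
    if parts is None:
        return sentence
    before, inner, after = parts
    kept = [
        s for s in segments_of(inner)
        if classify(s) not in hidden_classes   # step 9: selectively strip
        and s not in infobox_values            # step 8: duplicate of the infobox
    ]
    if not kept:
        return before + after                  # step 10: skip the () altogether
    return f"{before} ({'; '.join(kept)}){after}"


example = "Albert Einstein (/ˈaɪnstaɪn/; 14 March 1879 – 18 April 1955) was a physicist."
print(strip_first_sentence(example, {"ipa"}))
print(log_classes([example]))
```

Hiding only "ipa" keeps the dates in the brackets; hiding everything, or passing the dates in as a known infobox value, drops the brackets and the space in front of them.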

Hope that helps you move forward.