---------- Forwarded message ----------
From: Derk-Jan Hartman <d.j.hartman+wmf_ml(a)gmail.com>
Date: Thu, Apr 9, 2015 at 4:50 PM
Subject: Re: [WikimediaMobile] [Apps] Stripping content inside brackets
from the first sentence of articles
On 9 apr. 2015, at 08:44, Dan Garry <dgarry(a)wikimedia.org> wrote:
Hey Derk-Jan,
Thanks for chiming in. It's great to have your input.
On 8 April 2015 at 23:22, Derk-Jan Hartman <d.j.hartman+wmf_ml(a)gmail.com>
wrote:
This is me waving my red flag and telling you that you are walking
into/picking a huge fight with ‘the community’. You can’t win anything
here, there will only be losers.
As a long-time community member myself, I am aware of how contentious this
change is. But I disagree that there will only be losers, as there are
plenty of readers who will benefit from the change.
My experience with WMF launches shows me that there are a few golden rules:
1: Don’t do contentious changes without working on the root causes of why
some things are contentious
2: Measure the shit out of it.
3: Be prepared for it to still be contentious, and be ready to roll back.
4: Analyze the data, find more solutions.
If you want to convince users that something needs to be done here, you
should consider handing tools to the community to detect that something is
wrong. Like
https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
I'd love to solve this more systematically. What are your suggestions for
how we could do that? We've not had much luck thinking of any so far.
Isolate:
1: Measure user response to the bracketed text
2: Graph user response to the bracketed text
3: Add the graph to every single follow-up action that you do
4: First add all the code to parse what is in there.
5: Log what is in there (or do it based on a db dump)
6: Classify what is in there (yes, that means manual labour; make it a
wikigrok-like game?). Note that the long tail is probably more interesting
here than the 80% that you already know about.
7: Ask for semantic classes and help rolling them out, so it is easier to
strip this stuff.
8: Start by stripping things that are duplicate (if in infobox then strip,
else not)
9: Selectively strip something like IPA, because you have measured that
people are not interested in it
10: Intelligently output what remains in there (if everything hidden, skip
the () altogether, take care of trailing commas, etc.).
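
A rough sketch of steps 4, 5, 6, 9 and 10 in Python, to make the idea
concrete. Everything in it is an assumption for illustration: the function
names, the ";" separator between fragments, and the two toy classes ("ipa",
"dates") are hypothetical and are not what the apps or MediaWiki actually do.
Real classes would have to come from the logged and hand-classified data.

```python
import re
from collections import Counter


def split_first_parenthetical(sentence):
    """Return (before, inner, after) for the first top-level (...), else None."""
    start = sentence.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(sentence)):
        if sentence[i] == "(":
            depth += 1
        elif sentence[i] == ")":
            depth -= 1
            if depth == 0:
                return sentence[:start], sentence[start + 1:i], sentence[i + 1:]
    return None  # unbalanced brackets: leave the sentence alone


def classify(fragment):
    """Toy classifier (assumption): anything between slashes is IPA,
    anything containing a four-digit number is a date."""
    if re.search(r"/[^/]+/", fragment):
        return "ipa"
    if re.search(r"\b\d{4}\b", fragment):
        return "dates"
    return "other"


def fragments_of(inner):
    """Split the bracketed text on ';' (assumed separator) and trim each piece."""
    return [f.strip() for f in inner.split(";") if f.strip()]


def count_classes(sentences):
    """Steps 5 and 6: log how often each class shows up across sentences."""
    counts = Counter()
    for sentence in sentences:
        parts = split_first_parenthetical(sentence)
        if parts is not None:
            counts.update(classify(f) for f in fragments_of(parts[1]))
    return counts


def strip_parenthetical(sentence, hidden):
    """Steps 9 and 10: hide selected classes, then output what remains."""
    parts = split_first_parenthetical(sentence)
    if parts is None:
        return sentence
    before, inner, after = parts
    kept = [f for f in fragments_of(inner) if classify(f) not in hidden]
    if kept:
        # Rejoining the kept fragments avoids dangling separators.
        return f"{before}({'; '.join(kept)}){after}"
    # Everything hidden: drop the brackets and the space in front of them.
    return before.rstrip() + after


example = "Barack Obama (/bəˈrɑːk/; born August 4, 1961) is an American politician."
print(count_classes([example]))
print(strip_parenthetical(example, {"ipa"}))
print(strip_parenthetical(example, {"ipa", "dates"}))
```

With only "ipa" hidden the date stays inside the brackets; with both classes
hidden the brackets disappear entirely. The duplicate check from step 8
(strip only if the infobox already has it) is left out, since it needs the
infobox data.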
Hope that helps you move forward.
DJ