Thanks for cc'ing me Jonathan, I wouldn't have seen this otherwise.
TL;DR - Objectively measurable criteria. Clear process. No surprises.
When I cited Vector as a good example *of process*, the context was the
presentation about the future of 'Flow' at Wikimania.[1] I highly
recommend reading the slides from this session if you haven't already -
great stuff![2] In particular, I was talking about how
the Usability Initiative team were the first to use an opt-in Beta process
at the WMF. It was the use of iterative development, progressive rollout,
and closed-loop feedback that made their work a successful *process*. I
wasn't talking about the Vector skin per se.
Significantly, they had a publicly declared, measurable criterion for
determining what counted as "community acceptance/support": an 80%
retention rate among opt-in users. They did not lock down the features of
one version of their beta and move to the next until they could show that
80% of the people who tried it preferred it. Moreover, they stuck to this
objective criterion for measuring consensus support all the way to the
final rollout.[3]
This system was a great way to identify people who had the willingness to
change but had concerns, as opposed to getting bogged down by people who
would never willingly accept a change or people who would accept all
changes regardless. It also meant that those people became 'community
advocates' for the new system because they had positive experiences of
their feedback being taken into account.
And I DO remember the process, and the significance that was attached to it
by the team (which included Trevor Parscal), because in 2009 I interviewed
the whole team in person for the Wikipedia Weekly podcast.[4] Far from
"*looking at the past through rose-coloured glasses*", I recall the specific
pain-points on the day that the Vector skin became the default: the
inter-language links list being autocollapsed, and the Wikipedia logo
being updated.[5] The fact that it was THESE things that caused all the
controversy on the day that Vector went from Beta to opt-out is
instructive. These were the two things that were NOT part of the Beta
testing period - no process, and surprises. The people who had valid
feedback had not been given an opportunity to provide it, and that
feedback came instead in the form of swift criticism on mailing lists.[6]
My support for the concept of a clearly defined, objectively measured rollout
*process* for new features is not new... When Fabrice announced "beta
features" in November 2013 I was the first to respond - referring to the
same examples, and telling the same story about the Usability Initiative's
processes.[7]
Then, as now, the "beta features" tab lists the number of users who have
opted in to a tool, but there is no comparative/objective explanation of
what that actually means! For example, it tells me that 33,418 people have
opted in to "Hovercards", but is that good? How long did it take to reach
that level? How many people have switched it off? What proportion of the
active editorship is that? And most importantly - what relationship does
this number have to whether Hovercards will 'graduate' from or 'fail' the
opt-in Beta process?
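To make the point concrete, here is the kind of objective report I'm asking for, as a back-of-the-envelope sketch. The only real numbers are the opt-in count (33,418, from the Beta features tab) and the 80% threshold (the Usability Initiative's criterion); the opt-out count and active-editor figure are invented placeholders, because that's exactly the data the tab doesn't show us:

```python
OPTED_IN = 33_418        # shown on the Beta features tab (real figure)
OPTED_OUT = 5_100        # hypothetical: users who tried it, then switched it off
ACTIVE_EDITORS = 75_000  # hypothetical: size of the active editorship
GRADUATION_THRESHOLD = 0.80  # the Usability Initiative's retention criterion

ever_tried = OPTED_IN + OPTED_OUT
retention = OPTED_IN / ever_tried     # share of triers who kept it on
coverage = OPTED_IN / ACTIVE_EDITORS  # share of active editors using it

print(f"Retention among triers: {retention:.1%}")
print(f"Share of active editorship: {coverage:.1%}")
print("Graduates" if retention >= GRADUATION_THRESHOLD else "Not yet")
```

A report like this, published per feature, would answer every question in the paragraph above at a glance.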
Which brings me to the point I made to Jonathan, and also Pau, at Wikimania
about the future of Flow.
If there are two things we Wikimedians hate most, I've come to believe
that they are:
1) The absence of a clear process, or a failure to follow that process
2) Being surprised
We can, generally, abide outcomes/decisions that we don't like (e.g.
article-deletion debates) as long as the process by which that decision was
arrived at was clearly explained, and objectively followed. I believe this
is why there was so much anger and frustration about the 'autoconfirm
article creation trial' on en.wp [8] and the 'superprotect' controversy -
because they represented a failure to follow a process, and a surprise
(respectively).
So, even more than the Vector skin or even the Visual Editor, Flow
ABSOLUTELY MUST have a clear, objectively measurable, *process* for
measuring community consensus because it will be replacing
community-designed and community-operated workflows (e.g. [9]). This means
that once it is enabled on a particular workflow:
1) an individual user can't opt out and return to the old system;
2) it will most affect, and be most used by, admins and other very active
users.
Therefore, I believe that this development must be an iterative process of
working on 1 workflow on 1 wiki at a time, with objective measures of
consensus-support that are at least partially *determined by the affected
community itself*. This will be the only way that Flow can gain community
consensus for replacing the existing
template/sub-page/gadget/transclusion/category-based workflows.[10]
Because Flow will be updating admin-centric workflows, if it is rolled-out
in a way that is anything less than this then it will strike the community
as hubris - "it is necessary to destroy the town in order to save it".[11]
-Liam / Wittylama
P.S. While you're at it, please make ALL new features go through the "Beta
features" system, with a consistent, discoverable process. As it is, some
things live there permanently in limbo, some things DO have a process
associated with them, and some things bypass the beta system altogether.
As bawolff said, this means people feel they don't have any influence over
the rollout process and therefore choose not to be involved at all.[12]
[1]
https://wikimania2015.wikimedia.org/wiki/Submissions/User(s)_Talk(ing):_The…
[2]
https://wikimania2015.wikimedia.org/wiki/File:User(s)_Talk(ing)_-_Wikimania…
[3]
https://blog.wikimedia.org/2010/05/13/a-new-look-for-wikipedia/
[4] Sorry - I can't find the file anymore though. This was the page:
https://en.wikipedia.org/wiki/Wikipedia:WikipediaWeekly/Episode76
[5]
https://blog.wikimedia.org/2010/05/13/wikipedia-in-3d/
[6]
https://commons.wikimedia.org/wiki/Talk:Wikipedia/2.0#Logo_revisions_need_i…
[7]
https://lists.wikimedia.org/pipermail/wikimedia-l/2013-November/128896.html
[8]
https://en.wikipedia.org/wiki/Wikipedia:Autoconfirmed_article_creation_trial
[9]
https://wikimania2015.wikimedia.org/w/index.php?title=File:User(s)_Talk(ing…
[10]
https://wikimania2015.wikimedia.org/w/index.php?title=File:User(s)_Talk(ing…
[11]
https://en.wikipedia.org/wiki/B%E1%BA%BFn_Tre#Vietnam_War
[12]
https://lists.wikimedia.org/pipermail/design/2015-July/002355.html
wittylama.com
Peace, love & metadata
On 27 July 2015 at 22:51, Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:
On Mon, Jul 27, 2015 at 11:02 AM, Ryan Lane
<rlane32(a)gmail.com> wrote:
For instance, if a change negatively affects an editor's workflow, it
should be reflected in data like "avg/p95/p99 time for x action to
occur", where x is some normal editor workflow.
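For readers unfamiliar with those abbreviations, the metric Ryan describes might be computed like this. It's only a sketch: the timings are made-up sample data for a hypothetical "save edit" action, and the nearest-rank percentile method is just one common convention, not any specific WMF instrumentation:

```python
import statistics

def nearest_rank_percentile(samples, p):
    """p-th percentile by the nearest-rank method (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = -(-len(ordered) * p // 100)  # ceil(n * p / 100)
    return ordered[int(rank) - 1]

# Hypothetical durations (seconds) for one editor workflow, e.g. saving an
# edit; the two outliers are what p95/p99 exist to surface.
durations = [2.1, 2.4, 2.2, 9.8, 2.3, 2.5, 2.2, 14.0, 2.6, 2.4]

print(f"avg: {statistics.mean(durations):.2f}s")
print(f"p95: {nearest_rank_percentile(durations, 95):.2f}s")
print(f"p99: {nearest_rank_percentile(durations, 99):.2f}s")
```

The average hides the slow outliers; the tail percentiles expose them, which is why p95/p99 are the usual way to detect a workflow regression.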
That is indeed one way you can provide evidence of correlation; but in
live deployments (which are, at best, quasi-experiments
<https://en.wikipedia.org/wiki/Quasi-experiment>), you seldom get results
that are as unequivocal as the example you're presenting here. And
quantifying the influence of a single causal factor (such as the impact of
a particular UI change on time-on-task for this or that editing workflow)
is even harder.
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on
this subject, and we have confidently identified a set of factors that are
likely contributors. Some of these can be directly measured: the decreased
retention rate of newcomers; the effect of early, negative experiences on
newcomer retention; a measurable increase over time in phenomena (like
reverts, warnings, new article deletions) that likely cause those negative
experiences. But none of us who have studied the editor decline believe
that these are the only factors. And many community members who have read
our research don't even accept our premises, let alone our findings.
I'm not at all afraid of sounding pedantic here (or of writing a long-ass
wall of text), because I think that many WMF and former-WMF participants in
this discussion are glossing over important stuff: Yes, we need a more
evidence-based product design process. But we also need a more
collaborative, transparent, and iterative deployment process. Having solid
research and data on the front-end of your product lifecycle is important,
but it's not some kind of magic bullet and is no substitute for community
involvement in product design (through the lifecycle).
We have an excellent Research & Data
<https://wikimediafoundation.org/wiki/Staff_and_contractors#Research_and_Data>
team. The best one we've ever had at WMF. Pound-for-pound, they're as good
as or better than the Data Science teams at Google or Facebook. None of
them would ever claim, as you seem to here, that all you need to build good
products are well-formed hypotheses and access to buckets of log data.
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in
case he doesn't follow this list). We talked about strategies for deploying
new products on Wikimedia projects: what works, what doesn't. He held up
the design/deployment process for Vector as an example of *good* process,
one that we should (re)adopt.
Vector was created based on extensive user research and community
consultation[1]. Then WMF made a beta, and invited people across projects
to opt-in and try it out on prototype wikis[2]. The product team set public
criteria for when it would release the product as default across
production projects: retention of 80% of the Beta users who had opted in,
after a certain amount of time. When a beta tester opted out, they were
sent a survey to find out why[3]. The product team attempted to triage the
issues reported in these surveys, address them in the next iteration, or
(if they couldn't/wouldn't fix them), at least publicly acknowledge the
feedback. Then they created a phased deployment schedule, and stuck to
it[4].
This was, according to Liam (who's been around the movement a lot longer
than most of us at WMF), a successful strategy. It built trust, and engaged
volunteers as both evangelists and co-designers. I am personally very eager
to hear from other community members who were around at the time what they
thought of the process, and/or whether there are other examples of good WMF
product deployments that we could crib from as we re-assess our current
process. From what I've seen, we still follow many good practices in our
product deployments, but we follow them haphazardly and inconsistently.
Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent *and influence*
the sentiments of the broader, less vocal, contributor base in important
ways. And we won't be able to get people to accept our conclusions, however
rigorously we demonstrate them or carefully we couch them in scientific
trappings, if they think we're fundamentally incapable of building
something worthwhile, or deploying it responsibly.
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We
aren't a start-up (most of which fail) that can focus all our resources on
one radical new idea. We aren't a tech giant like Google or Facebook, that
can churn out a bunch of different beta products, throw them at a wall and
see what sticks.
And we're not a commercial, community-driven site like Quora or Yelp,
both of which can constantly monkey with their interface and feature set
in order to maximize ad revenue, or try out any old half-baked strategy to
monetize their content. There's a fundamental difference between Wikimedia
and Quora. In
Quora's case, a for-profit company built a platform and invited people to
use it. In Wikimedia's case, a bunch of volunteers created a platform,
filled it with content, and then a non-profit company was created to
support that platform, content, and community.
Our biggest opportunity to innovate, as a company, is in our design
process. We have a dedicated, multi-talented, active community of
contributors. Those of us who are getting paid should be working on
strategies for leveraging that community to make better products, rather
than trying to come up with new ways to perform end runs around them.
Jonathan
1.
https://usability.wikimedia.org/wiki/What%27s_new,_questions_and_answers#Ho…
2.
https://usability.wikimedia.org/wiki/Prototype
3.
https://usability.wikimedia.org/wiki/Beta_Feedback_Survey
4.
https://usability.wikimedia.org/wiki/Releases/Default_Switch
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>