Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/
Brandon's designs made a lot of sense and looked like a much-needed refresh of what should be MediaWiki's default skin, but now, a few months after he left, the project appears to be in limbo.
Is there any intention to follow up on that or to start new work in that area?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Jul 20, 2015 13:50, "Amir E. Aharoni" amir.aharoni@mail.huji.ac.il wrote:
Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/
Brandon's designs made a lot of sense and looked like a much-needed refresh of what should be MediaWiki's default skin, but now, a few months after he left, the project appears to be in limbo.
The Compact personal bar and Fixed header Beta features implement some ideas from Winter. Both are available on the beta cluster. I know that bug(s) with CPB block further deployment.
I think any official work on this would fall under the Reading team, but I don't see it on https://m.mediawiki.org/wiki/Reading/Strategy_and_Roadmap
I agree there's good stuff in Winter.
On 07/20/2015 02:26 PM, S Page wrote:
The Compact personal bar and Fixed header Beta features implement some ideas from Winter. Both are available on the beta cluster. I know that bug(s) with CPB block further deployment.
It was undeployed yesterday[1][2].
[1] https://gerrit.wikimedia.org/r/#/c/225668/ [2] https://phabricator.wikimedia.org/T87489
-- Legoktm
Amir E. Aharoni wrote:
Does anybody remember Winter? - http://unicorn.wmflabs.org/winter/
Hi.
Yes, I remember Winter. It was a nice prototype.
Vector needs love, for sure. But my impression is that the Wikimedia Foundation design team has neither the focus nor the commitment to provide this support. This means you'll need volunteers, particularly ones capable of working with the Wikimedia community to modernize the interface without annoying or angering users (readers and editors alike) in the process.
My personal view is that a gradual approach is preferable to a large and sudden redesign. There's now an experimental responsive mode in the Vector skin. I'm very cautiously optimistic about its path forward.
MZMcBride
I also believe that iterating on Vector is highly preferable to introducing a new skin. That said, to try and be as fair as I can to Brandon, he publicly declared last year at Wikimania London, "Winter is not a skin".
While I didn't understand his explanation of what it was, observationally it appeared to be the user interface equivalent of a futurist predicting a few years into the future in reasonable detail. Some of it will end up being true, some will not. My understanding is that the Winter implementation was of the semi-functional prototype variety and little or none of the design work was based on usability research of Vector, the status quo.
In contrast, Vector really is a skin that was implemented specifically for production use and is now a battle-tested platform to build upon. Also, the UX improvements that were made over Monobook, the status quo at the time, were based on usability research. This is a practice we should continue with for future changes.
I know that it's sometimes exciting to people to make dramatic reveals of proposals for sweeping changes. It's also fun to get excited about them. However, this grand-unveiling boil-the-ocean approach never works out in practice. It unnecessarily strains design, development and community engagement efforts. It is wasteful and reckless. It is arrogant and ignorant. It's not who we are, and it's not how we do things.
Even Vector was based heavily on Monobook, and in every way in which early versions of Vector deviated from Monobook, without just cause, it was "fixed" to be more similar. This was not wrong. Making arbitrary changes was wrong. Starting from scratch is even worse.
We should carefully continue along the path of iterating on Vector. We should gradually converge its styling and implementation with that of OOjs UI. We should continue improving usability and accessibility on a variety of form factors. We should perform research and base changes on the findings it produces. This will enable us to move forward with minimal cost, and far less drama.
- Trevor
Trevor Parscal <tparscal@...> writes:
I also believe that iterating on Vector is highly preferable to introducing a new skin. That said, to try and be as fair as I can to Brandon, he publicly declared last year at Wikimania London, "Winter is not a skin".
While I didn't understand his explanation of what it was, observationally it appeared to be the user interface equivalent of a futurist predicting a few years into the future in reasonable detail. Some of it will end up being true, some will not. My understanding is that the Winter implementation was of the semi-functional prototype variety and little or none of the design work was based on usability research of Vector, the status quo.
Isn't that research 7 years old? I remember the usability project, and for all intents and purposes it was mostly a failure. We didn't meet most of the terms of the grants and very little came out of it, other than Vector, which is a modest change from Monobook.
In contrast, Vector really is a skin that was implemented specifically for production use and is now a battle-tested platform to build upon. Also, the UX improvements that were made over Monobook, the status quo at the time, were based on usability research. This is a practice we should continue with for future changes.
"battle tested" == outdated and relatively unchanged in nearly 7 years. The web evolves and Wikimedia does not (at least for readers).
I know that it's sometimes exciting to people to make dramatic reveals of proposals for sweeping changes. It's also fun to get excited about them. However, this grand-unveiling boil-the-ocean approach never works out in practice. It unnecessarily strains design, development and community engagement efforts. It is wasteful and reckless. It is arrogant and ignorant. It's not who we are, and it's not how we do things.
This actually works amazingly well in practice for most organizations. Maybe Wikimedia /should/ be this and maybe it /should/ be how Wikimedia does things. Isn't a motto of the movement "Be bold"? What happened to that? Maybe we should change things to "Be careful; it's scary to change".
Even Vector was based heavily on Monobook, and in every way in which early versions of Vector deviated from Monobook, without just cause, it was "fixed" to be more similar. This was not wrong. Making arbitrary changes was wrong. Starting from scratch is even worse.
Only because the community is scared of change. Every community is, though. People got used to Vector and they'd get used to Winter after a month or two. This happens frequently to other major sites. The thing you need to keep in mind is that you need to actually hold strong for a few months until people get used to things, while fixing legitimate bugs.
We should carefully continue along the path of iterating on Vector. We should gradually converge its styling and implementation with that of OOjs UI. We should continue improving usability and accessibility on a variety of form factors. We should perform research and base changes on the findings it produces. This will enable us to move forward with minimal cost, and far less drama.
It's sad that Wikimedia has given up on users.
- Ryan Lane
I find this conversation worrying.
On 21/07/15 17:52, Ryan Lane wrote:
Trevor Parscal <tparscal@...> writes:
I also believe that iterating on Vector is highly preferable to introducing a new skin.
Ideally, each new skin that is introduced is an iteration on the previous. What worked well is maintained and built upon, what didn't is changed. We don't ever want to simply throw out what we have.
But we also need to find a point to break off and actually make it into a new one. Keep going in the direction of Winter (or whatever), and there comes a point when it simply is not Vector anymore - and that's fine, but there's no reason to take the Vector that was away from those who legitimately liked it, either. We allow users (including third-party users) their preferences, and there is historical value, too, in keeping the older styles around in some form.
In contrast, Vector really is a skin that was implemented specifically for production use and is now a battle-tested platform to build upon. Also, the UX improvements that were made over Monobook, the status quo at the time, were based on usability research. This is a practice we should continue with for future changes.
"battle tested" == outdated and relatively unchanged in nearly 7 years. The web evolves and Wikimedia does not (at least for readers).
Aye, we do need to move on. But there are also lessons in what has lingered all this time - we need to look at it and understand why in order to properly address it and serve the underlying needs. This is why we iterate on what's there, and don't only make drastically new things.
I know that it's sometimes exciting to people to make dramatic reveals of proposals for sweeping changes. It's also fun to get excited about them. However, this grand-unveiling boil-the-ocean approach never works out in practice. It unnecessarily strains design, development and community engagement efforts. It is wasteful and reckless. It is arrogant and ignorant. It's not who we are, and it's not how we do things.
This actually works amazingly well in practice for most organizations. Maybe Wikimedia /should/ be this and maybe it /should/ be how Wikimedia does things.
We are not most organisations; where many answer to external stakeholders, and the consumers are simply the product, that is not so here. Wikimedia doesn't just answer to its communities, it IS the communities - all of them, the various projects, the WMF, GLAM, even dark corners of Commons and random people doing meetups for editathons - and its purpose is not profit, but education via a tenable, usable end result of efforts from all of them.
Isn't a motto of the movement "Be bold"? What happened to that? Maybe we should change things to "Be careful; it's scary to change".
Neither of these work without the other. Being bold, you must be careful, or it will blow up in your face. Being careful gets you nowhere without also being bold.
Even Vector was based heavily on Monobook, and in every way in which early versions of Vector deviated from Monobook, without just cause, it was "fixed" to be more similar. This was not wrong. Making arbitrary changes was wrong. Starting from scratch is even worse.
Only because the community is scared of change. Every community is, though. People got used to Vector and they'd get used to Winter after a month or two. This happens frequently to other major sites. The thing you need to keep in mind is that you need to actually hold strong for a few months until people get used to things, while fixing legitimate bugs.
"The community is scared of change" seems to be a common excuse from those too scared to work with communities outside of their own.
And many communities do propose change - some changes are good, some not so good, some need more resources to ever actually work. Just shoving things down people's throats, however, does not work. Consider the multimedia viewer, which needed an overhaul for copyrights alone and is still problematic to date. Consider visual editor when it was first released; even now, when it is so much more powerful, it isn't even available by default on many major wikis. Consider the typography refresh, which has been reverted piecemeal over the course of months. Then look at extensions like massmessage, abusefilter, timedmediahandler, apisandbox, globalcssjs, and others which considered the use cases and worked with the end users to make a sensible product with little reason to reject it. These may be smaller changes, or less reader-facing, but the way they were developed, never even mind how they were introduced, is particularly important. People were involved, problems were considered.
If you want to know what the "community" is afraid of, it's not change. It's things being developed entirely without them even in mind, getting shoved at them forcefully, and breaking what workflows they have. Unlike for some organisations, these are not simply users we profit off of while they amuse themselves, but volunteers donating their time, effort, and content, and they are the ones you should be concerning yourself with always. Not the readers, them.
We make the content work for the readers so that the volunteers' efforts are not in vain.
We should carefully continue along the path of iterating on Vector. We should gradually converge its styling and implementation with that of OOjs UI. We should continue improving usability and accessibility on a variety of form factors. We should perform research and base changes on the findings it produces. This will enable us to move forward with minimal cost, and far less drama.
It's sad that Wikimedia has given up on users.
Who has given up? The fact that we are even having this conversation seems pretty clear evidence that we haven't just yet.
Isarra Yos <zhorishna@...> writes:
Aye, we do need to move on. But there are also lessons in what has lingered all this time - we need to look at it and understand why in order to properly address it and serve the underlying needs. This is why we iterate on what's there, and don't only make drastically new things.
Do we actually know the lessons? Are they listed anywhere? Are they valid anymore? Do modern web practices cover them?
It's great to iterate on things when they are relatively modern. It's folly to do so when you're almost a decade behind the industry standard. The argument itself is odd because Vector has not been iterating steadily towards modern practices. It's been stagnant for years.
We are not most organisations; where many answer to external stakeholders, and the consumers are simply the product, that is not so here. Wikimedia doesn't just answer to its communities, it IS the communities - all of them, the various projects, the WMF, GLAM, even dark corners of Commons and random people doing meetups for editathons - and its purpose is not profit, but education via a tenable, usable end result of efforts from all of them.
The community you're talking about is the editor community, which is a tiny fraction of the overall community, but attempts to speak with authority over the entirety of it. The vocal portion of the editor community that speaks with this authority is even a minor fraction of the editor community. We're talking about .001% of the entire community that holds the entire movement hostage (5167 people voted in the last election, and there are 430 million monthly active readers).
The reader community is massive and has no voice, except their complaints across the internet. The WMF can and should be the voice for the reader community.
Isn't a motto of the movement "Be bold"? What happened to that? Maybe we should change things to "Be careful; it's scary to change".
Neither of these work without the other. Being bold, you must be careful, or it will blow up in your face. Being careful gets you nowhere without also being bold.
The status quo is that change never happens because people are too scared to change. There's no boldness here. There's hardly even basic assertiveness.
"The community is scared of change" seems to be a common excuse from those too scared to work with communities outside of their own.
Or an argument of those who think it's not in the readers' best interest to have editors with little to no knowledge of software engineering or UX design dictating the engineering and design of reader features.
And many communities do propose change - some changes are good, some not so good, some need more resources to ever actually work. Just shoving things down people's throats, however, does not work. Consider the multimedia viewer, which needed an overhaul for copyrights alone and is still problematic to date. Consider visual editor when it was first released; even now, when it is so much more powerful, it isn't even available by default on many major wikis. Consider the typography refresh, which has been reverted piecemeal over the course of months. Then look at extensions like massmessage, abusefilter, timedmediahandler, apisandbox, globalcssjs, and others which considered the use cases and worked with the end users to make a sensible product with little reason to reject it. These may be smaller changes, or less reader-facing, but the way they were developed, never even mind how they were introduced, is particularly important. People were involved, problems were considered.
If you want to know what the "community" is afraid of, it's not change. It's things being developed entirely without them even in mind, getting shoved at them forcefully, and breaking what workflows they have. Unlike for some organisations, these are not simply users we profit off of while they amuse themselves, but volunteers donating their time, effort, and content, and they are the ones you should be concerning yourself with always. Not the readers, them.
We make the content work for the readers so that the volunteers' efforts are not in vain.
I've also volunteered my time for the past 10 years, but as an engineer. I care about Wikimedia more as a reader than as an editor and my experience as a reader is not great and the editor community is the primary reason for this. The WMF's hesitation to make change is heavily based on the pitchforks and torches lit by this community.
We should carefully continue along the path of iterating on Vector. We should gradually converge its styling and implementation with that of OOjs UI. We should continue improving usability and accessibility on a variety of form factors. We should perform research and base changes on the findings it produces. This will enable us to move forward with minimal cost, and far less drama.
It's sad that Wikimedia has given up on users.
Who has given up? The fact that we are even having this conversation seems pretty clear evidence that we haven't just yet.
There's not really a conversation. The UX lead is saying "Winter is dead, let's continue with the iterations on Vector", though there's no real iteration going on. The editor community is opposed to any change that doesn't completely agree with them, where the "them" is around 5,000 people who also can't agree with each other and aren't qualified to be making the decisions to begin with.
- Ryan Lane
On 07/21/2015 02:38 PM, Ryan Lane wrote:
There's not really a conversation. The UX lead is saying "Winter is dead, let's continue with the iterations on Vector", though there's no real iteration going on.
I'd consider https://gerrit.wikimedia.org/r/#/c/220667/ to be a good start of iterating on Vector.
-- Legoktm
My views are most closely aligned with Ryan's, to be honest. Historically I've lost third-party users of MediaWiki instances because of how it looks, and the choice isn't great out there. I have yet to meet someone outside our community who likes how Wikipedia looks; that's always the first thing they complain about. I fear we suffer from Stockholm syndrome working in our codebase, such that we forget about those voices that don't get heard. We are the .001%!
Whilst I'm glad to see the patch Lego pointed to merged, I would wager money that $wgVectorResponsive, when set to true, would cause a lot of backlash (some people just don't like responsive sites [1]) and I predict it will need to become a separate skin called VectorResponsive to keep 'everyone happy'.
I think it's okay to iterate, but from my many experiences in the MediaWiki skin world, you have to leave the status quo as an option and make the new skin experience opt-in. Even then it's hard to get things out of opt-in mode - the personal compact toolbar was well received for the most part but was a complete hack in implementation, yet I saw no progress in consolidating it into our experience.
Vector is not evolving, otherwise it would have happened already. The only changes to it in the past 3 years have been badly received typography changes and minor tweaks.
Traditionally, more skins have created more headaches, but maybe it's time to rethink this infrastructure [2] and encourage a more abundant selection of skins on our wikis. From my perspective the lack of competition in the Wikipedia skin world is preventing innovation. FWIW I'd love to have a go at making a new skin based on Winter's ideas in my spare time with a fixed header, but given that I have no confidence it will ever get on the cluster I have no motivation to do this. Where is Apex deployed, for example [3]? Why can't I try this out on Wikipedia and see if I prefer the experience?
The closest things I see to MediaWiki are Wikia wikis and WordPress, and both of those seem to have a much more active and healthy skin ecosystem. Is this something we want to recreate, or are we saying that Vector is the only skin MediaWiki will ever need? If that's the case, I'm troubled.
In MobileFrontend the Minerva skin was created and I would estimate is the most actively developed of skins at the moment. We make decisions that people don't like, to keep the interface as simple and uncluttered as we possibly can, as that's what it's designed for. People can choose Vector if they prefer that experience on mobile, and I truly hope they'll be able to try a responsive version of Vector too. I'm aware some people hate it but at least it's trying to create a drastically different Wikipedia site experience and I'd like to see more skins like this. Choice is an important aspect of any open source project.
[1] https://www.google.com/search?q=i+hate+responsive+sites&oq=i+hate+respon... [2] https://www.mediawiki.org/wiki/Requests_for_comment/Redo_skin_framework [3] https://www.mediawiki.org/wiki/Skin:Apex
Traditionally, more skins have created more headaches, but maybe it's time to rethink this infrastructure [2] and encourage a more abundant selection of skins on our wikis. From my perspective the lack of competition in the Wikipedia skin world is preventing innovation. FWIW I'd love to have a go at making a new skin based on Winter's ideas in my spare time with a fixed header, but given that I have no confidence it will ever get on the cluster I have no motivation to do this. Where is Apex deployed, for example [3]? Why can't I try this out on Wikipedia and see if I prefer the experience?
I've heard complaints from some skin writers that the lack of stability in MediaWiki's skin system is a major annoyance for them. I don't know how representative that view is, but redesigning the skin system every 6 months is probably not a great way to get more skins made.
The closest things I see to MediaWiki are Wikia wikis and WordPress, and both of those seem to have a much more active and healthy skin ecosystem. Is this something we want to recreate, or are we saying that Vector is the only skin MediaWiki will ever need? If that's the case, I'm troubled.
Last I checked, Wikia was running MediaWiki, and their code was open source (albeit with weird dependencies).
It's unsurprising that WordPress is beating us in skin diversity, given the use case and how the install base of WordPress is distributed.
Our skin ecosystem could probably be better, but I'm unconvinced by this comparison. I don't think it's as horrible as you make it out, though. I've seen plenty of wikis use skins that are not Monobook/Vector.
Choice is an important aspect of any open source project.
As a general statement, that's debatable. There are plenty of open source projects that specifically try to reduce choice in order to be minimal, or meet other requirements. As far as MediaWiki goes, I'd agree that skin choice is an important goal. It's not entirely clear that that is an important goal for Wikimedia, though.
-- bawolff
On Tue, Jul 21, 2015 at 4:00 PM, Jon Robson jrobson@wikimedia.org wrote:
My views are most closely aligned with Ryan's, to be honest. Historically I've lost third-party users of MediaWiki instances because of how it looks, and the choice isn't great out there. I have yet to meet someone outside our community who likes how Wikipedia looks; that's always the first thing they complain about. I fear we suffer from Stockholm syndrome working in our codebase, such that we forget about those voices that don't get heard. We are the .001%!
If the problem is that important voices (readers') are not being heard, the solution is to ask them, not push for global deployment of a completely new and basically untested UI concept. Readers are no less opinionated than editors, and their wants and needs are no less important or heterogeneous. Winter may look more in line with someone's (Ryan's?) idea of "the industry standard" in 2015 than Vector, but that doesn't mean it provides a better experience for anyone.
You can't just assert that Winter's an improvement; you have to test. Winter was designed based on a certain set of assumptions on what people want out of their Wikipedia reading/editing experience. Even if you believe, as I do, that many of these are good/clever/inspired assumptions, Winter (or new features introduced by Winter) still needs to be tested before being deployed as the default option on Wikimedia wikis. Vector was also designed based on assumptions... but it also had the benefit of a whole lot of user testing and community consultation.
I think it's okay to iterate, but from my many experiences in the MediaWiki skin world, you have to leave the status quo as an option and make the new skin experience opt-in. Even then it's hard to get things out of opt-in mode - the personal compact toolbar was well received for the most part but was a complete hack in implementation, yet I saw no progress in consolidating it into our experience.
The fact that iterating takes time, and that it's hard to get existing users to adopt new software, is not a valid argument for making sudden, sweeping changes to the desktop Wikipedia interface. Iterating takes time because when it's done well (read: when you're actually iterating, rather than making ad hoc changes), the software is being improved *for the people it's designed for* and *for the things it's designed to do*. If you think it's going to be hard to drive adoption of incremental UI improvements, try getting buy-in on a whole slew of them introduced all at once, without a solid rationale or empirical evidence to back up your decision.
Vector is not evolving, otherwise it would have happened already. The only changes to it in the past 3 years have been badly received typography changes and minor tweaks.
This sounds like a problem with process, not a problem with Vector. Switching to Winter won't fix it. If we somehow managed to introduce Winter tomorrow, how would we assure that it continued to evolve?
Traditionally, more skins have created more headaches, but maybe it's time to rethink this infrastructure [2] and encourage a more abundant selection of skins on our wikis. From my perspective the lack of competition in the Wikipedia skin world is preventing innovation. FWIW I'd love to have a go at making a new skin based on Winter's ideas in my spare time with a fixed header, but given that I have no confidence it will ever get on the cluster I have no motivation to do this. Where is Apex deployed, for example [3]? Why can't I try this out on Wikipedia and see if I prefer the experience?
This seems to be the heart of the problem (at least, the problem for WMF as a software company). We need to make it easier to test and then incorporate test results (including direct user feedback) into products. Again, this is a process/infrastructure issue, not a problem with our current UI. Tests can be standard usability studies; single-user opt-in deployments (like beta features); time-limited pilots for a single wiki, namespace, or page; or controlled A/B tests with random sampling of a class of users. None of that has anything to do with whether Winter is better, or worse, than Vector.
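For what it's worth, here is a minimal sketch of what stable random sampling for that last kind of controlled A/B test could look like. The experiment name, bucket labels, and user identifier are hypothetical illustrations, not an existing MediaWiki API:

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, buckets=("control", "treatment")) -> str:
    """Deterministically assign a user to an experiment bucket.

    Hashing the user ID together with the experiment name keeps each
    user's assignment stable across pageviews while keeping different
    experiments independent of one another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]

# Hypothetical usage: roughly half of sampled users would see the variant UI.
print(assign_bucket("user:12345", "sticky-header-2015"))
```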
I like Winter. I'd like to see us move in that direction. But what I really want to do is test whether Winter works for the people it's supposed to: readers and editors. Because not everyone likes what I like, and not everyone interacts with Wikipedia/MediaWiki the way I do.
We're talking about Winter like it's one thing, but it's really a collection of bold, interesting design ideas. I find many of these design ideas compelling ('sticky' search/menu bar; responsive design), others less so (hiding the ToC under a hamburger menu...ugh). It's not an all or nothing proposition with Winter, or with Vector. We should be talking about how to upgrade our testing infrastructure and our design process so that we can incorporate the best parts of Winter into the default MediaWiki user experience. Then we can call it whatever we want.
On Thu, Jul 23, 2015 at 1:31 PM, Jonathan Morgan jmorgan@wikimedia.org wrote:
On Tue, Jul 21, 2015 at 4:00 PM, Jon Robson jrobson@wikimedia.org wrote:
My views are most closely aligned with Ryan's, to be honest. Historically I've lost third-party users of MediaWiki instances because of how it looks, and the choice isn't great out there. I have yet to meet someone outside our community who likes how Wikipedia looks; that's always the first thing they complain about. I fear we suffer from Stockholm syndrome working in our codebase, such that we forget about those voices that don't get heard. We are the .001%!
If the problem is that important voices (readers') are not being heard, the solution is to ask them, not push for global deployment of a completely new
Yes, agreed. The reading web team is actually thinking about ways we can gather feedback from our reader audience to aid design.
and basically untested UI concept. Readers are no less opinionated than editors, and their wants and needs are no less important or heterogeneous. Winter may look more in line with someone's (Ryan's?) idea of "the industry standard" in 2015 than Vector, but that doesn't mean it provides a better experience for anyone.
Agreed. I do recognise, however, that Vector is not the best experience, and I'm lamenting our conservativeness in the area of skins. I'm personally frustrated that, despite recognising this, we seem to have no understanding of how to go about making it better. I personally do not feel empowered to try things and listen to the things we have learnt from experiments. The recent change Legoktm points out slaps some responsive styles on Vector. It's not clear how we are going to test this, measure whether it is good or bad, and eventually make a decision on whether we should do it or not. (FWIW I think slapping on media queries is not a recipe for success in making a mobile-device-friendly experience, but I was happy to see someone try something and I'm happy to be proved wrong.)
You can't just assert that Winter's an improvement; you have to test.
I'm personally not asserting anything, but FWIW I recall user tests were run on experiments such as the fixed header, and they showed that people located items better. We didn't see it through to completion though (whoever was involved in that, please let me know what happened).
Winter was designed based on a certain set of assumptions on what people want out of their Wikipedia reading/editing experience. Even if you believe, as I do, that many of these are good/clever/inspired assumptions, Winter (or new features introduced by Winter) still needs to be tested before being deployed as the default option on Wikimedia wikis. Vector was also designed based on assumptions... but it also had the benefit of a whole lot of user testing and community consultation.
Sure and as I said above a lot of them were - but I wasn't involved in that.
I think it's okay to iterate, but from my many experiences in the MediaWiki skin world, you have to leave the status quo as an option and make the new skin experience opt-in. Even then it's hard to get things out of opt-in mode - the personal compact toolbar was well received for the most part but was a complete hack in implementation, yet I saw no progress in consolidating it into our experience.
The fact that iterating takes time, and that it's hard to get existing users to adopt new software, is not a valid argument for making sudden, sweeping changes to the desktop Wikipedia interface.
I wasn't suggesting this, but as Ryan says, various big websites do big redesigns and do just fine. These designs are not sweeping changes; they have been iterated on and beta-tested with small audiences over a period of time, and then suddenly unveiled in full to the wider audience. So despite the backlash from some users that big redesigns guarantee, in the long term these websites have made informed decisions about how the site should look to improve usability and the experience of users.
Iterating takes time because when it's done well (read: when you're actually iterating, rather than making ad hoc changes),
Sure... but right now we don't even seem to be iterating, and that to me is the problem. We've tried iterating in beta features, but those initiatives (personal compact toolbar, typography refresh, multimedia viewer) struggled for various reasons.
the software is being improved for the people it's designed for and for the things it's designed to do. If you think it's going to be hard to drive adoption of incremental UI improvements, try getting buy-in on a whole slew of them introduced all at once, without a solid rationale or empirical evidence to back up your decision.
Vector is not evolving, otherwise it would have happened already. The only changes to it in the past 3 years have been badly received typography changes and minor tweaks.
This sounds like a problem with process, not a problem with Vector.
And this is the crux of the matter in my opinion and what I am asking. How do people think we should improve this process? We do a lot of lamenting and defending on this list but never seem to offer action items... any bold offers about how we reverse this anti-pattern?
On 2015-07-23, at 8:49 PM, Jon Robson wrote:
This sounds like a problem with process, not a problem with Vector.
And this is the crux of the matter in my opinion and what I am asking. How do people think we should improve this process? We do a lot of lamenting and defending on this list but never seem to offer action items... any bold offers about how we reverse this anti-pattern?
We need the *process* to be more obvious, and the *principles* behind the changes agreed-upon. I'll elaborate, but first, some context…
On 2015-07-24, at 1:40 AM, Ryan Lane wrote:
What I'm saying is that there should be a process to make an interface change directed at readers, with stated test results, A/B tested, and adopted if testing meets the criteria of the test results. The editor community should have little to no say in the process, except to suggest experiments or question obviously incorrect test results.
The basic idea is that through proper testing of features you should be able to know an experience is better for the readers without them having a direct voice.
An example: Make search more discoverable. Add a feature or make an interface change to test this. A/B test it. See if the frequency of search usage increased. See if it adversely affected other metrics. If it helped search usage and didn't negatively affect other metrics, adopt the change.
The issue is that there will be a vocal minority of people who absolutely hate this change, no matter what it is. These people should be ignored.
This is *exactly* the sort of issue that leads to conflict. Some parts emphasized:
The editor community should have little to no say in the process
or
a vocal minority
or, worst,
These people should be ignored.
A/B testing is one thing, but our problems are *social*, are *political*, and that's precisely what I see above. This is not a productive approach, because it pits stakeholders against one another. Wikipedia is not a *competition*, it's supposed to be a *collaboration*. It's even worse when it's framed in the otherwise reasonable context of A/B testing, because that conceals the part of it that has one particular subset of stakeholders decide what metrics (e.g. search) are important. While I do disagree, I don't mean to argue specifically against Ryan Lane's position here—I'm just using it as an example of positions that exacerbate the social problems. It doesn't matter in what ways he or I are right or wrong on the approach if it's going to lead to another conflict.
If we ignore people, or worse, specifically disenfranchise them, that's sure to lead to conflict when the interested stakeholders pursue their interests and thus become that "vocal minority". Rather, we need an obvious process, backed by principles that most of everyone can agree on, so that we don't hit catches like one-sided priorities. Yes, we do need to figure out how to make sure that reader interests are represented in those principles. If the shared process and shared principles lead us to something that some people don't agree with, *then* there might be a justification to tell that minority to stuff it in the name of progress.
I'll leave off there, because the next thing I intuitively want to go on to involves my personal views, and those aren't relevant to this point (they can wait for later). Instead, a question: what *principles* ought to underpin designs moving forward from Vector? If we can't work through disagreements there, we're going to see objections once an unbalanced set of principles is implemented in design patterns.
Nihiltres
While I agree in principle with what Nihiltres states, it doesn't help us very much. There is so much resentment that has built up on several sides that I don't see how we are going to get past that.
Also, the scaling required to fulfill everyone's wishes using the stated methodology is huge. Think on the order of putting at least 25 people on requirements analysis, design and technology work for a year. My gut feeling, based on years of dev and Wikimedia experience, tells me that this would be a bigger project than VE. Which is of course insane with it being 'just' a skin, but it's the only way we can right this ship, unless a lot of people learn something about the virtues of imperfection.
DJ
Nihiltres <nihiltres@...> writes:
An example: Make search more discoverable. Add a feature or make an interface change to test this. A/B test it. See if the frequency of search usage increased. See if it adversely affected other metrics. If it helped search usage and didn't negatively affect other metrics, adopt the change.
The issue is that there will be a vocal minority of people who absolutely hate this change, no matter what it is. These people should be ignored.
This is *exactly* the sort of issue that leads to conflict. Some parts emphasized:
The editor community should have little to no say in the process
or
a vocal minority
or, worst,
These people should be ignored.
A/B testing is one thing, but our problems are *social*, are *political*, and that's precisely what I see above. This is not a productive approach, because it pits stakeholders against one another. Wikipedia is not a *competition*, it's supposed to be a *collaboration*. It's even worse when it's framed in the otherwise reasonable context of A/B testing, because that conceals the part of it that has one particular subset of stakeholders decide what metrics (e.g. search) are important. While I do disagree, I don't mean to argue specifically against Ryan Lane's position here—I'm just using it as an example of positions that exacerbate the social problems. It doesn't matter in what ways he or I are right or wrong on the approach if it's going to lead to another conflict.
The idea is to remove the social or political problems from the process. Define the goals and feature sets (this is the part of the process that requires community interaction), implement and test the changes, review the results. The data is the voice of the community. It's what proves if an idea is good or bad.
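To make that concrete, here is a rough sketch of how the "review the results" step could look for the earlier search example, assuming we already log, per bucket, how many sampled sessions used search. All counts and names below are made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

def compare_rates(used_a, total_a, used_b, total_b):
    """Two-proportion z-test: did the treatment change the rate at which
    sampled sessions performed the target action (e.g. used search)?"""
    p_a, p_b = used_a / total_a, used_b / total_b
    pooled = (used_a + used_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Made-up counts: sessions that used search, out of all sampled sessions.
lift, p = compare_rates(used_a=4200, total_a=100000,   # control
                        used_b=4650, total_b=100000)   # treatment
print(f"absolute lift: {lift:.4f}, p-value: {p:.4g}")
```

If the lift is positive, the p-value is small, and the other metrics we care about didn't regress, the change is adopted; otherwise it isn't.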
As I said before, though, there's always some vocal minority that will hate change, even when it's presented with data proving it to be good. These people should be ignored at this stage of the process. They can continue to provide input to future changes, but the data should be authoritative.
If we ignore people, or worse, specifically disenfranchise them, that's sure to lead to conflict when the interested stakeholders pursue their interests and thus become that "vocal minority". Rather, we need an obvious process, backed by principles that most of everyone can agree on, so that we don't hit catches like one-sided priorities. Yes, we do need to figure out how to make sure that reader interests are represented in those principles. If the shared process and shared principles lead us to something that some people don't agree with, *then* there might be a justification to tell that minority to stuff it in the name of progress.
I'll leave off there, because the next thing I intuitively want to go on to involves my personal views, and those aren't relevant to this point (they can wait for later). Instead, a question: what *principles* ought to underpin designs moving forward from Vector? If we can't work through disagreements there, we're going to see objections once an unbalanced set of principles is implemented in design patterns.
There's not really a lack of principles, there's a lack of reasonable process. What's wrong with change guided by data science? We know the scientific process works. The current process is design by a committee that's comprised mostly of people untrained in the field, with no data proving anyone's case. Even when there is data it's often ignored in favor of consensus of the editor community.
- Ryan
Having been away from WMF engineering and design for almost a year, I'd like to reiterate that what Ryan is outlining is not some bold, outlandish idea. In fact, it's standard operating procedure for how the top tier of product development is done at every non-enterprise software company worth a damn.
At Quora, we basically follow a version of this philosophy, though we certainly consult a ton directly with our community before, during and after the product development process. We just do this in a consultative way—not a consensus-driven one.
I would say the one unintentionally misleading part of what Ryan is saying is that it makes it sound like a zero sum game where the company wins and the community loses.
It's actually the opposite. Wikimedians today, wanting things to be perfect according to their standards and for consensus among all to be arrived at, dramatically slow things down and increase the time/cost of development. When you have a much more rapid pace of change, enabled by data and by the ability to ignore vocal minorities, it means more stuff gets done. This would free up huge amounts of design and development hours to focus on fixing tools near and dear to said vocal minorities, ultimately making everyone happier.
The idea is to remove the social or political problems from the process.
Everyone in basically any context wants to remove social and political problems. Ignoring them is not the same as removing them.
Define the goals and feature sets (this is the part of the process that requires community interaction), implement and test the changes, review the results. The data is the voice of the community. It's what proves if an idea is good or bad.
As I said before, though, there's always some vocal minority that will hate change, even when it's presented with data proving it to be good. These people should be ignored at this stage of the process. They can continue to provide input to future changes, but the data should be authoritative.
To be clear, I don't think every small vocal minority needs to be taken into account, and I don't think Wikipedians do either. Sometimes people seem to use the phrase "vocal minority" for a majority of users in some class.
Data does not prove things "good". Data proves (or more likely provides some support for, but does not prove) some objective hypothesis. Proving normative claims with objective data is pretty impossible.
That may sound pedantic, but I think it's an important distinction. Evidence should be presented in the form of "This change improved findability of the edit button by 40% among anons in our experiment [link to details]. Therefore I/we believe this is a good change, because I/we think that findability of the edit button is important". Separating what the data proves from what are personal opinions about the data is important to make the "science" sound legitimate and not manipulated.
There's not really a lack of principles, there's a lack of reasonable process. What's wrong with change guided by data science? We know the scientific process works.
We know it's also extremely easy to manipulate, especially when the science is only done by one party that has a specific objective. It can also be myopic, concentrating on one factor while ignoring the holistic whole.
Ultimately the usefulness depends on the skill of whoever is designing and conducting the experiments.
The current process is design by a committee that's comprised mostly of people untrained in the field, with no data proving anyone's case. Even when there is data it's often ignored in favor of consensus of the editor community.
Consensus of the editor community is anecdotal data. That data may be extremely biased and should be evaluated carefully. But it doesn't make sense to just throw it out totally, particularly in cases where it's the only data we have. We should also be evaluating why consensus and data are conflicting. Maybe there are unstudied factors causing the conflict, so the two positions are not mutually exclusive.
-- Bawolff
Brian Wolff <bawolff@...> writes:
Data does not prove things "good". Data proves (or, more likely, provides some support for but does not prove) some objective hypothesis. Proving normative claims with objective data is pretty much impossible.
That may sound pedantic, but I think it's an important distinction. Evidence should be presented in the form of "This change improved findability of the edit button by 40% among anons in our experiment [link to details]. Therefore I/we believe this is a good change, because I/we think that findability of the edit button is important". Separating what the data proves from what are personal opinions about the data is important if the "science" is to sound legitimate and not manipulated.
It sounds pedantic because it is :). Good/bad in my proposal was targeting the hypothesis, not the moral concept of good/bad. Good = the hypothesis is shown to be effective; bad = the hypothesis is shown to be ineffective.
What you've ignored in my proposal is the part where the community input is part of the formation of the hypothesis. I also mentioned that vocal minorities should be ignored with the exception of questioning the methodology of the data analysis.
Consensus of the editor community is anecdotal data. That data may be extremely biased and should be evaluated carefully. But it doesn't make sense to just throw it out totally, particularly in cases where it's the only data we have. We should also be evaluating why consensus and data are conflicting. Maybe there are unstudied factors causing the conflict, so the two positions are not mutually exclusive.
Anecdotal data should be used as a means of following up on experiments, but should not be considered part of the data set, as it's an unreliable source. If there's a large amount of anecdotal data coming in, that's a signal that it should be captured in the standard data set. There are obviously exceptions to this, but assuming there's enough data it should be possible to gauge the effectiveness of changes without relying on anecdotal data.
For instance, if a change negatively affects an editor's workflow, it should be reflected in data like "avg/p95/p99 time for x action to occur", where x is some normal editor workflow.
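As a purely illustrative sketch, assuming hypothetical per-workflow timing data rather than any real MediaWiki instrumentation, such avg/p95/p99 numbers could be computed along these lines:

# Hypothetical sketch: avg/p95/p99 time-to-complete for one editor workflow
# (say "open edit form" -> "save"), from invented duration samples in seconds.
from statistics import mean, quantiles

durations = [4.2, 5.1, 3.8, 40.0, 6.3, 5.5, 4.9, 7.2, 5.0, 62.5]

def percentile(values, p):
    # p-th percentile (1-99) using the 99 cut points returned by quantiles().
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[p - 1]

print(f"avg: {mean(durations):.1f}s")
print(f"p95: {percentile(durations, 95):.1f}s")
print(f"p99: {percentile(durations, 99):.1f}s")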
- Ryan
On Mon, Jul 27, 2015 at 11:02 AM, Ryan Lane rlane32@gmail.com wrote:
For instance, if a change negatively affects an editor's workflow, it should be reflected in data like "avg/p95/p99 time for x action to occur", where x is some normal editor workflow.
That is indeed one way you can provide evidence of correlation; but in live deployments (which are, at best, quasi-experiments https://en.wikipedia.org/wiki/Quasi-experiment), you seldom get results that are as unequivocal as the example you're presenting here. And quantifying the influence of a single causal factor (such as the impact of a particular UI change on time-on-task for this or that editing workflow) is even harder.
Knowing that something occurs isn't the same as knowing why. Take the English Wikipedia editor decline. There has been a lot of good research on this subject, and we have confidently identified a set of factors that are likely contributors. Some of these can be directly measured: the decreased retention rate of newcomers; the effect of early, negative experiences on newcomer retention; a measurable increase over time in phenomena (like reverts, warnings, new article deletions) that likely cause those negative experiences. But none of us who have studied the editor decline believe that these are the only factors. And many community members who have read our research don't even accept our premises, let alone our findings.
I'm not at all afraid of sounding pedantic here (or of writing a long-ass wall of text), because I think that many WMF and former-WMF participants in this discussion are glossing over important stuff: Yes, we need a more evidence-based product design process. But we also need a more collaborative, transparent, and iterative deployment process. Having solid research and data on the front-end of your product lifecycle is important, but it's not some kind of magic bullet and is no substitute for community involvement in product design (through the lifecycle).
We have an excellent Research & Data https://wikimediafoundation.org/wiki/Staff_and_contractors#Research_and_Data team. The best one we've ever had at WMF. Pound-for-pound, they're as good as or better than the Data Science teams at Google or Facebook. None of them would ever claim, as you seem to here, that all you need to build good products are well-formed hypotheses and access to buckets of log data.
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in case he doesn't follow this list). We talked about strategies for deploying new products on Wikimedia projects: what works, what doesn't. He held up the design/deployment process for Vector as an example of *good* process, one that we should (re)adopt.
Vector was created based on extensive user research and community consultation[1]. Then WMF made a beta, and invited people across projects to opt-in and try it out on prototype wikis[2]. The product team set public criteria for when it would release the product as default across production projects: retention of 80% of the Beta users who had opted in, after a certain amount of time. When a beta tester opted out, they were sent a survey to find out why[3]. The product team attempted to triage the issues reported in these surveys, address them in the next iteration, or (if they couldn't/wouldn't fix them), at least publicly acknowledge the feedback. Then they created a phased deployment schedule, and stuck to it[4].
This was, according to Liam (who's been around the movement a lot longer than most of us at WMF), a successful strategy. It built trust, and engaged volunteers as both evangelists and co-designers. I am personally very eager to hear from other community members who were around at the time what they thought of the process, and/or whether there are other examples of good WMF product deployments that we could crib from as we re-assess our current process. From what I've seen, we still follow many good practices in our product deployments, but we follow them haphazardly and inconsistently.
Whether or not we (WMF) think it is fair that we have to listen to "vocal minorities" (Ryan's words), these voices often represent *and influence* the sentiments of the broader, less vocal, contributor base in important ways. And we won't be able to get people to accept our conclusions, however rigorously we demonstrate them or carefully we couch them in scientific trappings, if they think we're fundamentally incapable of building something worthwhile, or deploying it responsibly.
We can't run our product development like "every non-enterprise software company worth a damn" (Steven's words), and that shouldn't be our goal. We aren't a start-up (most of which fail) that can focus all our resources on one radical new idea. We aren't a tech giant like Google or Facebook, that can churn out a bunch of different beta products, throw them at a wall and see what sticks.
And we're not a commercial community-driven site like Quora or Yelp, which can constantly monkey with their interface and feature set in order to maximize ad revenue or try out any old half-baked strategy to monetize its content. There's a fundamental difference between Wikimedia and Quora. In Quora's case, a for-profit company built a platform and invited people to use it. In Wikimedia's case, a bunch of volunteers created a platform, filled it with content, and then a non-profit company was created to support that platform, content, and community.
Our biggest opportunity to innovate, as a company, is in our design process. We have a dedicated, multi-talented, active community of contributors. Those of us who are getting paid should be working on strategies for leveraging that community to make better products, rather than trying to come up with new ways to perform end runs around them.
Jonathan
1. https://usability.wikimedia.org/wiki/What%27s_new,_questions_and_answers#How...
2. https://usability.wikimedia.org/wiki/Prototype
3. https://usability.wikimedia.org/wiki/Beta_Feedback_Survey
4. https://usability.wikimedia.org/wiki/Releases/Default_Switch
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in case he doesn't follow this list). We talked about strategies for deploying new products on Wikimedia projects: what works, what doesn't. He held up the design/deployment process for Vector as an example of *good* process, one that we should (re)adopt.
Vector was created based on extensive user research and community consultation[1]. Then WMF made a beta, and invited people across projects to opt-in and try it out on prototype wikis[2]. The product team set public criteria for when it would release the product as default across production projects: retention of 80% of the Beta users who had opted in, after a certain amount of time. When a beta tester opted out, they were sent a survey to find out why[3]. The product team attempted to triage the issues reported in these surveys, address them in the next iteration, or (if they couldn't/wouldn't fix them), at least publicly acknowledge the feedback. Then they created a phased deployment schedule, and stuck to it[4].
This was, according to Liam (who's been around the movement a lot longer than most of us at WMF), a successful strategy. It built trust, and engaged volunteers as both evangelists and co-designers. I am personally very eager to hear from other community members who were around at the time what they thought of the process, and/or whether there are other examples of good WMF product deployments that we could crib from as we re-assess our current process. From what I've seen, we still follow many good practices in our product deployments, but we follow them haphazardly and inconsistently.
I agree wholeheartedly with your email. But I wonder if this part is looking at the past through rose-coloured glasses a bit. The Vector rollout was certainly better than some other feature rollouts, but... it was hardly without pain, if I remember correctly. It was a long time ago, though, and before I was involved on the dev side, so my memory is a bit fuzzy.
-- -bawolff
On Mon, Jul 27, 2015 at 1:59 PM, Brian Wolff bawolff@gmail.com wrote:
I agree wholeheartedly with your email. But I wonder if this part is looking at the past through rose-coloured glasses a bit. The Vector rollout was certainly better than some other feature rollouts, but... it was hardly without pain, if I remember correctly. It was a long time ago, though, and before I was involved on the dev side, so my memory is a bit fuzzy.
I was also a bit surprised to hear that process held up as a positive example. But it was before my time as well, so I don't have direct knowledge, just what Liam related to me.
The parts of that process I'm most excited about are:
1. setting public success criteria ahead of time, based on user adoption/retention
2. the public commitment to iterate*, before broad rollout, based on specific feedback from beta testers.
- J
*and not just fix bugs, but actually revise/add/eliminate features
On 7/27/15, Jonathan Morgan jmorgan@wikimedia.org wrote:
On Mon, Jul 27, 2015 at 1:59 PM, Brian Wolff bawolff@gmail.com wrote:
I agree wholeheartedly with your email. But I wonder if this part is looking at the past through rose-coloured glasses a bit. The Vector rollout was certainly better than some other feature rollouts, but... it was hardly without pain, if I remember correctly. It was a long time ago, though, and before I was involved on the dev side, so my memory is a bit fuzzy.
I was also a bit surprised to hear that process held up as a positive example. But it was before my time as well, so I don't have direct knowledge, just what Liam related to me.
The parts of that process I'm most excited about are:
1. setting public success criteria ahead of time, based on user adoption/retention
2. the public commitment to iterate*, before broad rollout, based on specific feedback from beta testers.
- J
*and not just fix bugs, but actually revise/add/eliminate features
Yes, I agree that setting out public criteria ahead of time is a very good thing. I see a lot of comments from users who feel they are powerless to prevent a feature from being fully deployed if it turns out to be bad, and who thus don't want any trials at all, because they feel it leads down a road from which there is no turning back.
--bawolff
Jonathan Morgan, 27/07/2015 23:05:
I was also a bit surprised to hear that process held up as a positive example. But it was before my time as well, so I don't have direct knowledge, just what Liam related to me.
Vector was probably the first case where MediaWiki turned into a real battlefield, with WMF on one side and volunteers (i.e. all the traditional MediaWiki users and developers) on the other. http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/49535/
Of course there are still lessons to learn from the process, and Trevor's email proves some were learnt. :) I'm not sure exactly what the things to copy or drop would be, but as someone involved in communication of the initiative in 2009–10, I can say that:
* explaining the project was much easier than it is with most WMF software projects now;
* WMF got much better at releasing data and analysis (when they exist)!
Nemo
Jonathan Morgan <jmorgan@...> writes:
On Mon, Jul 27, 2015 at 11:02 AM, Ryan Lane
rlane32-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
For instance, if a change negatively affects an editor's workflow, it should be reflected in data like "avg/p95/p99 time for x action to occur", where x is some normal editor workflow.
That is indeed one way you can provide evidence of correlation; but in
live deployments (which are, at best, quasi-experiments), you seldom get results that are as unequivocal as the example you're presenting here. And quantifying the influence of a single causal factor (such as the impact of a particular UI change on time-on-task for this or that editing workflow) is even harder.
The idea of A/B tests is to try to isolate things. You're not going to get perfect data all of the time and you'll likely need to retry experiments with more focus until you can be assured your tests are accurate, but this is definitely doable in live deployments.
I used editing as an example, but you're right in that it's difficult to get reliable metrics for a lot of editing actions (though it should be a bit easier in VE). That's of course why I gave a search example previously, which is much easier to isolate. In fact, most reader based tests should be pretty reliable, since the reader feature set is much smaller and the number of readers is massive. This topic is about skin changes, btw ;).
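As a hedged illustration of the kind of reader-facing A/B comparison described here, with invented click-through counts for a hypothetical search-UI experiment (a real analysis would also pre-register the metric and check for sample-ratio mismatch), the arithmetic might look like:

# Two-proportion z-test on invented numbers: did the variant bucket click a
# search result more often than the control bucket?
from math import sqrt, erf

control_clicks, control_n = 11_800, 100_000
variant_clicks, variant_n = 12_450, 100_000

p1, p2 = control_clicks / control_n, variant_clicks / variant_n
p_pool = (control_clicks + variant_clicks) / (control_n + variant_n)
se = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / variant_n))
z = (p2 - p1) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal CDF

print(f"CTR control={p1:.3%} variant={p2:.3%} z={z:.2f} p={p_value:.4f}")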
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on this subject, and we have confidently identified a set of factors that are likely contributors. Some of these can be directly measured: the decreased retention rate of newcomers; the effect of early, negative experiences on newcomer retention; a measurable increase over time in phenomena (like reverts, warnings, new article deletions) that likely cause those negative experiences. But none of us who have studied the editor decline believe that these are the only factors. And many community members who have read our research don't even accept our premises, let alone our findings.
The best way to solve a complex problem is to first understand the problem (which you've done through research), then to break it down into small actionable parts (you've already mentioned them), then to tackle each problem by proposing solutions, implementing them in a testable way and then seeing if the results are positive or not.
The results of changing the warnings gave pretty strong indications that the new messages moved the retention numbers in a positive way, right? Why shouldn't we trust the data there? If the data wasn't good enough, is there any way to make the methodology more accurate?
I'm not at all afraid of sounding pedantic here (or of writing a long-ass
wall of text), because I think that many WMF and former-WMF participants in this discussion are glossing over important stuff: Yes, we need a more evidence-based product design process. But we also need a more collaborative, transparent, and iterative deployment process. Having solid research and data on the front-end of your product lifecycle is important, but it's not some kind of magic bullet and is no substitute for community involvement in product design (through the lifecycle).
We have an excellent Research & Data team. The best one we've ever had at
WMF. Pound-for-pound, they're as good as or better than the Data Science teams at Google or Facebook. None of them would ever claim, as you seem to here, that all you need to build good products are well-formed hypotheses and access to buckets of log data.
Until the very, very recent past there wasn't even the ability to measure the simplest of things. There are no real-time or near-real-time measurements. There are no health dashboards for vital community metrics. There's no experimentation framework, and without one there are no run-time controls for product managers to run A/B tests of feature-flagged features. There are very few analytics events in MediaWiki.
I don't want to sound negative, because I understand why all of this is the case, since analytics has been poorly resourced, ignored and managed into the ground until pretty recently, but Wikimedia isn't at the level of most early startups when it comes to analytics.
Wikimedia does have (and has historically had) excellent researchers that have been doing amazing work with insanely small amounts of data and infrastructure.
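For illustration only, a minimal sketch of the missing plumbing Ryan describes: a feature flag, deterministic bucketing, and an exposure event. Every name here (flag, schema, functions) is hypothetical, not an existing MediaWiki or WMF API.

# Hypothetical experimentation-framework sketch: a flag a product manager can
# flip, deterministic user bucketing, and one analytics event per exposure.
import hashlib
import json
import time

FEATURE_FLAGS = {"fixed-header-v2": {"enabled": True, "variant_share": 0.5}}

def bucket(user_id, experiment, share):
    # Hash user+experiment so assignment is stable across requests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "variant" if int(digest[:8], 16) / 0xFFFFFFFF < share else "control"

def log_event(event):
    # Stand-in for a real event pipeline (queue, log file, etc.).
    print(json.dumps(event))

def render_header(user_id):
    flag = FEATURE_FLAGS["fixed-header-v2"]
    group = bucket(user_id, "fixed-header-v2", flag["variant_share"]) if flag["enabled"] else "control"
    log_event({"schema": "ExperimentExposure", "experiment": "fixed-header-v2",
               "group": group, "user": user_id, "ts": int(time.time())})
    return "new fixed header" if group == "variant" else "classic header"

print(render_header("user:12345"))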
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in
case he doesn't follow this list). We talked about strategies for deploying new products on Wikimedia projects: what works, what doesn't. He held up the design/deployment process for Vector as an example of good process, one that we should (re)adopt.
Vector was created based on extensive user research and community
consultation[1]. Then WMF made a beta, and invited people across projects to opt-in and try it out on prototype wikis[2]. The product team set public criteria for when it would release the product as default across production projects: retention of 80% of the Beta users who had opted in, after a certain amount of time. When a beta tester opted out, they were sent a survey to find out why[3]. The product team attempted to triage the issues reported in these surveys, address them in the next iteration, or (if they couldn't/wouldn't fix them), at least publicly acknowledge the feedback. Then they created a phased deployment schedule, and stuck to it[4].
I was on that project (Usability Initiative) as an ops engineer. I was hired for it, in fact. I remember that project well and I wouldn't call it a major success. It was successful in that it changed the default skin to something that was slightly more modern than Monobook, but it was the only truly successful part of the entire project. I think Vector is the only surviving code from it. The vast majority of Vector features didn't make it permanently into the Vector skin. Mostly what stayed around was the "look and feel" of the skin.
The community was a lot more accepting of change then, but it was still a pretty massive battle. The PM of that project nearly worked herself to death.
Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent and influence the sentiments of the broader, less vocal, contributor base in important ways. And we won't be able to get people to accept our conclusions, however rigorously we demonstrate them or carefully we couch them in scientific trappings, if they think we're fundamentally incapable of building something worthwhile, or deploying it responsibly.
Yeah. Obviously it's necessary to not ship broken or very buggy code, but that's a different story. It's also a lot easier to know if your code is broken when you A/B test it before it's shipped. It should be noticeable from the metrics, or the metrics aren't good enough.
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We aren't a start-up (most of which fail) that can focus all our resources on one radical new idea. We aren't a tech giant like Google or Facebook, that can churn out a bunch of different beta products, throw them at a wall and see what sticks.
What's your proposal that's somehow better than what most of the rest of the sites on the internet are doing? Maybe you can't do exactly what they're doing due to lack of resources, but you can at least do the basics.
And we're not a commercial community-driven site like Quora or Yelp, which
can constantly monkey with their interface and feature set in order to maximize ad revenue or try out any old half-baked strategy to monetize its content. There's a fundamental difference between Wikimedia and Quora. In Quora's case, a for-profit company built a platform and invited people to use it. In Wikimedia's case, a bunch of volunteers created a platform, filled it with content, and then a non-profit company was created to support that platform, content, and community.
I don't understand how you can say this. This is exactly how fundraising at WMF works and it's been shown to be incredibly effective. WMF is most likely the most effective organization in the world at large-scale small donations. It's this way because it constantly tests changes to see what's more effective. It does this using almost exactly the methodology I'm describing. Why can't we bring a little bit of this awesomeness into the rest of the engineering organization?
- Ryan
Ryan Lane, 28/07/2015 07:52:
I don't understand how you can say this. This is exactly how fundraising at WMF works and it's been shown to be incredibly effective.
There is no proof that the WMF fundraising is effective. https://meta.wikimedia.org/wiki/Talk:Fundraising_2012/Report When you don't measure costs and externalities, of course any profit looks good.
Nemo
Responses inline!
On Mon, Jul 27, 2015 at 10:52 PM, Ryan Lane rlane32@gmail.com wrote:
The idea of A/B tests is to try to isolate things. You're not going to get perfect data all of the time and you'll likely need to retry experiments with more focus until you can be assured your tests are accurate, but this is definitely doable in live deployments.
I used editing as an example, but you're right in that it's difficult to get reliable metrics for a lot of editing actions (though it should be a bit easier in VE). That's of course why I gave a search example previously, which is much easier to isolate. In fact, most reader based tests should be pretty reliable, since the reader feature set is much smaller and the number of readers is massive. This topic is about skin changes, btw ;).
We started out talking about skin changes, but we've been in meta-discussion-land for a few days now. That's not surprising (we touched on probably the biggest perennial conflicts between WMF and the editing community). What's surprising to me is that, so far, this time the discussion has been both frank and relatively proactive. So I want to ride this wave as far as it takes us.
A/B tests are great, and we should use them more often for reader-facing UI. But a new default skin isn't just reader-facing; it's everyone-facing. Making things easier, more engaging, or more delightful for non-editors isn't going to do us much good if it makes things harder, less engaging, or less delightful for editors.
There are definitely products that are primarily reader-facing. But most of our products (and certainly the default skin) have a substantial impact on the editing experience as well. Earlier, you said the editor community "should be worked around when changes are meant to affect readers and those changes don't directly negatively affect editor metrics." I counter that: a) there is no single editor metric, or set of metrics, that we can use to fully determine the impact of this or that design change on the editing experience of Wikipedia. b) even if there were such metrics, it would be highly counterproductive for WMF to say to editors "we don't care about your experiences, just your aggregate performance". Also, dickish.
Because I see two issues at play here, and I think they are inextricably linked: We need to be more evidence-driven, and we need more, not less, community involvement in our design process.
If we don't become more evidence-driven (which requires updates to both our processes and our infrastructure), we will always struggle to build products that meet the needs of our users (readers, editors, third-party MediaWiki peeps).
But *whether or not we become more evidence-driven, *we will always struggle to get the products we build implemented, if our most powerful user group doesn't currently trust us to act in their best interest. Or even our own.
Knowing that something occurs isn't the same as knowing why. Take the
English Wikipedia editor decline. There has been a lot of good research on this subject, and we have confidently identified a set of factors that are likely contributors. Some of these can be directly measured: the decreased retention rate of newcomers; the effect of early, negative experiences on newcomer retention; a measurable increase over time in phenomena (like reverts, warnings, new article deletions) that likely cause those negative experiences. But none of us who have studied the editor decline believe that these are the only factors. And many community members who have read our research don't even accept our premises, let alone our findings.
The best way to solve a complex problem is to first understand the problem (which you've done through research), then to break it down into small actionable parts (you've already mentioned them), then to tackle each problem by proposing solutions, implementing them in a testable way and then seeing if the results are positive or not.
The results of changing the warnings gave pretty strong indications that the new messages moved the retention numbers in a positive way, right? Why shouldn't we trust the data there? If the data wasn't good enough, is there any way to make the methodology more accurate?
The data were good :) Actually, Snuggle and the Teahouse both came out of this line of research. These two products share several features that Winter (and most of our major products) don't:
1. They are permanently opt-in: no person has to use them. No Wikimedia project has to adopt them.
2. They add functionality, rather than replacing it.
3. They are incrementalist approaches to addressing a major issue identified through careful front-end research.
4. They were designed in collaboration (not just consultation) with editors.
5. They are powered (to this day) by dedicated volunteers who are invested in their success.
6. They were cheap to build, and are cheap to maintain.
Some of these features probably limit their overall impact. But they virtually assure their long-term sustainability, which means they can keep on addressing the newcomer retention problem, even after the grants/dissertations that supported their development are gone. FWIW, many other new editor engagement products have had to be scuttled after the product team that developed them (and championed them) was disbanded, or the Foundation's priorities changed.
I'm not suggesting that this design approach offers a template for how to make people <3 VE or whatever, but there are lessons here about how to do evidence-based design well, and about the advantages of getting core contributors to feel invested in what you build.
Until the very, very recent past there wasn't even the ability to measure the simplest of things. There are no real-time or near-real-time measurements. There are no health dashboards for vital community metrics. There's no experimentation framework, and without one there are no run-time controls for product managers to run A/B tests of feature-flagged features. There are very few analytics events in MediaWiki.
I don't want to sound negative, because I understand why all of this is the case, since analytics has been poorly resourced, ignored and managed into the ground until pretty recently, but Wikimedia isn't at the level of most early startups when it comes to analytics.
Wikimedia does have (and has historically had) excellent researchers that have been doing amazing work with insanely small amounts of data and infrastructure.
I didn't think you were dissing the researchers; sorry if it came off that way. My point was that our research & data team know that a) A/B tests alone aren't usually sufficient to justify major design changes and b) good science won't convince anyone if they already mistrust or dislike you. Leila and Aaron, for example, have had to invest a lot of time explaining, contextualizing, defending their research, trying to (re)build trust so that people will give their research a fair hearing.
I was on that project (Usability Initiative) as an ops engineer. I was hired for it, in fact. I remember that project well and I wouldn't call it a major success. It was successful in that it changed the default skin to something that was slightly more modern than Monobook, but it was the only truly successful part of the entire project. I think Vector is the only surviving code from it. The vast majority of Vector features didn't make it permanently into the Vector skin. Mostly what stayed around was the "look and feel" of the skin.
The community was a lot more accepting of change then, but it was still a pretty massive battle. The PM of that project nearly worked herself to death.
Right! It's way harder now. All of us whose jobs require us to interact with community members around product design have to fight that battle. There's a lot of mistrust: we're perceived by many as being incompetent, and/or acting in bad faith vis a vis the core contributors to Wikimedia projects. It really sucks sometimes.
But we, as an organization (if not as individuals), bear a good deal of responsibility for the state we're in. A lot of it stems from the way we have designed and deployed products in the past. Fixing that requires more than more research and better testing infrastructure. And perpetuating the meme that the community is afraid of change and that's why we can't have nice things... certainly doesn't help.
Whether or not we (WMF) think it is fair that we have to listen to "vocal
minorities" (Ryan's words), these voices often represent and influence the sentiments of the broader, less vocal, contributor base in important ways. And we won't be able to get people to accept our conclusions, however rigorously we demonstrate them or carefully we couch them in scientific trappings, if they think we're fundamentally incapable of building something worthwhile, or deploying it responsibly.
Yeah. Obviously it's necessary to not ship broken or very buggy code, but that's a different story. It's also a lot easier to know if your code is broken when you A/B test it before it's shipped. It should be noticeable from the metrics, or the metrics aren't good enough.
We can't run our product development like "every non-enterprise software
company worth a damn" (Steven's words), and that shouldn't be our goal. We aren't a start-up (most of which fail) that can focus all our resources on one radical new idea. We aren't a tech giant like Google or Facebook, that can churn out a bunch of different beta products, throw them at a wall and see what sticks.
What's your proposal that's somehow better than what most of the rest of the sites on the internet are doing? Maybe you can't do exactly what they're doing due to lack of resources, but you can at least do the basics.
My proposal is that we should follow a more participatory design process. Better tools and research are necessary, but insufficient. And the "consulting" model that Quora uses isn't appropriate to Wikimedia.
It sounds to me like you and Steven think that we can build faster and better if we distance ourselves more from the community--abstracting their experience as metrics, and limiting their participation to consultation. But I don't think that what's slowing us down is our efforts to work with communities around what we deploy, where we deploy it, and when. I think what slows us down is that we constantly say that we're open and collaborative, but often fail to be open and collaborative when it matters most. This engenders mistrust, which makes it harder for us to experiment, delays deployments, results in buggier, less usable, and less useful products, and virtually guarantees that many of our core users are going to defer or actively resist adopting what we build.
In order to dig ourselves out, let's pursue a two-pronged strategy of:
a) evidence-driven product development: using quantitative and qualitative research to decide what to build and how to build it
b) a transparent, iterative, and participatory process: telling people what we intend to build, when and under what circumstances we intend to deploy it, and consistently addressing the feedback we get from people at every stage, in good faith
We won't ever succeed with a) if we don't show that we can implement b) consistently.
And we're not a commercial community-driven site like Quora or Yelp,
which can constantly monkey with their interface and feature set in order to maximize ad revenue or try out any old half-baked strategy to monetize its content. There's a fundamental difference between Wikimedia and Quora. In Quora's case, a for-profit company built a platform and invited people to use it. In Wikimedia's case, a bunch of volunteers created a platform, filled it with content, and then a non-profit company was created to support that platform, content, and community.
I don't understand how you can say this. This is exactly how fundraising at WMF works and it's been shown to be incredibly effective. WMF is most likely the most effective organization in the world at large-scale small donations. It's this way because it constantly tests changes to see what's more effective. It does this using almost exactly the methodology I'm describing. Why can't we bring a little bit of this awesomeness into the rest of the engineering organization?
Fundraising is great! I love fundraising. And not just because they pay my salary--they have great research and an enviable testing infrastructure. But tracking the performance of banners that drive monetary contributions is a fundamentally different task from tracking the performance (<-- not sure that word even applies) of a whole new default UI that fundamentally changes the way both casual readers and dedicated editors interact with Wikipedia. Fundraising products, and the process by which we design and evaluate them, aren't representative of our big software products like Mobile site/apps, Content Translation, VE, Flow, etc.
That's why I'm pushing on your "we can make it work through A/B testing" thesis around deploying something as radical and complex as Winter, as opposed to iterating on Vector. A whole new skin affects everyone's experience of the site in complex and multifaceted ways; there's no single (or even primary) metric of performance. And we can't expect to short-cut the design process or short-circuit community involvement. The only way out is through.
Jonathan
Hey all!
I'm not very familiar with mediawiki skins, so apologies if this is ridiculous, not possible, mentioned already, etc, but what Jonathan said here really stood out to me as maybe at the heart of the issue:
"But a new default skin isn't just reader-facing; it's everyone-facing. Making things easier, more engaging, or more delightful for non-editors isn't going to do us much good if it makes things harder, less engaging, or less delightful for editors."
My question is, could mediawiki use one skin when editing (Vector, or rename it "Vector-Editing") and a copy of Vector ("Vector-Reading" or something) when not editing (i.e. when reading)? They'd be initially identical, but going forward they could begin to slowly diverge as required by their respective editing and reading flows. There'd just have to be a mechanism to switch at the appropriate time... in theory.
-Monte
On 2015-07-28, at 10:32 PM, Monte Hurd wrote:
Hey all!
I'm not very familiar with mediawiki skins, so apologies if this is ridiculous, not possible, mentioned already, etc, but what Jonathan said here really stood out to me as maybe at the heart of the issue:
"But a new default skin isn't just reader-facing; it's everyone-facing. Making things easier, more engaging, or more delightful for non-editors isn't going to do us much good if it makes things harder, less engaging, or less delightful for editors."
My question is, could mediawiki use one skin when editing (Vector, or rename it "Vector-Editing") and a copy of Vector ("Vector-Reading" or something) when not editing (i.e. when reading)? They'd be initially identical, but going forward they could begin to slowly diverge as required by their respective editing and reading flows. There'd just have to be a mechanism to switch at the appropriate time... in theory.
-Monte
No, that's a bad idea. Editing is the core feature of Wikipedia. The interface when not editing should *scream* editability. Disentangle reading from editing, and we risk exacerbating the existing problem of recruiting newbies: it would make it harder for them to acclimatize if there's a big interface shift on top of everything else they have to learn (citing, neutrality, you name it). For example, VisualEditor is an effort to reduce the existing shift, by (largely) removing wikitext from the list of things necessary to learn.
Nihiltres
Thanks for cc'ing me Jonathan, I wouldn't have seen this otherwise.
TL;DR - Objectively measurable criteria. Clear process. No surprises.
The context of my giving the example of Vector as a good example *of process* was after the presentation about the future of 'Flow' at Wikimania.[1] I highly recommend people read the slides of this session if you've not already - great stuff![2] In particular, I was talking about how the Usability Initiative team were the first to use an opt-in Beta process at the WMF. It was the use of iterative development, progressive rollout, and closed-loop feedback that made their work a successful *process*. I wasn't talking about the Vector skin per se.
Significantly, they had a publicly declared and measurable criterion for determining what counted as "community acceptance/support". This criterion was an 80% retention rate of opt-in users. They did not lock down the features of one version of their beta and move to the next version until they could show that 80% of people who tried it preferred it. Moreover, they stuck to this objective criterion for measuring consensus support all the way to the final rollout.[3]
This system was a great way to identify people who had the willingness to change but had concerns, as opposed to getting bogged down by people who would never willingly accept a change or people who would accept all changes regardless. It also meant that those people became 'community advocates' for the new system because they had positive experiences of their feedback being taken into account.
And I DO remember the process, and the significance that was attached to it by the team (which included Trevor Parscal), because in 2009 I interviewed the whole team in person for the Wikipedia Weekly podcast.[4] Far from "looking at the past through rose coloured glasses", I recall the specific pain-points on the day that the Vector skin became the default. These were the inter-language links list being autocollapsed, and the Wikipedia logo being updated.[5] The fact that it was THESE things that caused all the controversy on the day that Vector went from Beta to opt-out is instructive. These were the two things that were NOT part of the Beta testing period - no process, and a surprise. The people who had valid feedback had not been given an opportunity to provide it, and that feedback came instead in the form of swift criticism on mailing lists.[6]
My support for the concept of a clearly defined, objectively measured rollout *process* for new features is not new... When Fabrice announced "beta features" in November 2013 I was the first to respond - referring to the same examples, and telling the same story about the Usability Initiative's processes.[7]
Then, as now, the "beta features" tab lists the number of users who have opted in to a tool, but there is no comparative/objective explanation of what that actually means! For example, it tells me that 33,418 people have opted in to "Hovercards", but is that good? How long did it take to reach that level? How many people have switched it off? What proportion of the active editorship is that? And most importantly - what relationship does this number have to whether Hovercards will 'graduate' from or 'fail' the opt-in Beta process?
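For illustration, turning such raw counts into the comparative numbers being asked for might look like the sketch below; only the 33,418 opt-in figure comes from above, while the opt-out count, editor-base size, and threshold are invented.

# Invented figures except ever_opted_in, which is the Hovercards count quoted above.
ever_opted_in = 33_418      # users who have tried the Beta feature
since_opted_out = 4_100     # hypothetical: users who later switched it off
active_editors = 80_000     # hypothetical: size of the relevant user base

retention = 1 - since_opted_out / ever_opted_in
adoption = ever_opted_in / active_editors
threshold = 0.80            # e.g. the Usability Initiative's public criterion

print(f"retention of opt-in users: {retention:.1%}")
print(f"share of active editors who have tried it: {adoption:.1%}")
print("graduates" if retention >= threshold else "needs another iteration")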
Which brings me to the point I made to Jonathan, and also Pau, at Wikimania about the future of Flow. If there are two things we Wikimedians hate most, I've come to believe they are:
1) the absence of a clear process, or a failure to follow that process;
2) being surprised.
We can, generally, abide outcomes/decisions that we don't like (e.g. article-deletion debates) as long as the process by which that decision was arrived at was clearly explained, and objectively followed. I believe this is why there was so much anger and frustration about the 'autoconfirm article creation trial' on en.wp [8] and the 'superprotect' controversy - because they represented a failure to follow a process, and a surprise (respectively).
So, even more than the Vector skin or even the Visual Editor, Flow ABSOLUTELY MUST have a clear, objectively measurable *process* for measuring community consensus, because it will be replacing community-designed and community-operated workflows (e.g. [9]). This means that once it is enabled on a particular workflow:
1) an individual user can't opt out back to the old system;
2) it will most affect, and be most used by, admins and other very active users.
Therefore, I believe that this development must be an iterative process of working on one workflow on one wiki at a time, with objective measures of consensus-support that are at least partially *determined by the affected community itself*. This will be the only way that Flow can gain community consensus for replacing the existing template/sub-page/gadget/transclusion/category-based workflows.[10]
Because Flow will be updating admin-centric workflows, if it is rolled out in a way that is anything less than this, it will strike the community as hubris - "it is necessary to destroy the town in order to save it".[11]
-Liam / Wittylama
P.S. While you're at it, please make ALL new features go through the "Beta features" system with some consistent/discoverable process. As it is, some things live there permanently in limbo, some things DO have a process associated with them, and some things bypass the beta system altogether. As bawolff said, this means people feel they don't have any influence over the rollout process and therefore choose not to be involved at all.[12]
[1] https://wikimania2015.wikimedia.org/wiki/Submissions/User(s)_Talk(ing):_The_...
[2] https://wikimania2015.wikimedia.org/wiki/File:User(s)_Talk(ing)_-_Wikimania_...
[3] https://blog.wikimedia.org/2010/05/13/a-new-look-for-wikipedia/
[4] Sorry - I can't find the file anymore though. This was the page: https://en.wikipedia.org/wiki/Wikipedia:WikipediaWeekly/Episode76
[5] https://blog.wikimedia.org/2010/05/13/wikipedia-in-3d/
[6] https://commons.wikimedia.org/wiki/Talk:Wikipedia/2.0#Logo_revisions_need_in...
[7] https://lists.wikimedia.org/pipermail/wikimedia-l/2013-November/128896.html
[8] https://en.wikipedia.org/wiki/Wikipedia:Autoconfirmed_article_creation_trial
[9] https://wikimania2015.wikimedia.org/w/index.php?title=File:User(s)_Talk(ing)...
[10] https://wikimania2015.wikimedia.org/w/index.php?title=File:User(s)_Talk(ing)...
[11] https://en.wikipedia.org/wiki/B%E1%BA%BFn_Tre#Vietnam_War
[12] https://lists.wikimedia.org/pipermail/design/2015-July/002355.html
wittylama.com Peace, love & metadata
On 27 July 2015 at 22:51, Jonathan Morgan jmorgan@wikimedia.org wrote:
On Mon, Jul 27, 2015 at 11:02 AM, Ryan Lane rlane32@gmail.com wrote:
For instance, if a change negatively affects an editor's workflow, it should be reflected in data like "avg/p95/p99 time for x action to occur", where x is some normal editor workflow.
That is indeed one way you can provide evidence of correlation; but in live deployments (which are, at best, quasi-experiments https://en.wikipedia.org/wiki/Quasi-experiment), you seldom get results that are as unequivocal as the example you're presenting here. And quantifying the influence of a single causal factor (such as the impact of a particular UI change on time-on-task for this or that editing workflow) is even harder.
Knowing that something occurs isn't the same as knowing why. Take the English Wikipedia editor decline. There has been a lot of good research on this subject, and we have confidently identified a set of factors that are likely contributors. Some of these can be directly measured: the decreased retention rate of newcomers; the effect of early, negative experiences on newcomer retention; a measurable increase over time in phenomena (like reverts, warnings, new article deletions) that likely cause those negative experiences. But none of us who have studied the editor decline believe that these are the only factors. And many community members who have read our research don't even accept our premises, let alone our findings.
I'm not at all afraid of sounding pedantic here (or of writing a long-ass wall of text), because I think that many WMF and former-WMF participants in this discussion are glossing over important stuff: Yes, we need a more evidence-based product design process. But we also need a more collaborative, transparent, and iterative deployment process. Having solid research and data on the front-end of your product lifecycle is important, but it's not some kind of magic bullet and is no substitute for community involvement in product design (through the lifecycle).
We have an excellent Research & Data https://wikimediafoundation.org/wiki/Staff_and_contractors#Research_and_Data team. The best one we've ever had at WMF. Pound-for-pound, they're as good as or better than the Data Science teams at Google or Facebook. None of them would ever claim, as you seem to here, that all you need to build good products are well-formed hypotheses and access to buckets of log data.
I had a great conversation with Liam Wyatt at Wikimania (cc'ing him, in case he doesn't follow this list). We talked about strategies for deploying new products on Wikimedia projects: what works, what doesn't. He held up the design/deployment process for Vector as an example of *good* process, one that we should (re)adopt.
Vector was created based on extensive user research and community consultation[1]. Then WMF made a beta, and invited people across projects to opt-in and try it out on prototype wikis[2]. The product team set public criteria for when it would release the product as default across production projects: retention of 80% of the Beta users who had opted in, after a certain amount of time. When a beta tester opted out, they were sent a survey to find out why[3]. The product team attempted to triage the issues reported in these surveys, address them in the next iteration, or (if they couldn't/wouldn't fix them), at least publicly acknowledge the feedback. Then they created a phased deployment schedule, and stuck to it[4].
This was, according to Liam (who's been around the movement a lot longer than most of us at WMF), a successful strategy. It built trust, and engaged volunteers as both evangelists and co-designers. I am personally very eager to hear from other community members who were around at the time what they thought of the process, and/or whether there are other examples of good WMF product deployments that we could crib from as we re-assess our current process. From what I've seen, we still follow many good practices in our product deployments, but we follow them haphazardly and inconsistently.
Whether or not we (WMF) think it is fair that we have to listen to "vocal minorities" (Ryan's words), these voices often represent *and influence* the sentiments of the broader, less vocal, contributor base in important ways. And we won't be able to get people to accept our conclusions, however rigorously we demonstrate them or carefully we couch them in scientific trappings, if they think we're fundamentally incapable of building something worthwhile, or deploying it responsibly.
We can't run our product development like "every non-enterprise software company worth a damn" (Steven's words), and that shouldn't be our goal. We aren't a start-up (most of which fail) that can focus all our resources on one radical new idea. We aren't a tech giant like Google or Facebook, that can churn out a bunch of different beta products, throw them at a wall and see what sticks.
And we're not a commercial community-driven site like Quora or Yelp, which can constantly monkey with their interface and feature set in order to maximize ad revenue, or try out any old half-baked strategy to monetize their content. There's a fundamental difference between Wikimedia and Quora. In Quora's case, a for-profit company built a platform and invited people to use it. In Wikimedia's case, a bunch of volunteers created a platform, filled it with content, and then a non-profit company was created to support that platform, content, and community.
Our biggest opportunity to innovate, as a company, is in our design process. We have a dedicated, multi-talented, active community of contributors. Those of us who are getting paid should be working on strategies for leveraging that community to make better products, rather than trying to come up with new ways to perform end runs around them.
Jonathan
1. https://usability.wikimedia.org/wiki/What%27s_new,_questions_and_answers#How...
2. https://usability.wikimedia.org/wiki/Prototype
3. https://usability.wikimedia.org/wiki/Beta_Feedback_Survey
4. https://usability.wikimedia.org/wiki/Releases/Default_Switch
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)
On 7/27/15, Ryan Lane rlane32@gmail.com wrote:
Brian Wolff <bawolff@...> writes:
Data does not prove things "good". Data proves (or, more likely, provides some support for but does not prove) some objective hypothesis. Proving normative claims with objective data is pretty much impossible.
That may sound pedantic, but I think it's an important distinction.
Evidence should be presented in the form of "This change improved findability of the edit button by 40% among anons in our experiment [link to details]. Therefore I/we believe this is a good change, because I/we think that findability of the edit button is important". Separating what the data proves from what are personal opinions about the data is important to make the "science" sound legitimate and not manipulated.
It sounds pedantic because it is :). Good/bad in my proposal was targeting the hypothesis, not the moral concept of good/bad. Good = the hypothesis is shown to be effective; bad = the hypothesis is shown to be ineffective.
At the risk of being a bit nitpicky here, if that's the case, and when you say "...there's always some vocal minority that will hate change, even when it's presented with data proving it to be good", what you really mean is "... there's always some vocal minority that will hate change, even when it's presented with data proving that there exists some hypothesis about the change that can be shown to be in effect".
Similarly, when you say "The data is the voice of the community. It's what proves if an idea is good or bad.", what you really mean is "The data is the voice of the community. It's what proves if an idea has a hypothesis which has been shown to be in effect or not be in effect."
While I tend to agree with the versions of these statements where "good" means a hypothesis is in effect, I don't think they make for a very compelling argument.
What you've ignored in my proposal is the part where the community input is part of the formation of the hypothesis. I also mentioned that vocal minorities should be ignored with the exception of questioning the methodology of the data analysis.
Fair enough. While I don't think data "experiments" should be the be-all and end-all, they're certainly a useful tool. I agree that ensuring community input in hypothesis formation and methodology critique is vital to make sure that we make the best use of this tool.
Anecdotal data should be used as a means of following up on experiments, but should not be considered part of the data set, as it's an unreliable source. If a large amount of anecdotal data is coming in about something, that's a sign it should become part of the standard data set. There are obviously exceptions to this, but assuming there's enough data, it should be possible to gauge the effectiveness of changes without relying on anecdotal data.
For instance, if a change negatively affects an editor's workflow, it should be reflected in data like "avg/p95/p99 time for x action to occur", where x is some normal editor workflow.
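To make that kind of metric concrete, here is a minimal sketch (Python, with made-up numbers and a hypothetical per-session duration log; this is not actual WMF instrumentation) of computing avg/p95/p99 completion time for a single workflow:

import statistics

def summarize_durations(durations_s):
    """Average, p95 and p99 of one workflow's completion times, in seconds."""
    # statistics.quantiles with n=100 returns 99 cut points:
    # index 94 is the 95th percentile, index 98 the 99th.
    cuts = statistics.quantiles(durations_s, n=100)
    return {
        "avg": statistics.mean(durations_s),
        "p95": cuts[94],
        "p99": cuts[98],
    }

# Hypothetical before/after samples for one editor workflow:
before = [4.2, 5.1, 3.8, 6.0, 4.9, 12.4, 5.5, 4.4, 5.0, 7.1]
after  = [4.0, 4.7, 3.9, 5.2, 4.6,  9.8, 5.0, 4.1, 4.8, 6.3]
print(summarize_durations(before))
print(summarize_durations(after))

A regression in the p95/p99 numbers after a UI change would be the kind of signal being described here.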
Say we wanted to improve discoverability of the edit button for new users. So we put it in <blink> tags. This pisses off everyone for the obvious reason. How would we measure user aggravation?
--bawolff
On Wed, Jul 22, 2015 at 1:00 AM, Jon Robson jrobson@wikimedia.org wrote:
I'd love to have a go at making a new skin based on Winter's ideas in my spare time with a fixed header, but given that I have no confidence it will ever get on the cluster I have no motivation to do this. Where is Apex deployed for example [3]? Why can't I try this out on Wikipedia and see if I prefer the experience?
mediawiki.org is in the cluster and, as I learned in the past weeks, experimentation with optional skins should be fine there. It's a first step.
S and I are in the process of requesting the availability of the Blueprint skin as optional and experimental in mediawiki.org -- https://phabricator.wikimedia.org/T93613. Having Apex (and/or Bluesky, Foreground, etc) joining the party would be very useful.
Whenever a new "unsolicited redesign" (not the most welcoming and encouraging term) shows up, we quickly point out the problems such a design would suffer in real use. However, it is not at all easy to iron out the problems a MediaWiki skin might have without enabling it on a real wiki with real users and a real collection of various extensions.
More skins available on mediawiki.org would bring movement and progress to Vector and friends. If a skin wins adoption and excitement on mediawiki.org, it will only be a matter of time before other projects request it as equally optional and experimental. This might be a motivation for designers and frontend developers currently frustrated with discussions like this one to work on Vector or its alternatives.
I strongly agree with what Isarra wrote. She's wise.
Ryan Lane wrote:
Isarra Yos writes:
Aye, we do need to move on. But there are also lessons in what has lingered all this time - we need to look at it and understand why in order to properly address it and serve the underlying needs. This is why we iterate on what's there, and don't only make drastically new things.
Do we actually know the lessons? Are they listed anywhere? Are they valid anymore? Do modern web practices cover them?
We need to do better about this.
It's great to iterate on things when they are relatively modern. It's folly to do so when you're almost a decade behind the industry standard. The argument itself is odd because Vector has not been iterating steadily towards modern practices. It's been stagnant for years.
And this.
The reader community is massive and has no voice, except their complaints across the internet. The WMF can and should be the voice for the reader community.
This is bullshit. "Decisions are made by those who show up." If you want to be part of the discussion, all you have to do is participate in good faith. That's how I'm involved, that's how you're involved, that's how Isarra and Nemo and Risker and nearly everyone else is involved. Pro-tip: that's not only how Wikipedia works, that's how life works too.
Isn't a motto of the movement "Be bold"? What happened to that?
Have you read the English Wikipedia page lately? It's a nightmare. :-)
https://en.wikipedia.org/wiki/Wikipedia:Be_bold
I keep meaning to cut it back at some point. The various namespace restrictions are such silliness. In any case, for a long time it's been "be bold, but not reckless." A big top-down redesign (not that you're directly proposing such, I'm speaking generally) would be reckless.
The status quo is that change never happens because people are too scared to change. There's no boldness here. There's hardly even basic assertiveness.
Yeah, the community has put in place some protections to ensure that it doesn't get trampled by a bunch of product managers sitting in San Francisco. I won't apologize for that, it's a feature, not a flaw.
"The community is scared of change" seems to be a common excuse from those too scared to work with communities outside of their own.
Or an argument of those who think it's not in the readers' best interest to have editors with little to no knowledge of software engineering or UX design dictating the engineering and design of reader features.
Encyclopedias are only supposed to be written by experts, too, right? :-) We're getting into trope territory here.
There's not really a conversation. The UX lead is saying "Winter is dead, let's continue with the iterations on Vector", though there's no real iteration going on. The editor community is opposed to any change that doesn't completely agree with them, where the "them" is around 5,000 people who also can't agree with each other and aren't qualified to be making the decisions to begin with.
What would you like to see changed in Vector? Concrete suggestions. For me, I'd like to see it become a responsive skin (in the process, killing MobileFrontend) and I'd like to see some of the gradients removed (or at least re-evaluated). Those are concrete, actionable items that will likely get resolved this year. Your turn!
MZMcBride
The community you're talking about is the editor community, which is a tiny fraction of the overall community, but attempts to speak with authority over the entirety of it. The vocal portion of the editor community that speaks with this authority is an even smaller fraction of the editor community. We're talking about .001% of the entire community that holds the entire movement hostage (5,167 people voted in the last election, and there are 430 million monthly active readers).
The reader community is massive and has no voice, except their complaints across the internet. The WMF can and should be the voice for the reader community.
In my experience, the WMF lacks the ability (or perhaps maturity?) to be that voice. Every time someone invokes the readers, usually they do it to re-assert their personal opinions on the matter, because they're losing an argument. After all, it's not like the readers are going to rise up and object that their voice is being appropriated. If it were possible for computer programmers to magically know what their users wanted, without gathering any evidence, computer programming would be an entirely different field. As far as I know, misunderstanding user requirements is one of the top reasons software projects fail. The WMF has certainly severely misjudged the requirements of the editor community at times; why would it be any better at judging the requirements of the reader community?
I've also volunteered my time for the past 10 years, but as an engineer. I care about Wikimedia more as a reader than as an editor, and my experience as a reader is not great; the editor community is the primary reason for this. The WMF's hesitation to make changes is heavily based on the pitchforks and torches lit by this community.
Blame is easy to throw around. You can just as easily say that the problem is due to the WMF viewing the community as a problem to be worked around, creating an antagonistic relationship that degrades everyone's interests.
-- -bawolff
bawolff <bawolff+wn@...> writes:
The reader community is massive and has no voice, except their complaints across the internet. The WMF can and should be the voice for the reader community.
In my experience, the WMF lacks the ability (or perhaps maturity?) to be that voice. Every time someone invokes the readers, usually they do it to re-assert their personal opinions on the matter, because they're losing an argument. After all, it's not like the readers are going to rise up and object that their voice is being appropriated. If it were possible for computer programmers to magically know what their users wanted, without gathering any evidence, computer programming would be an entirely different field. As far as I know, misunderstanding user requirements is one of the top reasons software projects fail. The WMF has certainly severely misjudged the requirements of the editor community at times; why would it be any better at judging the requirements of the reader community?
What I'm saying is that there should be a process for making an interface change directed at readers: state the expected results up front, A/B test the change, and adopt it if the testing meets those stated criteria. The editor community should have little to no say in the process, except to suggest experiments or question obviously incorrect test results.
The basic idea is that through proper testing of features you should be able to know an experience is better for the readers without them having a direct voice.
An example: Make search more discoverable. Add a feature or make an interface change to test this. A/B test it. See if the frequency of search usage increased. See if it adversely affected other metrics. If it helped search usage and didn't negatively affect other metrics, adopt the change.
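As a rough illustration of what "see if the frequency of search usage increased" could look like in practice, here is a minimal sketch (Python, hypothetical session counts, not an actual WMF analysis pipeline) comparing search-usage rates between a control and a treatment bucket with a two-proportion z-test:

from math import sqrt
from statistics import NormalDist

def two_proportion_test(used_a, total_a, used_b, total_b):
    """Difference in search-usage rate (treatment minus control) and a two-sided p-value."""
    p_a, p_b = used_a / total_a, used_b / total_b
    pooled = (used_a + used_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical: 12,000 sessions per bucket; 1,440 control and 1,620 treatment sessions used search.
diff, p = two_proportion_test(1440, 12000, 1620, 12000)
print("search-usage rate changed by %+.2f percentage points, p = %.4f" % (diff * 100, p))

The same comparison would then be repeated on the other metrics to check that the change didn't hurt them.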
The issue is that there will be a vocal minority of people who absolutely hate this change, no matter what it is. These people should be ignored.
Blame is easy to throw around. You can just as easily say that the problem is due to the WMF viewing the community as a problem to be worked around, creating an antagonistic relationship that degrades everyone's interests.
I think there's a lot of blame to be thrown around, but the editor community is who's being worked around and they should be worked around when changes are meant to affect readers and those changes don't directly negatively affect editor metrics.
Of course, all of this should be backed-up by data, and it's surely a failing of the WMF that their development process isn't data driven.
- Ryan
I would like to point out that editors perhaps have some kind of 'nostalgia' for their skin, but I also see that editors are not helped when they want to configure or improve the design of 'their' wiki: CSS and JS pages have to be maintained by a few admins without expertise in web design (I'm one of these), with bad results (in other cases, those pages are simply not maintained).
Wikipedia is visually outdated, and the success of many companies offering designs for Wikipedia (and the media coverage those services get) makes this problem obvious. But being outdated is not the whole problem.
Why can big pictures on the articles overflow? Wouldn't it be simple to add a "max-width:100%"? Too many failures remain after too many years...
On Tue, Jul 21, 2015 at 9:22 PM, David Abián davidabian@wikimedia.es wrote:
Why can big pictures on the articles overflow? Wouldn't it be simple to add a "max-width:100%"? Too many failures remain after too many years...
I'm trying out something like this: https://en.wikipedia.org/w/index.php?title=User%3ATheDJ%2Fvector.css&typ...
But as I expected it is breaking Template:Panorama, Template:Wide_image etc.
Once more we come to the point, where we REALLY need CSS stylesheets per template, to make sure we can simply change SOMETHING.
DJ
This is better, since it limits impact to small resolution screens...
https://en.wikipedia.org/w/index.php?title=User%3ATheDJ%2Fvector.css&typ...
DJ
On Wed, Jul 22, 2015 at 1:04 PM, Derk-Jan Hartman < d.j.hartman+wmf_ml@gmail.com> wrote:
On Tue, Jul 21, 2015 at 9:22 PM, David Abián davidabian@wikimedia.es wrote:
Why can big pictures on the articles overflow? Wouldn't it be simple to add a "max-width:100%"? Too many failures remain after too many years...
I'm trying out something like this:
https://en.wikipedia.org/w/index.php?title=User%3ATheDJ%2Fvector.css&typ...
But as I expected it is breaking Template:Panorama, Template:Wide_image etc.
Once more we come to the point, where we REALLY need CSS stylesheets per template, to make sure we can simply change SOMETHING.
DJ