On Mon, May 19, 2014 at 7:06 PM, Edward Galvez <egalvez@wikimedia.org> wrote:
Hi Pine,
Thank you for bringing this page to our attention and for raising these interesting questions. I agree that the “Program evaluation basics” page is not well designed and should be revisited. We are actually going to be redesigning the entire evaluation portal soon, and this page will likely be revised and included in the new design in some way. We are also continuing to build tools and learning resources (like the learning modules [1]) on evaluation to help explain some of these concepts.
I also agree that we need to think more about how we can define “impact” within the context of Wikimedia. Before we reach a final “impact”, there are layers of success -- outputs and short-, intermediate-, and long-term outcomes -- that help measure progress along the way.
We have been working on this approach to evaluation: we have developed resources for mapping a program’s theory of change in order to identify measurable outcomes, both near and far. In particular, logic models are a useful tool for drawing out the steps needed to reach long-term impact and for identifying more immediate indicators for evaluation. There is a resource page on logic models within the Evaluation portal [2], and I am working on a learning module that will guide anyone through what a logic model is and how to create one. As for the term “impact”, it is jargon and can be used in many different ways, which can be confusing. Since we began last year, we have been building a glossary of shared language around evaluation [3]. That glossary page is more current and inclusive than the original “Program Evaluation basics” page you linked to. Please feel free to discuss these and any other terms and definitions there on the portal.
Coincidentally, we are asking the community to provide feedback on some of the initial evaluation capacity building efforts our team has engaged in thus far. We’d like to hear feedback on the metrics and methods used so we can continue towards a shared understanding of Wikimedia programs and their impacts. We invite you (or anyone!) to read about the Community Dialogue [4] and join in the discussion on the Evaluation portal Parlor [5].
As always, I’m available for any questions!
Best,
Edward
[1] https://meta.wikimedia.org/wiki/Programs:Evaluation_portal/Learning_modules
[2] https://meta.wikimedia.org/wiki/Programs:Evaluation_portal/Library/Logic_mod...
[3] https://meta.wikimedia.org/wiki/Programs:Evaluation_portal/Library/Glossary
[4] https://meta.wikimedia.org/wiki/Programs:Evaluation_portal/Parlor/Dialogue
[5] https://meta.wikimedia.org/wiki/Programs_talk:Evaluation_portal/Parlor/Dialo...
Interesting exchange, thanks guys.
This particular topic needs a great deal of attention - not just because of how crucial it is to measuring success, but also because it has traditionally been both difficult and sensitive. Sue and others have raised questions over the years about how we determine whether the various programs run by the WMF and the chapters are useful, and if so to what degree. The WMF and the Program Evaluation team are just beginning to take steps to answer these questions, and in my opinion much more needs to be invested in this effort. I would like to see compliance with program evaluation standards built into every grant that draws on donor funds. To smooth the way for this increased level of scrutiny, each grant of any type should include an earmark for just this purpose.
Why? Because ultimately we are where we've always been -- with clear knowledge of which "impacts" matter, but with difficulty working out whether anything any movement partner does or has done helps that bottom line. Tens of millions of dollars a year are spent, but most non-core spending would be hard to justify under strict measures of impact. That doesn't mean these programs don't *have* impact, only that because we don't forcefully ask the questions, we don't and can't get the answers.
Every project, chapter, grant, initiative and expenditure should be scrutinized with basically the same few questions:
1) Does it add to the quantity and/or quality of content?
2) Does it add readers, either by increasing interest or improving accessibility?
3) Does it add editors?
Any major expense, grant request or new initiative should be measured by the answers to these questions, and every answer should be quantifiable to some degree. I would suggest that if the answer to all three is no for any non-core expense, heavy scrutiny should be applied to ensure funds aren't being wasted. The FDC does this to some extent now, although it asks the same questions much more vaguely and in terms of strategic alignment.
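To make "quantifiable to some degree" concrete, here is a minimal Python sketch of how a reviewer might encode those three questions against a program report. Every field name and every proxy metric here is hypothetical, invented purely for illustration -- this is not an existing WMF tool or reporting schema:

    from dataclasses import dataclass

    @dataclass
    class ProgramReport:
        content_pages_added: int   # Q1: quantity of content
        articles_improved: int     # Q1: quality of content (a crude proxy)
        pageview_change: int       # Q2: readers, via a pageview delta
        new_editors_retained: int  # Q3: editors still active months later

    def needs_heavy_scrutiny(report: ProgramReport) -> bool:
        """True when the answer to all three questions is 'no'."""
        adds_content = (report.content_pages_added > 0
                        or report.articles_improved > 0)
        adds_readers = report.pageview_change > 0
        adds_editors = report.new_editors_retained > 0
        return not (adds_content or adds_readers or adds_editors)

    # A workshop that improved 12 articles but attracted no new readers
    # or editors still passes question 1, so it is not flagged:
    print(needs_heavy_scrutiny(ProgramReport(0, 12, 0, 0)))  # False

The point is not these particular metrics, but that each question gets an explicit, checkable answer rather than a narrative one.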
The logic models are useful tools for thinking through, and explaining to an audience, the structure and goals of a program, but they are vulnerable to the same fuzziness that exists without them. They are also not well oriented toward measuring performance, which is really the crux of the problem and of Pine's question. Take the logic model you've used as an example, from the WikiWomen's edit-a-thon [1]. That logic model is great at explaining the goals of the program, and that is a major improvement, particularly if it is standardized across all WMF-funded projects. But does it help us answer the question about impact? Using the Boulmetis and Dutwin model of analysis, we can get clear information about program efficiency and program effectiveness. But we get nowhere on impact, despite the use of the logic model.
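To see where the model stops, run some invented numbers for a hypothetical edit-a-thon (all figures made up for illustration, not taken from the linked program):

    # Hypothetical edit-a-thon figures, invented for illustration only.
    cost = 500.00        # program budget in dollars
    participants = 20
    articles_goal = 15   # the output goal stated in the logic model
    articles_created = 12

    efficiency = cost / participants                  # dollars per participant
    effectiveness = articles_created / articles_goal  # share of stated goal met

    print(f"Efficiency: ${efficiency:.2f} per participant")   # $25.00
    print(f"Effectiveness: {effectiveness:.0%} of goal")      # 80%

    # Both numbers fall straight out of the logic model's outputs, but
    # neither tells us whether the articles were read, survived deletion,
    # or drew in new editors -- the impact question is still open.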
The risk here is basically that the movement spends millions of dollars going down the wrong road. If we spend a decade funding editing workshops and Wikipedia Education Programs, and only at the end discover that they are completely ineffective, the opportunity cost (both in funding and in volunteer energy) would be enormous. By the same token, if for a decade we fund chapters whose principal activities are ultimately judged to be ineffective, the scale of volunteer disillusionment could be breathtaking, and the failure could threaten the entire movement. The WMF needs to focus on common program activities, drill down deeply into each one, and actively discourage any program (across all affiliate groups) that cannot demonstrate its impact.
Judging by Meta, I think Edward and the PE team have made a great start. But it's 2014 and the WMF is still at the starting line. Proposing that funding requests include SMART goals is not good enough, and I'd love to see Lila and the board empower Edward to do a lot more, and to insist on deep cooperation from entities receiving funds. At some point in the future we can move this discussion from "does anything anyone does have any impact?" to "knowing that we *can* have an impact, how much impact is enough to justify funding?"
[1]: https://meta.wikimedia.org/wiki/File:Editathon_LM_Stierch-page-001.jpg