Stepping back...
We all seem to agree that user-set gender preference is a problematic
measure. We don't trust it. We can come up with plausible hypotheses for why
someone would misreport their gender. And we can be almost certain it's not
a representative sample.
Do we have any ideas for what a better measure would be? Seems to me that
we're dealing with self-report data no matter what. But perhaps a more
explicit elicitation would be better? Folks have suggested a one-question
gender microsurvey before. Of course that will come with its own sources of
bias, and I don't quite see how we can control for them.
Given that it would be useful to have some data on gendered editing patterns
(whether we share it publicly or not), what are our options?
- Jonathan
On Thu, Aug 28, 2014 at 10:03 AM, Ryan Kaldari <rkaldari(a)wikimedia.org>
wrote:
And because I know someone is going to point this out... Actually,
restricting the data to only editors who have explicitly set their gender
would not completely control for changes in the rate of setting the
preference since that rate could change differently for men and women. It
would at least help to control for overall changes in the rate, for example,
due to the change in the interface that Steven mentioned.
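
To make this caveat concrete, here is a minimal sketch, assuming a simple model in which each editor sets the preference with a probability that depends only on their gender and nobody misreports. The function name, the editor counts, and the rates are all hypothetical illustrations, not measured values.

```python
def observed_female_share(n_female, n_male, rate_female, rate_male):
    """Share of 'female' among editors who explicitly set a gender preference.

    Assumes (hypothetically) that women set the preference with probability
    rate_female and men with probability rate_male, and that nobody misreports.
    """
    set_female = n_female * rate_female
    set_male = n_male * rate_male
    return set_female / (set_female + set_male)


# Hypothetical population: 150 women and 850 men (true share 0.15).
baseline = observed_female_share(150, 850, 0.10, 0.10)

# An interface change doubles the rate for everyone: the share is unchanged.
uniform_change = observed_female_share(150, 850, 0.20, 0.20)

# Only men's rate doubles: the observed share falls although the
# underlying population is exactly the same.
differential_change = observed_female_share(150, 850, 0.10, 0.20)
```

Under these assumptions the restricted measure is insensitive to an across-the-board change in the rate, but it cannot distinguish a gender-specific change in the rate from a real change in the population.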
Kaldari
On Aug 28, 2014, at 9:50 AM, Ryan Kaldari <rkaldari(a)wikimedia.org> wrote:
We could restrict the query to only look at editors who had explicitly set
their gender preference. That would control for changes in the rate of
setting the preference. The data would then only be biased by users who had
explicitly set the wrong gender, which I imagine would be a very small
percentage.
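
A minimal sketch of that restriction, assuming the data has already been exported as one record per editor. The record layout (a 'gender' key holding 'female', 'male', or None when the preference was never set) and the function name are hypothetical, not the actual schema or query.

```python
def explicit_gender_share(editors):
    """Return the share of 'female' among editors with an explicit setting.

    `editors` is a list of dicts with a hypothetical 'gender' key whose value
    is 'female', 'male', or None when the preference was never set.
    """
    explicit = [e for e in editors if e["gender"] in ("female", "male")]
    if not explicit:
        return None  # no explicit settings, so the share is undefined
    females = sum(1 for e in explicit if e["gender"] == "female")
    return females / len(explicit)


# Made-up sample: unset preferences are ignored by the calculation.
sample = [
    {"user": "A", "gender": "female"},
    {"user": "B", "gender": "male"},
    {"user": "C", "gender": None},
    {"user": "D", "gender": "male"},
    {"user": "E", "gender": None},
    {"user": "F", "gender": "male"},
]
share = explicit_gender_share(sample)
```

Because editors who never set the preference are dropped from both the numerator and the denominator, a change in how many people set it at all does not, on its own, move this share.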
Also, I would like to point out that even our most fundamental metrics are
affected by similar biases and inconsistencies. For example, the rate of new
editors is polluted by long-time IP editors who suddenly decide to create an
account. If there is an increase in IP editors converting to registered
editors, it can mislead us into thinking that we are suddenly attracting a
lot of new editors. This is just one of many examples I'm sure you're
already familiar with.
To answer your question though, I think if we notice something interesting
in the data (especially a downward trend), we would start a discussion about
it (as we would with any interesting data) and hopefully inspire someone to
dig deeper. Right now though we are mostly in the dark. See, for example,
Phoebe's most recent email to the gendergap list lamenting the lack of
research and data.
Kaldari
On Thu, Aug 28, 2014 at 1:43 AM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
wrote:
I think the biggest problem is this:
Let's say that we see the proportion of users who set their gender
preference to female falling. Is that because women are becoming less
likely to set their gender preference or because the ratio is actually
becoming more extreme?
Let's say that we see a trend in the messy data. What do we do about
that? Do we assume that it is a change in the actual ratio? Do we assume
that it is a change in the propensity of females to set their gender
preference and there's nothing for us to do? Or do we then decide that it
is important for us to gather good data so that we can actually know what's
going on?
-Aaron
On Thu, Aug 28, 2014 at 4:50 AM, Ryan Kaldari <rkaldari(a)wikimedia.org>
wrote:
On Tue, Aug 26, 2014 at 9:53 AM, Leila Zia <leila(a)wikimedia.org> wrote:
>
> 1. We look at the self-reported gender data and do some simple
> observations.
> Pros:
> + we will have an updated view of the gender gap problem.
> + we may spread seeds for further internal and/or external research
> about it.
> Cons:
> - If simple observations are not communicated properly, they may
> result in misinformation, which could do more harm than good.
> - The results will be very limited given that we know the data is
> very limited and contains biases.
I would definitely like to avoid spreading misinformation, which is why
I proposed only looking at the percentage change per month rather than raw
numbers or raw percentages. The raw numbers are almost certainly off-base
and would be much more likely to be latched onto by the public and the
media. Percentage change per month is a less 'sexy' statistic, but might
give us better clues about what's actually going on with the gender gap over
time. It would also, for the first time, give us some window into how new
features or issues may be actively affecting the gender gap. But again, it
would only be a canary in a coal mine, not a tool to draw reliable
conclusions from. For that, we need more extensive tools and analysis.
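
As a rough illustration of the statistic proposed here, the sketch below computes the relative month-over-month change of a share. It assumes the monthly shares have already been computed; the function name and the sample values are made up for illustration and are not real data.

```python
def monthly_percentage_change(shares):
    """Relative month-over-month change, in percent, of a series of shares.

    `shares` is a list of (month, share) pairs in chronological order.
    Returns (month, percent_change) pairs starting at the second month.
    """
    changes = []
    for (_, previous), (month, current) in zip(shares, shares[1:]):
        if previous == 0:
            changes.append((month, None))  # change from zero is undefined
        else:
            changes.append((month, 100.0 * (current - previous) / previous))
    return changes


# Made-up monthly shares of explicit 'female' settings, not real data.
series = [("2014-05", 0.160), ("2014-06", 0.152), ("2014-07", 0.152)]
changes = monthly_percentage_change(series)
```

Reporting only these relative changes keeps the unreliable raw levels out of view while still showing whether the measure is moving up or down.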
> 2. We do extensive gender gap analysis internally.
> Proper gender gap analysis, in a way that can result in meaningful
> interventions (think products and features by us or the community) requires
> one person from R&D to work on it almost full time for a long period of time
> (at least six months, more probably a year). In this case, the question
> becomes: How should we prioritize this question? Just to give you some
> context: Which of the following areas should this one person from R&D work
> on?
> * reducing gender gap
> * increasing editor diversity in terms of nationality/language/...
> * increasing the number of active editors independent of gender
> * identifying areas where Wikipedia has the least coverage and finding
> editors who can contribute to those areas
> * ...
I think it's very difficult to judge how to set those priorities without
having more data. We know that the number of active editors is on a downward
trajectory. Is the nationality/language diversity increasing or decreasing?
Is the gender gap increasing or decreasing? In cases where things are
actively getting worse, we should set our priorities to address them sooner,
but without knowing those trajectories it's impossible to say.
Kaldari
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
User:Jmorgan (WMF)
jmorgan(a)wikimedia.org