On Mon, Aug 25, 2014 at 11:05 AM, Ryan Kaldari <rkaldari(a)wikimedia.org> wrote:
> There is nothing stopping us, however, from analysing
> using existing data. For example, we could generate graphs showing the
> relative difference per month in edits by men and women and this data would
> be unaffected by the unreliability of the absolute numbers (since we would
> only be looking at changes in the percentages).
Using bad data here is worse than having no data. As Aaron and I
recommended when we talked in person, we should not invest in using the
gendered language preference data to track overall gender among editors.
It's a case of garbage in, garbage out. Instead, we should be investing in
more reliable ways to track gender among the editor population, if it's a
metric that we care about.