On Mon, Aug 25, 2014 at 11:41 AM, Steven Walling <swalling@wikimedia.org> wrote:

On Mon, Aug 25, 2014 at 11:05 AM, Ryan Kaldari <rkaldari@wikimedia.org> wrote:

There is nothing stopping us, however, from analysing relative trends using existing data. For example, we could generate graphs showing the relative difference per month in edits by men and women and this data would be unaffected by the unreliability of the absolute numbers (since we would only be looking at changes in the percentages).

Using bad data here is worse than having no data. As Aaron and I recommended when we talked in person, we should not invest in using the gendered language preference data to track overall gender among editors. It's a case of garbage in, garbage out. Instead, we should be investing in more reliable ways to track gender among the editor population, if it's a metric that we care about.

You can get accurate information from bad or incomplete data. For example, I can measure changes in tide levels without knowing the volume of the ocean. That's all I'm proposing here: measuring the change per month. Please take a look at the Trello card for a more complete description of the proposal.
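
To make the per-month idea concrete, here is a minimal sketch in Python. It assumes we already have monthly edit counts split by the gendered language preference; the function name, the input layout, and the numbers are all hypothetical and only for illustration, not taken from any real dataset or from the Trello card.

```python
def monthly_share_changes(counts):
    """Return the month-over-month change in the share of edits made by
    accounts with the female language preference set.

    counts: list of (month, edits_female_pref, edits_male_pref) tuples,
    in chronological order. This layout is an assumption for illustration.
    """
    # Share of gendered-preference edits that come from female-preference accounts.
    shares = [(month, f / (f + m)) for month, f, m in counts]

    # Difference between each month's share and the previous month's share.
    changes = []
    for (_, prev_share), (month, cur_share) in zip(shares, shares[1:]):
        changes.append((month, cur_share - prev_share))
    return changes


# Made-up numbers, purely to show the shape of the calculation.
sample = [
    ("2014-05", 100, 900),
    ("2014-06", 120, 880),
    ("2014-07", 110, 890),
]

changes = monthly_share_changes(sample)
for month, delta in changes:
    print(f"{month}: {delta:+.3f}")
```

The output is only the change in share from one month to the next; it makes no claim about the absolute proportion of editors of either gender.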

Ryan Kaldari