Honestly, I disagree with pretty much everything you just said. Even if we assume the bias has remained the same, we still don't understand how it transforms the underlying data, and without that understanding any conclusions you draw will be totally invalid and tell you nothing about the gender gap.


On 25 August 2014 18:10, Ryan Kaldari <rkaldari@wikimedia.org> wrote:
On Mon, Aug 25, 2014 at 4:54 PM, Steven Walling <swalling@wikimedia.org> wrote:

On Mon, Aug 25, 2014 at 12:21 PM, Ryan Kaldari <rkaldari@wikimedia.org> wrote:
You can get accurate information from bad or incomplete data.

The issue is not merely that the data are incomplete, as in your tides example; it's that they are biased in many ways we can't quantify.
Yes, it's biased, but do we have any reason to think that this bias has changed significantly over time? If not, we can still derive some useful information from the dataset. Personally, I doubt that users change their gender setting very often, so even if the information is significantly incorrect, it's probably a relatively constant level of incorrectness. At the very least, it should give us an idea of which direction the gender gap is traveling in: whether it is increasing, decreasing, or staying relatively constant. I agree we could not draw any definite conclusions from such a graph, but it would at least give us some hints and maybe lead to some more interesting questions. We've had plenty of graphs and datasets in the past that we knew were biased but still wanted to look at anyway. If people don't think it's worth having in a dashboard, could we at least do a one-time query and see if there's anything interesting in the data?
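
To make the "constant level of incorrectness" argument concrete, here is a minimal sketch. It assumes each gender sets the preference at a fixed but unequal rate; the counts, the rates, and the function names are all hypothetical, not real Wikimedia data or any existing API.

```python
def observed_share(true_female, true_male, rate_female, rate_male):
    """Female share among users who actually set a gender preference."""
    seen_female = true_female * rate_female
    seen_male = true_male * rate_male
    return seen_female / (seen_female + seen_male)


def direction(series):
    """Sign of each step in a series: +1 rising, -1 falling, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


# Hypothetical (female, male) editor counts for three periods.
true_counts = [(100, 900), (130, 900), (170, 900)]

# Assumed disclosure rates: unequal between genders, constant over time.
RATE_FEMALE, RATE_MALE = 0.2, 0.5

true_shares = [f / (f + m) for f, m in true_counts]
seen_shares = [
    observed_share(f, m, RATE_FEMALE, RATE_MALE) for f, m in true_counts
]

# The observed level is wrong, but the direction of change matches.
print(direction(true_shares) == direction(seen_shares))
print(seen_shares[0] < true_shares[0])
```

This only holds under the stated assumption. If the disclosure rates themselves drift over time, the observed direction can differ from the true one, which is the objection raised elsewhere in this thread.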


Analytics mailing list

Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation