On Mon, Aug 25, 2014 at 4:54 PM, Steven Walling swalling@wikimedia.org wrote:
On Mon, Aug 25, 2014 at 12:21 PM, Ryan Kaldari rkaldari@wikimedia.org wrote:
You can get accurate information from bad or incomplete data.
The issue is not merely that the data are incomplete, as in your tides example; it's that they are biased in many ways we can't quantify.
Yes, it's biased, but do we have any reason to think that this bias has changed significantly over time? If not, we can still derive some useful information from the dataset. Personally, I doubt that users change their gender setting very often, so even if the information is significantly incorrect, it's probably a relatively constant level of incorrectness. At the very least, it should give us an idea of which direction the gender gap is traveling in: increasing, decreasing, or staying relatively constant. I agree we could not draw any definite conclusions from such a graph, but it would at least give us some hints and maybe lead to some more interesting questions. We've had plenty of graphs and datasets in the past that we knew were biased but that we still wanted to look at anyway. If people don't think it's worth having in a dashboard, could we at least do a one-time query and see if there's anything interesting in the data?
Kaldari