On 13/02/2008, Nathan nawrich@gmail.com wrote:
In order to get really convincing evidence of this type, particularly in the absence of egregious disruptive effect (with the users in question as a _source_ rather than an object) on the community or content, you'd need to get a much larger randomly compiled sample set. You would need to establish baseline rates of correlation on a list (preferably long list) of behavior types. You would then need to establish that these two accounts deviate significantly from the expected rates of correlation associated with these behavior types. I would want to see how many other pairs or groups of accounts demonstrate similar rates of correlation, if any, and whether any of these pairs or groups could be proven as linked or not linked.
Oh dear, oh dear. A long list of behavior types? Not good, not good at all: you've just described a data dredge. No, you start by considering the set of behavior types you actually think are likely. In this case that would be zero relation or standard same-timezone/work patterns.
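To put a rough number on why a long list is a problem, here is a minimal sketch. It assumes every behavior type is tested independently at the same level, which real behavior types probably are not, so treat it as an illustration rather than an analysis of this case. The function names are mine, purely for illustration.

```python
def familywise_false_positive_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one spurious 'significant' result when
    n_tests independent true-null comparisons are each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the overall false positive rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Assumed level of 5%; the list lengths are arbitrary examples.
    for n in (1, 5, 20, 50):
        rate = familywise_false_positive_rate(0.05, n)
        print(f"{n:2d} behavior types: {rate:.3f} chance of a false hit, "
              f"Bonferroni per-test level {bonferroni_threshold(0.05, n):.4f}")
```

With 20 unrelated behavior types at 5%, the chance of at least one apparent "correlation" between two completely unconnected accounts is already about 64%. That is why the hypotheses need to be fixed before looking at the data.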
That type of analysis is unlikely to be done in this case - not least because of the difficulty in obtaining such information in an acceptable way. I'm not asking you to do a full-blown dissertation on this, and I don't think it's necessary. My point is that the analysis as it's presented does not support the conclusions drawn because the methods used are weak.
Not good. You can't make an absolute statement like that. The closest you could get would be "you have failed to demonstrate a correlation at whatever level".
If the conclusions here are so difficult to support, and none of the accounts have done anything individually to support a community ban, then the ban should not be considered and the matter should be dropped until more evidence comes to light.
Please specify the following:
The confidence level you require (5%? 1%? 0.1%?)
Your various null hypotheses (you are currently calling these behavior types)
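To show why both answers matter, here is a minimal sketch of a one-sided exact binomial test. Everything numeric in it is hypothetical: the counts (9 matching observations out of 10), the baseline rate of 0.5 under the null, and the function name are made up for illustration and are not taken from the evidence under discussion.

```python
from math import comb


def binomial_tail_p_value(successes: int, trials: int, baseline_rate: float) -> float:
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, baseline_rate),
    i.e. how often the null hypothesis alone would produce a result at least this extreme."""
    return sum(
        comb(trials, k) * baseline_rate**k * (1 - baseline_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical data: 9 matches in 10 observations, null baseline rate 0.5.
    p = binomial_tail_p_value(successes=9, trials=10, baseline_rate=0.5)
    for level in (0.05, 0.01, 0.001):
        verdict = "reject the null" if p < level else "fail to reject the null"
        print(f"p = {p:.4f} at level {level}: {verdict}")
```

The same made-up data rejects the null at 5% but not at 1% or 0.1%, so the conclusion depends entirely on a level chosen in advance. Change the baseline rate (that is, the null hypothesis) and the p-value changes too.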