David Goodman wrote:
What concerns me is that we might find some FA candidates in there. Can we think of an algorithm? DGG
Well, for one, there's plenty of vandalistic nonsense and copyright violations in the stacks of deleted stuff. The articles we lose, harming the project in ways that can't ever be truly measured, are the ones that are speedied for spam or notability, or because a random administrator deems them too short.
On the notability ones alone, depending on who you talk to, between 10% and 40% could be cases of poor judgement made without any research, and one can easily figure that, truly, all spammy articles could be rewritten to be encyclopedic. That's a lot of articles.
So if you can filter out the vandalism and copyvios, you might actually have a stronger group of articles than is assumed.
-Jeff