Brian wrote:
I just wanted to be really clear about what I mean by a specific counter-example to this just being a case of "reconstructing that rule set." Suppose you run the AbuseFilter rules over the entire history of the wiki to generate a dataset of positive and negative examples of vandalism edits. You should then *throw the rules away* and attempt to discover features that separate the vandalism from the non-vandalism correctly, more or less in the blind.
That's precisely the case where you're attempting to reconstruct the original rule set (or some work-alike). If you had positive and negative examples that were actually "known good" examples of edits that really are vandalism, and really aren't vandalism, then yes you could turn loose an algorithm to generalize over them to discover a discriminator between the "is vandalism" and "isn't vandalism" classes.

But if your labels are from the output of the existing AbuseFilter, then your training classes are really "is flagged by the AbuseFilter" and "is not flagged by the AbuseFilter", and any machine-learning algorithm will try to generalize the examples in a way that discriminates *those* classes. To the extent the AbuseFilter actually does flag vandalism accurately, you'll learn a concept approximating that of vandalism. But to the extent it doesn't (e.g. if it systematically mis-labels certain kinds of edits), you'll learn the same flaws.
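To make that concrete, here's a toy sketch in Python with scikit-learn. Everything in it is invented for illustration -- the features, the "ground truth", and the rule set have nothing to do with the real AbuseFilter. An imperfect rule set generates the training labels, and the classifier trained on those labels faithfully reproduces the rule set's systematic mistake rather than the underlying notion of vandalism.

# Toy illustration only: invented features and rules, not real
# AbuseFilter or MediaWiki code.
import random
from sklearn.linear_model import LogisticRegression

random.seed(0)

FEATURES = ["added_profanity", "large_blanking", "is_new_editor"]

def make_edit():
    # An "edit" is just three binary features here.
    return [random.randint(0, 1) for _ in FEATURES]

def is_vandalism(e):
    # Toy ground truth: profanity or large blanking is vandalism.
    return e[0] == 1 or e[1] == 1

def abuse_filter(e):
    # Imperfect rule set: it also flags every edit by a new editor,
    # systematically mislabelling good-faith newcomer edits.
    return e[0] == 1 or e[1] == 1 or e[2] == 1

edits = [make_edit() for _ in range(5000)]
labels = [abuse_filter(e) for e in edits]   # training labels come from the filter

clf = LogisticRegression().fit(edits, labels)
preds = clf.predict(edits)

# The model agrees almost perfectly with the filter that labelled its data...
agree_filter = sum(p == abuse_filter(e) for p, e in zip(preds, edits)) / len(edits)
# ...and therefore inherits the filter's bias against new editors instead
# of tracking actual vandalism.
agree_truth = sum(p == is_vandalism(e) for p, e in zip(preds, edits)) / len(edits)
print(f"agreement with filter: {agree_filter:.2f}, with ground truth: {agree_truth:.2f}")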
That might not be useless: you might recover a more concise rule set that replicates the original performance. But if your training data is the output of the previous rule set, you aren't going to be able to *improve* on its performance without some additional information (or built-in inductive bias).
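For what it's worth, the "more concise rule set" point is easy to see in the same toy setup as above (again, purely illustrative): distil the filter's labels into a shallow decision tree and read the tree back as rules. The result replicates the filter compactly, but it can't correct the filter's newcomer bias, because that bias is baked into the labels.

# Continuing the toy example above: compress the filter's labels into a
# small, human-readable tree.
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=3).fit(edits, labels)
print(export_text(tree, feature_names=FEATURES))
# The printed tree is just the original "profanity OR blanking OR new
# editor" rule written compactly; it replicates the filter's behaviour
# but cannot improve on it.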
-Mark