Wikimedia has a spam filter. It kills the worst of it. But wikien-l still has a pile THIS HIGH in the queue. So I've set the filter to zap messages with a spam score over 7. For comparison, humans seem to score no more than 2. The upside of this is that messages in the queue should be a lot easier to get to quickly.
If you send a message (and you're not on moderation) and think it didn't get through, *check the list archive first*, and if it's not there after a while please forward me a copy so I can try to work out what happened. If you're on Gmail, *check the archive first* because it never shows you messages it thinks you sent. (This is apparently a feature.)
- d.
On 5/12/07, David Gerard <dgerard@gmail.com> wrote:
> Wikimedia has a spam filter. It kills the worst of it. But wikien-l still has a pile THIS HIGH in the queue. So I've set the filter to zap messages with a spam score over 7. For comparison, humans seem to score no more than 2.
Just out of interest, say if I were to initiate a discussion about an article relating to Prozac, mentioned the word Prozac a few times in the mail (as well as once in the header), and gave a link to a site that I thought was the source of the hypothetical spamming that my hypothetical mail was complaining about - how would I score? What if I had a penchant for CAPS and EXCLAMATION MARKS!!!!? Seriously, I'm just curious.
Cormac
On 12/05/07, Cormac Lawler <cormaggio@gmail.com> wrote:
> On 5/12/07, David Gerard <dgerard@gmail.com> wrote:
>> Wikimedia has a spam filter. It kills the worst of it. But wikien-l still has a pile THIS HIGH in the queue. So I've set the filter to zap messages with a spam score over 7. For comparison, humans seem to score no more than 2.
> Just out of interest, say if I were to initiate a discussion about an article relating to Prozac, mentioned the word Prozac a few times in the mail (as well as once in the header), and gave a link to a site that I thought was the source of the hypothetical spamming that my hypothetical mail was complaining about - how would I score? What if I had a penchant for CAPS and EXCLAMATION MARKS!!!!? Seriously, I'm just curious.
I suppose you could craft a message that was arguably on-topic and also very likely to be eaten by the spam filter, in which case it would deserve to be eaten. And if you tended to do that sort of thing inadvertently, you'd probably be used to having your mail eaten ;-p
- d.
On 5/12/07, David Gerard <dgerard@gmail.com> wrote:
> On 12/05/07, Cormac Lawler <cormaggio@gmail.com> wrote:
>> On 5/12/07, David Gerard <dgerard@gmail.com> wrote:
>>> Wikimedia has a spam filter. It kills the worst of it. But wikien-l still has a pile THIS HIGH in the queue. So I've set the filter to zap messages with a spam score over 7. For comparison, humans seem to score no more than 2.
>> Just out of interest, say if I were to initiate a discussion about an article relating to Prozac, mentioned the word Prozac a few times in the mail (as well as once in the header), and gave a link to a site that I thought was the source of the hypothetical spamming that my hypothetical mail was complaining about - how would I score? What if I had a penchant for CAPS and EXCLAMATION MARKS!!!!? Seriously, I'm just curious.
> I suppose you could craft a message that was arguably on-topic and also very likely to be eaten by the spam filter, in which case it would deserve to be eaten. And if you tended to do that sort of thing inadvertently, you'd probably be used to having your mail eaten ;-p
Right. :-) But I did mean my mail in all (or mostly) seriousness that there could be a legitimate mail that would trigger the spam filter. I was wondering where the line was - though I suppose an answer to my mail might lend itself towards trollspam. ;-)
Cormac
On 12/05/07, Cormac Lawler <cormaggio@gmail.com> wrote:
> On 5/12/07, David Gerard <dgerard@gmail.com> wrote:
>> I suppose you could craft a message that was arguably on-topic and also very likely to be eaten by the spam filter, in which case it would deserve to be eaten. And if you tended to do that sort of thing inadvertently, you'd probably be used to having your mail eaten ;-p
> Right. :-) But I did mean my mail in all (or mostly) seriousness that there could be a legitimate mail that would trigger the spam filter. I was wondering where the line was - though I suppose an answer to my mail might lend itself towards trollspam. ;-)
With Bayesian filtering on the message body, the sort of mail from humans that I've seen get eaten by Thunderbird is messages from IT recruiters, who seem to write spam natively. GMail eats stuff from Wine-Users, presumably because it talks about the same Windows software that shows up advertised in spam.
The stuff eaten by the new spam rule is usually losing big on rules such as "sent from RBL-listed server", "sent directly from dialup" or "blatantly lies when telling me who it is" as well as Bayesian filtering on the message body.
- d.
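[A rough sketch of the additive scoring described above, for anyone curious where the line falls. The rule names and weights here are made up for illustration and are not the list's actual configuration; only the "over 7" cutoff comes from this thread.]

```python
# Hypothetical rule names and weights, chosen only to illustrate the idea.
# A real filter ships its own rule set and tuned scores.
RULE_WEIGHTS = {
    "rbl_listed_server": 3.0,    # sent from an RBL-listed server
    "direct_from_dialup": 2.5,   # sent directly from a dialup address
    "lies_about_identity": 2.5,  # claims to be a host it is not
    "bayes_body_spammy": 3.5,    # Bayesian body filter thinks it is spam
}

THRESHOLD = 7.0  # from the thread: zap messages with a score over 7


def spam_score(hits):
    """Sum the weights of every rule the message triggered."""
    return sum(RULE_WEIGHTS[name] for name in hits)


def should_discard(hits, threshold=THRESHOLD):
    """Discard only when the total is strictly over the threshold."""
    return spam_score(hits) > threshold
```

With these invented weights, a message has to trip several rules at once to be discarded. A spammy-looking body on its own (3.5) stays well under the cutoff, which fits the observation that mail from humans tends to score low.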
David Gerard wrote:
> With Bayesian filtering on the message body, the sort of mail from humans that I've seen get eaten by Thunderbird is messages from IT recruiters, who seem to write spam natively.
There are actually a few scams which claim to offer employment. When you give them your details, they use them for identity theft -- applying for credit in your name, for instance. So it's quite possible that the IT recruitment messages you saw were in fact spam.
> GMail eats stuff from Wine-Users, presumably because it talks about the same Windows software that shows up advertised in spam.
Thunderbird once identified a long, wordy plaintext email from a university colleague as spam. Presumably it had been trained to recognise rare words as being spammy, following the keyword stuffing trend in HTML and image spam. Server-side filtering is more reliable, in my experience, especially when the filtering software has human care and feeding.
-- Tim Starling
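[A minimal sketch of Bayesian filtering on the message body, to show how legitimate mail can end up looking spammy. This is a toy naive Bayes classifier with add-one smoothing, not the algorithm Thunderbird, GMail or the list's filter actually uses; the function names and the tiny training corpus are invented for illustration.]

```python
import math
from collections import Counter


def train(messages):
    """messages: iterable of (text, is_spam) pairs.
    Returns per-class token counts and per-class message counts."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_log_odds(text, counts, totals):
    """Log-odds that text is spam; above 0 leans spam, below 0 leans ham."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    # Smoothed prior from the number of messages seen in each class.
    log_odds = math.log((totals[True] + 1) / (totals[False] + 1))
    for token in text.lower().split():
        # Add-one smoothing so unseen tokens do not zero out the product.
        p_spam = (counts[True][token] + 1) / (n_spam + vocab_size + 1)
        p_ham = (counts[False][token] + 1) / (n_ham + vocab_size + 1)
        log_odds += math.log(p_spam / p_ham)
    return log_odds


# Invented toy corpus, purely for illustration.
corpus = [
    ("cheap windows software offer", True),
    ("buy cheap pills now", True),
    ("article needs sources", False),
    ("please review the article draft", False),
]
counts, totals = train(corpus)
```

With this toy corpus, a perfectly legitimate "wine runs windows software" comes out leaning spam simply because "windows" and "software" were only ever seen in spam, which is the Wine-Users effect mentioned above. What the filter has been trained on matters as much as the algorithm.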
On 13/05/07, Tim Starling <tstarling@wikimedia.org> wrote:
> David Gerard wrote:
>> With Bayesian filtering on the message body, the sort of mail from humans that I've seen get eaten by Thunderbird is messages from IT recruiters, who seem to write spam natively.
> There are actually a few scams which claim to offer employment. When you give them your details, they use them for identity theft -- applying for credit in your name, for instance. So it's quite possible that the IT recruitment messages you saw were in fact spam.
Oh yes - I'm talking about actual email from actual recruiters that I was expecting.
>> GMail eats stuff from Wine-Users, presumably because it talks about the same Windows software that shows up advertised in spam.
> Thunderbird once identified a long, wordy plaintext email from a university colleague as spam. Presumably it had been trained to recognise rare words as being spammy, following the keyword stuffing trend in HTML and image spam. Server-side filtering is more reliable, in my experience, especially when the filtering software has human care and feeding.
wikien-l has had noticeably less spam to wade through. No complaints from humans (User:V1k@gr@) so far.
- d.