On Thu, 16 Aug 2012 16:50:27 -0700, Tim Starling <tstarling(a)wikimedia.org> wrote:
On 17/08/12 04:16, Daniel Friesen wrote:
Of course. While I have the whole idea for the UI, the backend stuff, how
to handle the service, etc., I haven't done the actual machine-learning
stuff before.
I would think that the actual machine learning stuff would be the hard
part. I stopped using Thunderbird's Bayesian spam tagging feature
years ago, when it started sorting emails from smart people in with
the spam. The computer thought that the smart people were using long
words with a similar frequency to the random dictionary words that
padded out the spam messages.
I haven't worked with machine learning either, but I'm guessing it's
not as simple as feeding a pre-tagged data set into a stock Bayesian
filter library.
-- Tim Starling
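For anyone who hasn't looked inside a "stock Bayesian filter", here is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is only an illustration. It is not Thunderbird's actual filter, and the class name, the "spam"/"ham" labels and the training sentences are all made up. Because it scores a message purely on per-word frequencies, words that mostly showed up in spam during training pull a message toward the spam class regardless of context, which is the kind of misclassification Tim describes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokens; a real filter needs a far better tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical two-class multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        # Accumulate word frequencies and document counts per class.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum over words of log P(word | label), smoothed.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(text):
            count = self.word_counts[label][word]  # 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        # Pick the class with the highest log-probability score.
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("the patch fixes the parser bug", "ham")
nb.train("review the parser change please", "ham")
print(nb.classify("buy cheap pills"))  # prints "spam"
```

Nothing in this model knows anything about the meaning of a message; it only compares word counts, so a pre-tagged data set alone decides what "spammy" vocabulary looks like.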
Yeah, Bayesian is probably too old to use. ClueBot NG appears to be using
an Artificial Neural Network [ANN] implementation to do its vandalism and
spam detection.
From the documentation [ClueBot NG] it sounds like one of the trickier
parts is understanding the WikiText well enough to extract the words
needed and whatnot out of it.
[ANN] https://en.wikipedia.org/wiki/Artificial_neural_network
[ClueBot NG] https://en.wikipedia.org/wiki/User:ClueBot_NG
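To give a feel for why extracting words from WikiText is fiddly, here is a rough sketch that strips common markup with regular expressions before tokenizing. This is an assumption-laden approximation, not how ClueBot NG or MediaWiki's parser works: the function name is hypothetical, and it ignores tables, parser functions, links nested inside image captions and many other constructs that a real implementation would need a proper parser for.

```python
import re


def wikitext_words(text: str) -> list[str]:
    """Hypothetical helper: roughly reduce WikiText to a list of plain words."""
    # Drop <ref>...</ref> footnotes (and self-closing <ref/>) and HTML comments.
    text = re.sub(r"<ref[^>]*?/>|<ref[^>]*>.*?</ref>", " ", text, flags=re.S)
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.S)
    # Remove templates innermost-first until nothing changes, so nested
    # {{outer|{{inner}}}} templates are removed too.
    prev = None
    while prev != text:
        prev = text
        text = re.sub(r"\{\{[^{}]*\}\}", " ", text)
    # [[Target|label]] -> label, [[Target]] -> Target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # [http://example.org label] -> label; a bare bracketed URL -> nothing
    text = re.sub(r"\[https?://\S+\s*([^\]]*)\]", r"\1", text)
    # Strip remaining HTML tags, bold/italic quote runs and heading markers.
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"'{2,}|={2,}", " ", text)
    # Words, allowing inner apostrophes such as "don't".
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)


sample = "'''Paris''' is the [[capital city|capital]] of [[France]].{{citation needed}}<ref>Some source</ref>"
print(wikitext_words(sample))  # prints ['Paris', 'is', 'the', 'capital', 'of', 'France']
```

Even this toy version has to make choices (keep the link label or the target? drop template arguments entirely?) that change which words a classifier ends up seeing.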
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]