[Foundation-l] Foundation-l word cloud
Peter Gehres
in2thats12 at gmail.com
Mon Oct 4 23:32:08 UTC 2010
>
> If it is including quoted passages, a simple way to address this is to
> remove any line starting with '>' and all attachments.
>
That is what I was planning to do. I was only calling it a problem for word
incidence, i.e. how often each word gets counted.
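Below is a minimal Python sketch of that filtering step. It assumes each message has already been parsed with the standard library's `email` package using `policy.default`. The helper name `body_without_quotes` is hypothetical.

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser


def body_without_quotes(msg: EmailMessage) -> str:
    """Return msg's plain-text body with quoted ('>') lines removed.

    Hypothetical helper: get_body() returns only the main text/plain
    part, so headers and attachments are left out automatically.
    """
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""  # no plain-text body (e.g. an HTML-only message)
    lines = part.get_content().splitlines()
    # Keep only lines that do not start with '>' (quoted reply text).
    return "\n".join(line for line in lines if not line.startswith(">"))
```

This drops only lines that begin with '>' exactly. Indented quotes, or quotes introduced by other markers, would need extra rules.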
I am currently working on a Python implementation that strips headers and
quoted passages. One problem I have found is that the gzipped archives often
contain multiple copies of the same message (identical Message-ID headers).
I am removing these duplicates, and the message count after doing so matches
the count shown in the online archives.
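Below is a minimal sketch of the duplicate check, using only Python's standard library. It assumes each monthly `.txt.gz` archive is a gzip-compressed mbox file. `read_gzipped_mbox` and `unique_by_message_id` are hypothetical helper names and are not part of any existing tool.

```python
import gzip
import mailbox
import os
import shutil
import tempfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Iterable, Iterator


def _parse(fp) -> EmailMessage:
    # Parse each mbox entry into a modern EmailMessage object.
    return BytesParser(policy=policy.default).parse(fp)


def read_gzipped_mbox(path: str) -> Iterator[EmailMessage]:
    """Yield messages from a gzip-compressed mbox file (assumed archive format)."""
    # mailbox.mbox needs a real file path, so decompress to a temp file first.
    with gzip.open(path, "rb") as src, tempfile.NamedTemporaryFile(
        suffix=".mbox", delete=False
    ) as tmp:
        shutil.copyfileobj(src, tmp)
    box = mailbox.mbox(tmp.name, factory=_parse, create=False)
    try:
        yield from box
    finally:
        box.close()
        os.remove(tmp.name)


def unique_by_message_id(messages: Iterable[EmailMessage]) -> Iterator[EmailMessage]:
    """Yield each message once, skipping repeats of a Message-ID already seen."""
    seen = set()
    for msg in messages:
        mid = str(msg.get("Message-ID", "")).strip()
        if mid:
            if mid in seen:
                continue  # duplicate copy of a message already yielded
            seen.add(mid)
        # Messages without a Message-ID are kept, since they cannot be matched.
        yield msg
```

For example, `sum(1 for _ in unique_by_message_id(read_gzipped_mbox("2010-October.txt.gz")))` would give the deduplicated count to compare against the online archive. The file name here is only illustrative.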
-Peter