[Foundation-l] Foundation-l word cloud

Peter Gehres in2thats12 at gmail.com
Mon Oct 4 23:32:08 UTC 2010


>
> If it is including quoted passages, a simple way to address this is to
> remove any line starting with '>' and all attachments.
>

That is what I was planning to do. I was calling it a problem with respect to
word incidence.
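A minimal Python sketch of the quote-stripping step suggested above. It assumes a message body is already available as a string, and the function name strip_quoted is hypothetical. It drops only lines whose first character is '>', exactly as suggested, so indented quotes would survive:

```python
def strip_quoted(body: str) -> str:
    """Remove every line that starts with '>' (quoted text from earlier posts)."""
    kept = [line for line in body.splitlines() if not line.startswith(">")]
    return "\n".join(kept)


# Example: quoted lines at any depth ('>' or '>>') are removed.
print(strip_quoted("> quoted\nreply text\n>> older\nmore"))
```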

I am currently working on a Python implementation that strips headers and
quoted passages. One problem I have discovered is that the gzipped archives
often contain multiple copies of the same message (matching "message-id"s in
the headers). I am removing duplicates, and the message count after this step
matches the count shown in the online archives.
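A minimal Python sketch of header stripping plus de-duplication by Message-ID. It assumes the monthly archive is mbox-style text in which each message begins on a line starting with "From ". The function names (split_mbox, unique_bodies, read_archive) are hypothetical, and this is not the implementation described above. Two messages are treated as duplicates only when their Message-ID values match exactly:

```python
import email
import gzip
import re

# Assumption: each message in the archive starts with an mbox envelope line
# beginning with "From ". A body line starting with "From " that was not
# escaped as ">From " would be mistaken for a new message.
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def split_mbox(text: str) -> list[str]:
    """Split raw mbox text into one string per message."""
    starts = [m.start() for m in _FROM_LINE.finditer(text)]
    return [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]


def unique_bodies(text: str) -> list[str]:
    """Return message bodies (headers dropped), skipping repeated Message-IDs."""
    seen: set[str] = set()
    bodies: list[str] = []
    for chunk in split_mbox(text):
        msg = email.message_from_string(chunk)
        msg_id = (msg.get("Message-ID") or "").strip()
        if msg_id and msg_id in seen:
            continue  # another copy of a message we already kept
        if msg_id:
            seen.add(msg_id)
        payload = msg.get_payload()
        # Multipart messages yield a list of parts; keep only text/plain ones
        # (their payloads are returned as-is, without transfer decoding).
        if isinstance(payload, list):
            payload = "\n".join(
                part.get_payload()
                for part in msg.walk()
                if part.get_content_type() == "text/plain"
            )
        bodies.append(payload)
    return bodies


def read_archive(path: str) -> list[str]:
    """Decompress a gzipped archive and return its de-duplicated bodies."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return unique_bodies(fh.read())
```

Calling read_archive on a downloaded monthly file returns one body per distinct Message-ID. Comparing the length of that list with the count shown in the online archives is the same sanity check described above. Quoted lines can then be filtered out of each body before counting words.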

-Peter

