Use of kmail and Spambayes

Spambayes is a spam filter that I have found very effective for managing spam, both at work and for the whole family at home. It is:

  • Easy to use. After the initial configuration, it needs almost no effort from the user.
  • Effective. I rarely see a spam email.
  • Explainable. It is easy to see how the system works.
  • Scalable. It copes well with several high-volume mailing lists.

Spambayes integrates nicely with kmail, my current preferred email reader, which is part of kde. As with many kde programs, there are several different ways to configure spambayes with kmail. This page documents my current preferred configuration.

How Do I Use It?

Through some kmail configuration magic, spambayes helps kmail file all mail into three categories:

  1. Good email (ham) is delivered into a variety of folders based on mailing list, topic, and sender, or into my inbox if it doesn't fit any of those categories. This sorting has nothing to do with spambayes, except to say that I almost never see a spam email in these folders.
  2. Certain Spam. Over 80% of spam emails have no redeeming qualities, and spambayes identifies them as certain spam. I never see them: they are delivered into the kmail folder certain within the spam folder group (spam/certain), and kmail deletes them after a few weeks.
  3. Probable Spam. Emails that spambayes classifies as probable spam are delivered into the folder spam. I inspect this folder every week in case spambayes has made a mistake; this has happened only once in the last year, so I may stop bothering soon. After inspection, I move all these messages into spam/archive, where kmail deletes them after a few months.
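The filing decision above can be sketched in Python as a single function that reads the spambayes header and picks a folder. This is only an illustration, not my actual kmail filter set: it assumes the header is named X-Spambayes-Classification and that its value looks like "spam; 0.999" (a label, optionally followed by a score), and CERTAIN_SPAM_SCORE is a hypothetical threshold for telling certain spam from probable spam. Ham and unsure mail are both returned as "inbox" here, standing in for the normal kmail filters.

```python
from email import message_from_string
from email.message import Message
from typing import Optional, Tuple

# Hypothetical threshold separating "certain" from "probable" spam.
CERTAIN_SPAM_SCORE = 0.99


def parse_classification(msg: Message) -> Tuple[str, Optional[float]]:
    """Split an (assumed) header value like 'spam; 0.99' into ('spam', 0.99).

    The score part is optional; None is returned when it is missing or bad.
    """
    raw = msg.get("X-Spambayes-Classification", "")
    label, _, score_text = raw.partition(";")
    try:
        score = float(score_text) if score_text.strip() else None
    except ValueError:
        score = None
    return label.strip().lower(), score


def choose_folder(msg: Message) -> str:
    """Return the kmail folder this message would be filed into."""
    label, score = parse_classification(msg)
    if label != "spam":
        # ham and unsure mail carries on through the normal filters
        return "inbox"
    if score is not None and score >= CERTAIN_SPAM_SCORE:
        return "spam/certain"
    # spam without a very high score is only "probable" spam
    return "spam"


msg = message_from_string("X-Spambayes-Classification: spam; 0.999\n\nbody")
print(choose_folder(msg))  # spam/certain
```

Keeping the certain/probable split as a single threshold on the score means only the probable-spam folder needs a weekly look.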

At first it seems strange to keep an archive of spam rather than deleting it immediately to save some disk space. The archive is used in the spambayes training process to improve its classification accuracy: spambayes looks at these messages so that I don't have to. Fortunately, this doesn't waste much disk space, because most spam messages are small.

Some spambayes users advocate being selective about this training data. So far I haven't needed to be selective; I let spambayes chew through my entire email archive. Everything in the spam folder (and its sub-folders) is treated as example spam, and everything else as example ham.
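The training rule above (anything under spam is spam, everything else is ham) can be sketched as a small Python function. The folder paths here are hypothetical slash-separated names matching the ones used on this page; kmail's on-disk folder layout is different, and the resulting lists would still have to be handed to whatever training command is in use (for example spambayes' sb_mboxtrain.py), which this sketch does not do.

```python
from typing import Iterable, List, Tuple


def training_label(folder: str) -> str:
    """Label a (hypothetical) kmail folder path as 'spam' or 'ham'.

    The spam folder and all of its sub-folders count as spam; every
    other folder counts as ham. Splitting on '/' avoids matching a
    folder that merely starts with the letters 'spam'.
    """
    top = folder.strip("/").split("/")[0]
    return "spam" if top == "spam" else "ham"


def split_training_folders(folders: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition folder paths into (ham_folders, spam_folders)."""
    ham: List[str] = []
    spam: List[str] = []
    for folder in folders:
        (spam if training_label(folder) == "spam" else ham).append(folder)
    return ham, spam


folders = ["inbox", "lists/python", "spam", "spam/certain", "spam/archive"]
print(split_training_folders(folders))
# (['inbox', 'lists/python'], ['spam', 'spam/certain', 'spam/archive'])
```

Because the rule depends only on the folder, correcting a mistake is just a matter of moving the message to the right folder before the next training run.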

Correcting Mistakes

Sometimes spambayes makes a mistake. Most commonly this happens when the evidence is conflicting: spambayes classifies the message as 'unsure' and (in my current configuration) it is delivered into my inbox as normal.
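The 'unsure' band comes from comparing a message's spam score with two cutoffs. A minimal Python sketch of that three-way decision follows; the cutoff values and the exact boundary handling (strictly below the ham cutoff is ham, at or above the spam cutoff is spam) are assumptions for illustration, so check your own spambayes options for the real values. Messages with strong clues in both directions tend to get a combined score near the middle, which is why they land in the unsure band.

```python
HAM_CUTOFF = 0.20   # assumed value: scores below this are ham
SPAM_CUTOFF = 0.90  # assumed value: scores at or above this are spam


def classify(score: float,
             ham_cutoff: float = HAM_CUTOFF,
             spam_cutoff: float = SPAM_CUTOFF) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    # conflicting evidence leaves the score in the middle band
    return "unsure"


for s in (0.05, 0.5, 0.97):
    print(s, classify(s))
```

Widening the gap between the two cutoffs sends more mail to the inbox as unsure but makes outright misclassification rarer.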

When I see this type of mistake, I either move the spam message into the spam/archive folder or delete it. Otherwise it would stay outside the spam folder and be used as an example of ham, which would confuse spambayes during training.

Sometimes I don't notice these mistakes, particularly when they arrive via a high-volume mailing list, and they remain in the wrong folder for training. In practice that doesn't seem to matter much.

kmail Filter Configuration in Detail

See these details.