From: "Toby Dickenson" <toby@tarind.com>

Configuring kmail and Spambayes

See these other notes explaining what is achieved with this kmail configuration.

Folder Configuration

In kmail, create a folder named spam. Under it, create the folders certain and archive. Set certain to expire old messages after 2 weeks, and archive to expire after 2 months.

If this is your first time using spambayes, you should go through your existing email and move as much spam as possible into the spam/archive folder.

Filter 1

First, a rule to get spambayes to classify all incoming messages. In these instructions the symbol | indicates that information needs to go in separate fields in the filter configuration GUI.

filter criteria:

size in bytes | greater than or equal to | 1

(As of version 3.3.2, kmail doesn't have a better way of saying that a filter applies to all emails. Very embarrassing.)

filter actions:

pipe through | /usr/local/bin/sb_bnfilter

Untick the 'on manual filtering' tick box.

Untick the 'if this filter matches, stop processing here' tick box.

This rule passes all incoming messages into the spambayes sb_bnfilter program, which classifies each message and indicates its conclusion by returning the email with several extra headers. Subsequent filters check the content of these new headers. This filter is equivalent to the first line in the standard spambayes procmail filter configuration. If you have other reasons to like procmail then you may want to perform this one step as a procmail rule as described there, and handle the rest in kmail as described here.
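To make the added header concrete, here is a minimal Python sketch that reads it back out of a message. It assumes the header value has the form "label; score" (inferred from the filter criteria used in these notes); the function name and the sample message are made up for illustration and are not part of spambayes.

```python
from email import message_from_string


def read_classification(raw_message):
    """Return (label, score) from X-Spambayes-Classification, or None if absent."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Spambayes-Classification")
    if value is None:
        return None
    # Assumed layout: "<label>; <score>", e.g. "spam; 0.99".
    label, _, score = value.partition(";")
    score = score.strip()
    return label.strip(), (float(score) if score else None)


# Hypothetical message as it might look after passing through filter 1.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spambayes-Classification: spam; 0.99\n"
    "\n"
    "body text\n"
)

print(read_classification(SAMPLE))
```

The kmail filters in the following sections do not parse the score like this; they only do a plain "contains" match on the header text.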

Filter 2 and 3

Two rules to isolate spam that scores 99% or over. These emails are certainly spam, and I never want to see them.

filter criteria:

X-Spambayes-Classification | contains | spam; 1.0

filter actions:

mark as | Read

file into folder | spam/certain

Untick the 'on manual filtering' tick box.

Next create an identical filter, but with the '1.0' changed to '0.99'.

Filter 4

A rule to move probable spam into a separate folder for review.

This rule, and all those above, probably should be the first entries in your kmail filter configuration.

filter criteria:

X-Spambayes-Classification | contains | spam

filter actions:

mark as | Read

file into folder | spam

Untick the 'on manual filtering' tick box.
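The combined effect of filters 2, 3 and 4 can be sketched in Python as a chain of substring tests. This is only an illustration of the matching logic, not anything kmail runs. It assumes that the 'if this filter matches, stop processing here' box is left ticked on filters 2 and 3, and that scores are written with two decimal places; the function name and sample values are hypothetical.

```python
def target_folder(classification):
    """Mimic filters 2-4: return the destination folder, or None to leave the message alone."""
    # Filters 2 and 3: "contains" match on the two top scores.
    # "spam; 1.0" also matches a header value such as "spam; 1.00".
    if "spam; 1.0" in classification or "spam; 0.99" in classification:
        return "spam/certain"
    # Filter 4: anything else whose header value contains "spam".
    if "spam" in classification:
        return "spam"
    # Everything else falls through to later filters.
    return None


for value in ("spam; 1.00", "spam; 0.99", "spam; 0.93", "ham; 0.01"):
    print(value, "->", target_folder(value))
```

The order matters: if filter 4 ran first, it would catch every spam message and nothing would reach spam/certain.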

Retraining

I use this retrain.py script to retrain spambayes. It checks the kmail configuration file to determine all current mail folders, and rebuilds the spambayes training database from scratch. At work (where the workstation rarely gets switched off) I use cron to run this script overnight. At home, I added it to the kde autostart directory.

You will also need to add this entry to your ~/.spambayesrc configuration file (or, if you don't have this file already, create a new file containing just these lines). retrain.py uses the spambayes sb_mboxtrain.py tool, and this entry turns off that tool's default behaviour of modifying the files in your kmail mailbox directories:

[Headers]
include_trained = False
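If you want to confirm that the entry is in place before the first retraining run, a small check along these lines may help. It is a sketch that uses Python's standard configparser module rather than spambayes' own option loader, so it only verifies that the file text contains the setting; the function name is made up for illustration.

```python
import configparser


def trained_header_disabled(rc_text):
    """Return True if the config text sets include_trained to False under [Headers]."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    # The fallback of True reflects the default described above:
    # without the entry, mailbox files are modified during training.
    return not parser.getboolean("Headers", "include_trained", fallback=True)


print(trained_header_disabled("[Headers]\ninclude_trained = False\n"))
```

To check the real file, pass in the contents of ~/.spambayesrc (for example, read via os.path.expanduser).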

Filter 5

This rule isn't really critical, but I think it has saved me from at least one false positive by allowing spambayes to adapt faster than my normal 'overnight' training regime.

For efficiency reasons I do not want to update the spambayes training database after receiving every email, because I receive several high-volume mailing lists. I therefore add this filter rule after the rules that deliver mailing lists into their own folders.

This rule only applies to personal mail delivered to my inbox. All such messages are immediately added to spambayes training data as Ham examples. This helps to correctly classify subsequent messages from new senders or on a new subject. These are messages which might otherwise not have enough in common with all the Ham examples present at the previous training run.

filter criteria:

X-Spambayes-Classification | contains | ham

filter actions:

pipe through | /usr/local/bin/sb_bnfilter -g

Untick the 'on manual filtering' tick box.

Untick the 'if this filter matches, stop processing here' tick box.

Acknowledgements

Thanks to Adrian Dusa for the helpful feedback.