I love SpamAssassin - it saves me from having to delete around 100 spam e-mails per day that would otherwise land in my inbox (my old e-mail address has been active for almost a decade, and it probably appears in every major address list that spammers buy). However, this is now the third time I have had to empty the Bayes database and start again from scratch, so it seems that the autolearn feature has yet to learn to forget.

It appears that after a while (around 6 months), autolearn starts to bias the Bayes database towards treating spam as ham, up to the point where e-mails advertising wristwatches and elaborate male organ enhancers with well-known names start to get a BAYES_50 score, meaning the Bayes filter can no longer tell whether they are spam or ham.

The only remedy seems to be clearing the database:

sa-learn --clear

Obviously, the Bayesian filter needs to be re-trained afterwards, which can happen either automatically, driven by the other filter mechanisms in SpamAssassin (this may take a while), or manually (which is a PITA).
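
For the manual route, here is a minimal sketch of what re-training could look like. It assumes you keep hand-sorted spam and ham in two folders; the SPAM_DIR and HAM_DIR paths and the run/retrain helper names are made up for illustration, so adjust them to your own setup. Only the sa-learn options --spam, --ham and --dump magic come from SpamAssassin itself.

```shell
#!/bin/sh
# Sketch: re-train the Bayes database by hand after `sa-learn --clear`.
# SPAM_DIR and HAM_DIR are hypothetical paths, not SpamAssassin defaults.
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.Spam/cur}"
HAM_DIR="${HAM_DIR:-$HOME/Maildir/cur}"

# Helper: with DRY_RUN=1 the command is only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

retrain() {
    # Learn every message in the spam folder as spam.
    run sa-learn --spam "$SPAM_DIR"
    # Learn every message in the ham folder as ham.
    run sa-learn --ham "$HAM_DIR"
    # Print the database counters (number of spam, ham and tokens learned).
    run sa-learn --dump magic
}
```

After sourcing the file, running retrain with DRY_RUN=1 set shows the commands without touching the database; without it, the commands are executed for real.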

I guess that there is some need to forget continuously, so that new training data can replace old, erroneous (self-)training records.
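
SpamAssassin does expose a few settings that touch on this, although whether they actually prevent the drift described above is an open question. The fragment below is an untested sketch for local.cf or user_prefs; the option names are real SpamAssassin settings, but the values are illustrative assumptions only.

```
# Keep automatic expiry of old Bayes tokens enabled.
bayes_auto_expire 1

# Upper bound on the number of tokens before expiry kicks in
# (illustrative value).
bayes_expiry_max_db_size 150000

# Make autolearn pickier: only learn a message as ham if its score is
# at or below this value, and as spam only if it is at or above the
# second one (illustrative values).
bayes_auto_learn_threshold_nonspam -1.0
bayes_auto_learn_threshold_spam 12.0
```

The idea behind the stricter ham threshold is that fewer borderline spam messages get auto-learned as ham in the first place, which is the kind of self-training error that seems to accumulate over time.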

I’d appreciate any comments on this issue.