QSF: Frequently Asked Questions


If you do not see an answer to your question on this page, please try checking the manual, or use the Contact Form.


How is QSF different to other spam filters?

QSF's targets are speed, accuracy, and simplicity.

If QSF does not meet your needs, try looking at the resources page for other spam filters, or make a suggestion using the Contact Form.


How do I set up QSF to filter my email?

First, determine where you are going to do your filtering.

Next, work out whether you have procmail installed on the relevant machine. If it is installed, running "man procmail" should display its manual page.
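
Not every system installs manual pages, so a missing man page does not always mean procmail is absent. As a rough alternative, you can ask the shell whether the command is on your PATH. The helper name have_cmd below is made up for this example:

```shell
# have_cmd NAME: succeed if NAME is an executable command on the PATH.
# (have_cmd is a hypothetical helper name, not part of QSF or procmail.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd procmail; then
    echo "procmail found: $(command -v procmail)"
else
    echo "procmail not found on PATH"
fi
```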

If you have procmail, then create or edit your ~/.procmailrc file so that it contains the following lines:

:0 wf
| qsf -sra
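
If you would rather add the recipe from the command line, something like the following sketch could be used. It assumes a POSIX shell, and the function name add_qsf_recipe is invented for this example. It only looks for an existing line starting with "| qsf", so review the file afterwards.

```shell
# add_qsf_recipe FILE: append the QSF recipe to FILE unless a "| qsf"
# line is already there. (Hypothetical helper, not shipped with QSF.)
add_qsf_recipe() {
    rcfile=$1
    if [ -f "$rcfile" ] && grep -q '^| *qsf' "$rcfile"; then
        echo "QSF recipe already present in $rcfile"
        return 0
    fi
    # Blank line first so the recipe is separated from earlier content.
    printf '\n:0 wf\n| qsf -sra\n' >> "$rcfile"
    echo "QSF recipe added to $rcfile"
}

# Example: add_qsf_recipe "$HOME/.procmailrc"
```

Procmail runs recipes in order, so the filter recipe needs to come before any recipe that delivers mail. Appending is only appropriate for a new file or one with no delivering recipes yet.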

If you do not have procmail, you may have other alternatives such as maildrop. Check with your server administrator.

Update - Jan 2020 - from Anthony Campbell:
For the fdm tool, add this to fdm.conf:

action "spamfilter" rewrite "/usr/bin/qsf -s"
match all action "spamfilter"  continue
match "^Subject:.*SPAM"  action mbox "%h/Mail/spam"

Next, you need to create a new database so QSF can classify your email. To do this, collect as much recent spam as you can into one mail folder (somewhere between 100 and 2000 messages should be enough). Then collect a similar amount of non-spam in another mail folder.

These mail folders should be in mbox format. Email clients such as Mutt use it; it is one of the standard Unix mailbox formats. If, instead, you have your messages as individual files inside a directory, you can use a command line such as the following to put all the messages in DIRECTORY into one mbox file:

find DIRECTORY -type f -exec sh -c 'formail < "$1"' sh '{}' ';' >> NEW-MBOX-FILE
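
To check that a folder holds roughly the number of messages suggested above, you can count the "From " separator lines that start each message in an mbox file. This is only an estimate: it assumes that any body lines beginning with "From " have been escaped, as formail does by default. The function name count_mbox is made up for this example:

```shell
# count_mbox FILE: print the number of lines starting with "From ",
# which is the per-message separator line in mbox format.
# (Hypothetical helper, not part of QSF.)
count_mbox() {
    grep -c '^From ' "$1"
}

# Example: count_mbox NEW-MBOX-FILE
```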

Next, run QSF in training mode on your two mbox folders:

qsf -T spam-folder non-spam-folder

From now on, any incoming mail that QSF thinks is spam should end up with an X-Spam: YES header and a subject line starting with [SPAM].
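
To check a single message by hand, you can look for that header yourself. The sketch below reads one message on standard input and inspects only the header block (everything before the first blank line), so a matching line in the message body is ignored. The function name is_marked_spam is invented for this example:

```shell
# is_marked_spam: read one message on stdin; succeed if its headers
# contain "X-Spam: YES". (Hypothetical helper, not part of QSF.)
is_marked_spam() {
    # sed stops at the first blank line, which ends the headers.
    sed '/^$/q' | grep -qi '^X-Spam:[[:space:]]*YES'
}

# Example: is_marked_spam < message-file && echo "marked as spam"
```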


Why does initial training take so long?

When training using the -T option, QSF does not just mark all of the messages in the "spam" folder as spam, and all in the "non-spam" folder as non-spam. Instead, it goes through each message in each folder and only changes its database if it "guesses" the message's classification wrongly. Having tried this on every message, it then restarts the process, and keeps doing it until the number of messages it gets wrong falls to an acceptable number.

The reason it is done this way is to avoid overtraining the database. If too many entries are added to the database at once, it becomes large and inflexible, and it is then harder to teach it new things in future.

Although the database format was recently changed to "age" tokens so that overtraining is less of a problem, the initial training process will probably always be done this way to ensure a balanced data set.
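
The "only learn from wrong guesses, then repeat" loop described above can be illustrated with a toy shell sketch. This is not QSF's code: the word lists, the scoring rule, and the names count_hits, guess and train are all invented for the illustration, and each "message" is just one line of plain words in a text file.

```shell
# Toy illustration only: QSF's real tokenizer, database and scoring differ.
spam_words=""   # words learned from wrongly guessed spam
ham_words=""    # words learned from wrongly guessed non-spam

# count_hits WORDLIST MESSAGE: count WORDLIST entries that occur in MESSAGE.
count_hits() {
    hits=0
    for w in $1; do
        case " $2 " in
            *" $w "*) hits=$((hits + 1)) ;;
        esac
    done
    echo "$hits"
}

# guess MESSAGE: print "spam" if more spam words than ham words match.
guess() {
    s=$(count_hits "$spam_words" "$1")
    h=$(count_hits "$ham_words" "$1")
    if [ "$s" -gt "$h" ]; then echo spam; else echo ham; fi
}

# train SPAM_FILE HAM_FILE MAX_ERRORS MAX_PASSES (one message per line).
# The word lists change only when a guess is wrong; passes repeat until
# the number of wrong guesses is acceptable or MAX_PASSES is reached.
train() {
    pass=0
    while [ "$pass" -lt "$4" ]; do
        pass=$((pass + 1))
        errors=0
        while IFS= read -r msg; do
            if [ "$(guess "$msg")" != spam ]; then
                spam_words="$spam_words $msg"
                errors=$((errors + 1))
            fi
        done < "$1"
        while IFS= read -r msg; do
            if [ "$(guess "$msg")" != ham ]; then
                ham_words="$ham_words $msg"
                errors=$((errors + 1))
            fi
        done < "$2"
        echo "pass $pass: $errors wrong"
        if [ "$errors" -le "$3" ]; then break; fi
    done
    return 0
}

# Example: train spam.txt ham.txt 0 10
```

Because messages that are already guessed correctly add nothing, the word lists stay much smaller than they would if every message were added, at the cost of needing several passes.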