ivarch.com: Quick Spam Filter

Current version: 1.2.15 (29 March 2021)

Quick Spam Filter (QSF) is an email classification filter designed to be small, fast, and accurate. It classifies incoming email as either spam or non-spam.

To recognise spam, QSF extracts the text from the email (using MIME decoding and HTML stripping) and then splits it into tokens (words, word pairs, URLs, and so on). These tokens are looked up in a database and analysed using a Bayesian technique to decide whether the email should be classified as spam.
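The tokenise-then-score idea can be illustrated with a minimal Python sketch. It is not QSF's implementation: the regular expressions, the function names `tokenise` and `spam_probability`, the add-one smoothing, and the in-memory `counts` dictionary are all assumptions made for illustration, and the scoring is a generic naive Bayes combination rather than the exact method described in the manual.

```python
import math
import re

# Assumed, simplified patterns; QSF's real tokeniser is described in its manual.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def tokenise(text):
    """Split plain text into URL, word, and word-pair tokens."""
    urls = URL_RE.findall(text)
    # Remove URLs first so their fragments are not counted as words.
    words = [w.lower() for w in WORD_RE.findall(URL_RE.sub(" ", text))]
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return urls + words + pairs


def spam_probability(tokens, counts, n_spam, n_ham):
    """Combine per-token evidence with naive Bayes, working in log space.

    counts maps token -> (messages seen in spam, messages seen in non-spam).
    n_spam and n_ham are the numbers of trained messages of each kind.
    """
    log_odds = math.log(n_spam / n_ham)  # prior odds from the training set
    for tok in set(tokens):
        spam_seen, ham_seen = counts.get(tok, (0, 0))
        # Add-one smoothing keeps unseen tokens from producing log(0).
        p_spam = (spam_seen + 1) / (n_spam + 2)
        p_ham = (ham_seen + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    # Numerically stable conversion from log-odds to a probability.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


if __name__ == "__main__":
    counts = {"cheap": (8, 1), "pills": (9, 0), "meeting": (0, 7)}
    tokens = tokenise("Cheap pills at http://example.com/x")
    print(tokens)
    print(spam_probability(tokens, counts, n_spam=10, n_ham=10))
```

A result close to 1 suggests spam and a result close to 0 suggests non-spam; where to put the threshold is a separate choice.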

The database is generated by training: QSF is given two mailboxes, one containing known spam and the other containing known non-spam, to train itself on. If QSF later misfiles an email, the message it got wrong can be fed back into the database so that QSF learns from its mistakes.
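As a rough illustration of this training loop, the Python sketch below keeps per-token counts for each class and lets a misfiled message be fed back afterwards. The `TokenDatabase` class, its method names, the whitespace tokenisation, and the use of plain strings in place of real mailboxes are all hypothetical simplifications, not QSF's database format or interface.

```python
from collections import Counter


class TokenDatabase:
    """Hypothetical in-memory stand-in for a trained token database."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of non-spam messages containing it
        self.spam_messages = 0
        self.ham_messages = 0

    def learn(self, message, is_spam):
        """Record one message under the given class."""
        # Whitespace splitting is an assumption; each token counts once per message.
        tokens = set(message.lower().split())
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_messages += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_messages += 1

    def train(self, spam_mailbox, ham_mailbox):
        """Initial training on one known-spam and one known-non-spam mailbox."""
        for message in spam_mailbox:
            self.learn(message, is_spam=True)
        for message in ham_mailbox:
            self.learn(message, is_spam=False)


db = TokenDatabase()
db.train(
    spam_mailbox=["win money now", "cheap money offer"],
    ham_mailbox=["lunch meeting at noon", "project meeting notes"],
)

# Later, a spam message that was misfiled as non-spam is fed back as spam.
db.learn("exclusive offer inside", is_spam=True)
```

Feeding the misfiled message back only adds evidence under the correct class, so similar messages are more likely to be classified correctly in future.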

For a more in-depth look at how QSF tokenises and classifies messages, please see the Technical Details section of the manual.

QSF is designed to be run by a mail delivery agent (MDA) such as procmail. See the FAQ for a quick-start guide.

This software is distributed under the terms of the Artistic License 2.0.

Recent Changes

1.2.15 - 29 March 2021

bugfix: correct exit status of "qsf -t" to 0 if memory limit exceeded (Zhengdao Wang)
bugfix: correct compiler warnings and toupper() misfire (Dr. David Alan Gilbert)
bugfix: report error if "qsf -T" is pointed at directories (Iain Calder)
cleanup: clean up compiler warnings related to unused vars and type mismatches

1.2.11 - 3 January 2015

bugfix: Debian #773546 - report error on malformed message (Jameson Graef Rollins)
bugfix: Debian #651881 - X-Spam-Level corruption on non-ASCII spam (Ian Zimmerman)
bugfix: MD5Final now correctly clears context (patch from David Binderman)
bugfix: removed "DESTDIR" / suffix to fix Cygwin installation
cleanup: mailbox code consolidated into single file
cleanup: moved acknowledgements out of manual page
cleanup: better "rpm" and "srpm" build targets

1.2.7 - 28 August 2007

license change to Artistic 2.0
pointless option "-l" removed

To Do

Things still to do:

weight as personal if 1 database, use -d/-g for weight if more than 1 (Pavel Kolar)
strip \r, re-add afterwards, if first line is \r\n (Nora Etukudo)
autotrain option "-u", retrains based on classification
try stripping double/triple letters, eg "fffinancce" → "finance"
token for nonsense META tags (META NAME="blah blah blah")
support MH/Maildir training folders
environment variable for -d and -g
more verbosity, with profiling data
comma-separated training folders (-T spam1,spam2,spam3 nonspam1,2,3)
generate a "%age new tokens" score (eg 90% new tokens)
support for SQLite v3
allow MySQL Unix socket connections (socket location configurable)
remove pruning step from training if using 3-column database
improve efficiency of token deletion from list backend
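The repeated-letter idea in the list above ("fffinancce" → "finance") could be prototyped along the following lines. This is only a sketch in Python under stated assumptions: the function name `collapse_repeats` is hypothetical, and because collapsing also alters legitimate words, the result would probably need to be stored as an extra token alongside the original rather than replacing it.

```python
import re

# Any run of the same letter, two or more long.
REPEAT_RE = re.compile(r"([A-Za-z])\1+")


def collapse_repeats(word):
    """Collapse each run of a repeated letter down to a single letter."""
    return REPEAT_RE.sub(r"\1", word)


print(collapse_repeats("fffinancce"))  # finance
print(collapse_repeats("letter"))      # leter (legitimate doubles are collapsed too)
```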

Any assistance would be appreciated.

This project is Free Software. Support the Free Software Foundation.