Current version: 1.2.7
Quick Spam Filter (QSF) is an Open Source email classification filter,
designed to be small, fast, and accurate, which works to classify incoming
email as either spam or non-spam.
To recognise spam, QSF strips the text out of the email (using
stripping) and then splits it into tokens (words, word pairs,
and so on). These tokens are then looked up in a database and analysed using
the Bayesian technique to see whether the email should be classified as spam.
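As a rough illustration of the tokenise-then-score idea described above, here is a minimal Python sketch. It is not QSF's actual code: the function names (tokenise, spam_log_odds), the regular expression, the in-memory counts table, and the Laplace smoothing are all assumptions made for this example. QSF's real tokeniser and scoring are described in its manual.

```python
import math
import re


def tokenise(text):
    """Split text into lowercase words plus adjacent word pairs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def spam_log_odds(tokens, counts, n_spam, n_nonspam):
    """Combine per-token evidence with naive Bayes, in log space.

    counts maps token -> (messages seen in spam, messages seen in non-spam).
    n_spam and n_nonspam are the numbers of training messages and are
    assumed to be non-zero. A positive result leans towards spam.
    """
    log_odds = math.log(n_spam / n_nonspam)  # prior odds from corpus sizes
    for token in set(tokens):
        spam_seen, nonspam_seen = counts.get(token, (0, 0))
        # Laplace smoothing: unseen tokens contribute no evidence either way
        p_spam = (spam_seen + 1) / (n_spam + 2)
        p_nonspam = (nonspam_seen + 1) / (n_nonspam + 2)
        log_odds += math.log(p_spam / p_nonspam)
    return log_odds


# Toy token table, invented purely for illustration.
toy_counts = {"cheap": (8, 1), "cheap pills": (5, 0), "meeting": (0, 9)}
is_spam = spam_log_odds(tokenise("Cheap pills"), toy_counts, 10, 10) > 0
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on long messages.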
The database is generated by a process of training: QSF is given
two mailboxes, one containing known spam and the other containing known
non-spam, to train itself on. After training, if QSF misfiles an
email, the message it got wrong can be fed back into the database, so
that QSF learns from its mistakes.
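The training and feedback loop described above might be sketched in Python as follows. This is only an illustration under stated assumptions: TokenDatabase, learn and train are hypothetical names, the "mailboxes" are plain lists of message strings rather than real mailbox files, and tokens are just whitespace-separated words. QSF's own database backends and tokeniser work differently.

```python
from collections import Counter


class TokenDatabase:
    """Hypothetical in-memory stand-in for a token database."""

    def __init__(self):
        # Per-class token counts and message totals, keyed by is_spam.
        self.counts = {True: Counter(), False: Counter()}
        self.messages = {True: 0, False: 0}

    def learn(self, tokens, is_spam):
        """Record one message's tokens under the given class."""
        self.counts[is_spam].update(set(tokens))  # count each token once per message
        self.messages[is_spam] += 1


def train(db, spam_mailbox, nonspam_mailbox):
    """Initial training from one known-spam and one known-non-spam mailbox."""
    for message in spam_mailbox:
        db.learn(message.lower().split(), is_spam=True)
    for message in nonspam_mailbox:
        db.learn(message.lower().split(), is_spam=False)


db = TokenDatabase()
train(db, ["cheap pills now"], ["lunch meeting at noon"])

# Later, a spam message that was misfiled as non-spam is fed back
# under its correct class, so its tokens count as spam evidence.
db.learn("cheap lunch deal".split(), is_spam=True)
```

In this sketch, feeding back a mistake is simply another call to learn with the correct class, which is why the same method serves both initial training and later corrections.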
For a more in-depth look at how QSF tokenises and classifies messages,
see the Technical Details section of the manual.
QSF is designed to be run by an MDA (mail delivery agent), such as
procmail. See the FAQ for a quick-start
- 1.2.7 - 28 August 2007
  - license change to Artistic 2.0
  - pointless option "
- 1.2.6 - 4 February 2007
  - bugfix (reversion): removed locking from MySQL as it makes it too slow
- 1.2.5 - 21 January 2007
  This version fixes a bug in the "list" backend that caused random token deletions and broke token aging. If you were using the "list" backend, you should recreate and retrain your databases.
  - bugfix: fixed random token deletion in list backend
  - bugfix: added table locking to MySQL backend to maintain integrity
  - bugfix: fixed MySQL database type autodetection for files
  - cleanup: rpmlint fixes to spec file
- 1.2.1 - 25 October 2006
  - bugfix: concurrent updates now work properly on all backends
  - developers: "make bigbenchmark" generates graph data for gnuplot
- 1.2.0 - 2 October 2006
  - new default backend database "list" (better than btree)
  - new option "-H" to set value of X-Spam header (Michal Vitecek)
  - new option "-P" to keep a plaintext mapping of hashes to tokens
  - new option "-y" to use a deny-list as well as an allow-list
  - new allow/deny-list syntax "@domain" to list whole domains
Things still to do:
- strip \r, re-add afterwards, if first line is \r\n (Nora Etukudo)
- autotrain option "-u", retrains based on classification
- try stripping double/triple letters, e.g. "fffinancce" -> "finance"
- token for nonsense META tags (META NAME="blah blah blah")
- support MH/Maildir training folders
- environment variable for -d and -g
- more verbosity, with profiling data
- comma-separated training folders (-T spam1,spam2,spam3 nonspam1,2,3)
- generate a "percentage of new tokens" score (e.g. 90% new tokens)
- support for SQLite v3
- look at http://plg.uwaterloo.ca/~trlynam/spamjig/
- allow MySQL Unix socket connections (socket location configurable)
- remove pruning step from training if using 3-column database
- improve efficiency of token deletion from list backend
Any assistance would be appreciated.
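For anyone wanting to experiment with the repeated-letter item in the list above, one possible starting point is sketched below in Python. It is an assumption about how the idea could work, not something QSF currently does: collapse_repeats is a hypothetical helper that reduces every run of the same letter to a single letter.

```python
import re


def collapse_repeats(word):
    """Collapse every run of a repeated letter to a single letter."""
    return re.sub(r"([a-z])\1+", r"\1", word.lower())


normalised = collapse_repeats("fffinancce")  # gives "finance"
```

Note that this also shortens legitimate double letters ("letter" becomes "leter"), so the collapsed form would need to be applied consistently during both training and classification, or added as an extra token alongside the original word.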
This software is OSI Certified Open Source Software.
OSI Certified is a certification mark of the Open Source Initiative.