Current version: 1.2.15 (29 March 2021)
Quick Spam Filter (QSF) is an Open Source email classification filter,
designed to be small, fast, and accurate, which works to classify incoming
email as either spam or non-spam.
To recognise spam, QSF strips the text out of the email (using MIME decoding
and HTML stripping) and then splits it into tokens (words, word pairs, URLs,
and so on). These tokens are then looked up in a database and analysed using
the Bayesian technique to see whether the email should be classified as spam
or not.
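The tokenising step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not QSF's own code (QSF itself is written in C): it uses Python's standard `email` and `html.parser` modules for the MIME decoding and HTML stripping, and the hypothetical helpers `message_text()` and `tokenise()` produce single words, adjacent word pairs and URLs. The token patterns and the `url:` prefix are assumptions chosen for illustration.

```python
import re
from email import message_from_string
from email.policy import default
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between HTML tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def message_text(raw: str) -> str:
    """MIME-decode a raw message and return the text of its text/* parts."""
    msg = message_from_string(raw, policy=default)
    chunks = []
    for part in msg.walk():  # visits every MIME part, nested or not
        ctype = part.get_content_type()
        if ctype == "text/plain":
            chunks.append(part.get_content())
        elif ctype == "text/html":
            chunks.append(strip_html(part.get_content()))
    return "\n".join(chunks)


# Hypothetical token patterns, for illustration only
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenise(text: str) -> list[str]:
    """Split text into single words, adjacent word pairs and URLs."""
    urls = [u.lower() for u in URL_RE.findall(text)]
    words = WORD_RE.findall(URL_RE.sub(" ", text).lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs + ["url:" + u for u in urls]
```

For example, `tokenise("Hello World")` yields the two words plus the pair `"hello world"`. Word pairs let the classifier pick up short phrases whose meaning differs from that of their individual words.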
The database is generated by a process of training: QSF is given two
mailboxes to train itself on, one containing known spam and the other
containing known non-spam. After training, if QSF misfiles any email, the
message it got wrong can be fed back into the database, so that QSF learns
from its mistakes.
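The training and classification process can likewise be sketched, assuming a simple in-memory token database (the hypothetical `TokenDB` class) in place of QSF's real database backends. In this sketch, each token's spam probability is estimated from how often it appeared in the spam and non-spam training messages, smoothed towards 0.5 for rarely seen tokens (Gary Robinson's adjustment), and the per-token probabilities are combined naive-Bayes style in log-odds space. QSF's exact formulae are given in the Technical Details section of the manual and may differ from this sketch.

```python
import math
from collections import Counter


class TokenDB:
    """Hypothetical in-memory token database (not QSF's storage format)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of non-spam messages containing it
        self.nspam = 0         # number of spam messages trained on
        self.nham = 0          # number of non-spam messages trained on

    def train(self, tokens, is_spam):
        """Record one message; each distinct token is counted once per message."""
        counts = self.spam if is_spam else self.ham
        counts.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def token_prob(self, tok, s=1.0, x=0.5):
        """Smoothed probability that a message containing `tok` is spam."""
        spam_n, ham_n = self.spam[tok], self.ham[tok]
        n = spam_n + ham_n
        if n == 0:
            return x  # never seen: neutral
        spam_f = spam_n / max(self.nspam, 1)
        ham_f = ham_n / max(self.nham, 1)
        p = spam_f / (spam_f + ham_f)
        # Robinson's adjustment pulls rarely seen tokens towards x
        return (s * x + n * p) / (s + n)

    def score(self, tokens):
        """Combine token probabilities in log-odds space; returns 0.0 to 1.0."""
        log_odds = 0.0
        for tok in set(tokens):
            p = self.token_prob(tok)
            log_odds += math.log(p) - math.log(1.0 - p)
        # numerically stable logistic function
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)

    def is_spam(self, tokens, threshold=0.5):
        return self.score(tokens) > threshold
```

Training on the two mailboxes corresponds to calling `train()` once per message with the appropriate label. Feeding back a misfiled message amounts to calling `train()` again with the correct label, which shifts that message's tokens towards the right category.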
For a more in-depth look at the way in which QSF tokenises and classifies
messages, please see the Technical Details
section of the manual.
QSF is designed to be run by an MDA (mail delivery agent), such as procmail.
See the FAQ for a quick-start guide.
This software is distributed under the terms of the
Artistic License 2.0.
Recent Changes
- 1.2.15 - 29 March 2021
  - bugfix: correct exit status of "qsf -t" to 0 if memory limit exceeded (Zhengdao Wang)
  - bugfix: correct compiler warnings and toupper() misfire (Dr. David Alan Gilbert)
  - bugfix: report error if "qsf -T" is pointed at directories (Iain Calder)
  - cleanup: clean up compiler warnings related to unused vars and type mismatches
- 1.2.11 - 3 January 2015
  - bugfix: Debian #773546 - report error on malformed message (Jameson Graef Rollins)
  - bugfix: Debian #651881 - X-Spam-Level corruption on non-ASCII spam (Ian Zimmerman)
  - bugfix: MD5Final now correctly clears context (patch from David Binderman)
  - bugfix: removed "DESTDIR" / suffix to fix Cygwin installation
  - cleanup: mailbox code consolidated into single file
  - cleanup: moved acknowledgements out of manual page
  - cleanup: better "rpm" and "srpm" build targets
- 1.2.7 - 28 August 2007
  - license change to Artistic 2.0
  - pointless option "-l" removed
To Do
Things still to do:
- weight as personal if 1 database, use -d/-g for weight if more than 1 (Pavel Kolar)
- strip \r, re-add afterwards, if first line is \r\n (Nora Etukudo)
- autotrain option "-u", retrains based on classification
- try stripping double/triple letters, e.g. "fffinancce" → "finance"
- token for nonsense META tags (META NAME="blah blah blah")
- support MH/Maildir training folders
- environment variable for -d and -g
- more verbosity, with profiling data
- comma-separated training folders (-T spam1,spam2,spam3 nonspam1,2,3)
- generate a "percentage of new tokens" score (e.g. 90% new tokens)
- support for SQLite v3
- allow MySQL Unix socket connections (socket location configurable)
- remove pruning step from training if using 3-column database
- improve efficiency of token deletion from list backend
Any assistance would be appreciated.
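Two of the to-do items above are small enough to sketch. The Python below is purely illustrative: `collapse_repeats()` and `new_token_percentage()` are hypothetical helpers, not part of QSF, and the regular-expression approach is only one possible way to strip repeated letters. As the second assertion in its docstring shows, naive collapsing also damages words with legitimate double letters, so one option would be to keep both the original and the collapsed form as tokens.

```python
import re


def collapse_repeats(word: str) -> str:
    """Collapse any run of a repeated character to a single character.

    "fffinancce" -> "finance", but also "letter" -> "leter", so a real
    implementation would need to decide how to treat legitimate doubles.
    """
    return re.sub(r"(.)\1+", r"\1", word)


def new_token_percentage(tokens, known_tokens) -> float:
    """Percentage of a message's distinct tokens absent from the database."""
    distinct = set(tokens)
    if not distinct:
        return 0.0
    unseen = sum(1 for tok in distinct if tok not in known_tokens)
    return 100.0 * unseen / len(distinct)
```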
This software is OSI Certified Open Source Software.
OSI Certified is a certification mark of the
Open Source Initiative.