QSF 1.2.5 released

QSF version 1.2.5 has been released. This version fixes a bug in the new list backend which caused tokens to be gradually deleted at random on update. The deleted tokens could include the special token that keeps track of token aging, in which case the database may grow uncontrollably.

Although version 1.2.5 fixes this bug, it cannot restore the lost data. Unless you rebuild your databases by retraining them from scratch, they will continue to be inaccurate until the newly accumulated data starts to outweigh the old.

I had wondered why certain users' databases were getting so large, but I just assumed that it was due to the massive volume of email those users' accounts were processing.

These graphs show training and classification times using QSF 1.2.5 with the various backends:

The "training" graph shows how much CPU time it takes to build a new database from a set of emails, displaying CPU time versus the number of emails in the training set. The "classification" graph shows the CPU time it takes to decide whether a certain number of messages are spam or not.
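For anyone wanting to take similar measurements, here is a rough sketch of how CPU time against training set size could be recorded. It is not the script used to produce the graphs: `train` is a hypothetical stand-in that just counts tokens in a dictionary, and the corpus is made up. The only real point is that `time.process_time()` measures CPU time rather than wall-clock time.

```python
import time


def train(messages):
    """Hypothetical stand-in for a training run: count tokens per class."""
    counts = {}
    for text, is_spam in messages:
        for token in text.split():
            spam, ham = counts.get(token, (0, 0))
            counts[token] = (spam + 1, ham) if is_spam else (spam, ham + 1)
    return counts


def cpu_time(func, *args):
    """Return (CPU seconds used, result) for a single call to func."""
    start = time.process_time()
    result = func(*args)
    return time.process_time() - start, result


def benchmark(messages, sizes):
    """Train on the first n messages for each n, recording CPU time."""
    rows = []
    for n in sizes:
        elapsed, _ = cpu_time(train, messages[:n])
        rows.append((n, elapsed))
    return rows


# Made-up corpus of (text, is_spam) pairs, purely for illustration.
corpus = [("cheap pills buy now", True), ("meeting notes for monday", False)] * 500

for n, seconds in benchmark(corpus, [100, 500, 1000]):
    print(f"{n:5d} messages: {seconds:.4f}s CPU")
```

Plotting the resulting pairs gives a curve of the same kind as the "training" graph; swapping `train` for a classification routine would give the other one.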

As you can see, the list backend seems to be the quickest, so it's a shame it had this great big bug in it. There are further optimisations still to do - in particular, deleting multiple tokens at once (such as during a pruning cycle) is very inefficient - but they will have to wait, as they aren't critical and I'm more than busy enough.
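To illustrate the general shape of that pruning problem, here is a small sketch. It says nothing about how the list backend is actually implemented: it simply assumes a flat list of unique tokens, and both function names are invented for the example. Removing tokens one by one rescans and shifts the list for every deletion, whereas a batch delete can make a single pass.

```python
def delete_one_at_a_time(tokens, doomed):
    """Remove each token separately; every removal rescans and shifts the list."""
    tokens = list(tokens)  # work on a copy so the caller's list is untouched
    for token in doomed:
        if token in tokens:
            tokens.remove(token)
    return tokens


def delete_batch(tokens, doomed):
    """Remove all doomed tokens in a single pass over the list."""
    doomed = set(doomed)  # constant-time membership checks
    return [t for t in tokens if t not in doomed]


# Illustrative data only: unique tokens, of which a few are to be pruned.
tokens = ["viagra", "meeting", "offer", "monday", "winner"]
stale = ["offer", "winner"]

print(delete_one_at_a_time(tokens, stale))
print(delete_batch(tokens, stale))
```

Both functions return the same list; the difference is that the first does work proportional to the list length for every token removed, while the second does it once for the whole batch.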
