
Short insight into the spam filtering with the new B8-filter (General)

by Micha ⌂, Saturday, May 14, 2022, 07:40 (717 days ago) @ Auge

Hello,

By flagging ham, do you mean all entries that are not actually flagged as spam?

Yes.

I ask because I am a bit confused by the two options, "report and flag as ham" and just "flag as ham". What's the difference?

Take a look at the basic equation. The filter needs both HAM training data and SPAM training data. If you only flag SPAM postings, the filter is nothing more than a blacklist. With both kinds of data, the filter evaluates the words in an entry and estimates the probability that these words are typically used in SPAM (or HAM) postings.
So, in my opinion, it is not a good choice to use Akismet in parallel with B8 if you want to train the filter.

If you click "report and flag as ham", the word list of the posting is stored in the database, and the HAM counter of each word is increased, cf.

mlf2_b8_wordlist --> `token`, `count_ham`, `count_spam`

Using this table one can evaluate whether a single word (the token) occurs more often in SPAM or HAM entries. Since a single word does not allow for a statistically firm decision, the probability of the whole word list is evaluated.
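
To illustrate the idea, here is a minimal sketch in Python. It is not the formula B8 actually uses; it is a plain naive-Bayes style combination, and the sample counts, the totals `total_ham`/`total_spam`, and both function names are assumptions made up for this example. It only shows how per-token counts like those in mlf2_b8_wordlist can be turned into one probability for the whole word list.

```python
from math import prod

# Hypothetical stand-in for mlf2_b8_wordlist: token -> (count_ham, count_spam).
# The numbers are invented for illustration.
wordlist = {
    "casino": (1, 40),
    "adjustment": (30, 1),
    "free": (10, 20),
}


def token_spam_probability(count_ham, count_spam, total_ham, total_spam):
    """Estimate how strongly a single token points to SPAM (0.0 to 1.0)."""
    ham_rate = count_ham / total_ham
    spam_rate = count_spam / total_spam
    if ham_rate + spam_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (ham_rate + spam_rate)


def posting_spam_probability(tokens, wordlist, total_ham, total_spam):
    """Combine the per-token estimates of all known tokens of a posting."""
    probs = []
    for token in tokens:
        if token in wordlist:
            p = token_spam_probability(*wordlist[token], total_ham, total_spam)
            # Clamp so that one token can never force the result to 0 or 1.
            probs.append(min(max(p, 0.01), 0.99))
    if not probs:
        return 0.5  # nothing known about this posting
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)
```

With these invented counts, a posting containing "casino" and "free" gets a higher SPAM probability than either word alone, while a posting with only unknown words stays at 0.5.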

The option "flag as ham" means that the posting is flagged as HAM, but its word list is not used to train/improve the filter (it is not stored in mlf2_b8_wordlist).
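
The difference between the two options can be sketched as follows. This is not the forum's actual code: the in-memory `postings` and `wordlist` dictionaries, both function names, and the simple whitespace tokenizer are assumptions for illustration only. Whether a word that occurs several times in one posting is counted once or several times is also an assumption here.

```python
# Hypothetical in-memory stand-ins for the entries table and mlf2_b8_wordlist.
postings = {7: {"text": "least squares adjustment", "spam": True}}
wordlist = {}  # token -> {"count_ham": int, "count_spam": int}


def flag_as_ham(posting_id):
    """Only change the status of the posting; the filter learns nothing."""
    postings[posting_id]["spam"] = False


def report_and_flag_as_ham(posting_id):
    """Change the status and train the filter with the posting's words."""
    flag_as_ham(posting_id)
    # Naive tokenizer (assumption): lower-case and split on whitespace.
    for token in postings[posting_id]["text"].lower().split():
        entry = wordlist.setdefault(token, {"count_ham": 0, "count_spam": 0})
        entry["count_ham"] += 1
```

After `flag_as_ham(7)` the word list is unchanged; after `report_and_flag_as_ham(7)` each word of the posting has its HAM counter increased.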

/Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences

