Efficiency of new anti spam feature (Technics)

by Auge ⌂, Monday, February 11, 2019, 14:31 (1895 days ago) @ Micha


After playing around a bit and feeding the filter only two similar(!) spam entries [1], the filter recognized the second spam posting for what it is. A third spam entry, written in Russian, was also flagged as spam without any prior training.
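To illustrate why a couple of similar entries can already be enough, here is a minimal sketch of a word-based naive Bayes filter in Python. This is not the forum's actual code. The class `ToySpamFilter`, its methods, and the sample texts are all made up for the illustration, and the real filter may tokenize and weight words quite differently.

```python
import math
import re
from collections import Counter


class ToySpamFilter:
    """Hypothetical toy filter: naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # \w is Unicode-aware in Python 3, so non-Latin scripts are tokenized too.
        return re.findall(r"\w+", text.lower())

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.words[label].update(self.tokenize(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        # Smoothed prior from the number of trained entries per class.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        tokens = self.tokenize(text)
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.words[label].values())
            for word in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                prob = (self.words[label][word] + 1) / (total + vocab)
                log_odds += sign * math.log(prob)
        return log_odds

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0


# Invented sample data: two similar spam entries and one valid entry.
toy = ToySpamFilter()
toy.train("Buy cheap pills online now", "spam")
toy.train("Buy cheap watches online today", "spam")
toy.train("How do I configure the forum theme?", "ham")

print(toy.is_spam("Buy cheap shoes online"))
print(toy.is_spam("Where can I change the forum theme"))
```

With so little training data the result hinges on a handful of shared words, which is also why the filter needs more entries before it can be called stable.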

I'm curious to see the rate of false positives and false negatives, and how much time and how many words the filter needs to become stable. I think the different languages in particular are a challenge for the script and the forum operators. What counts as white and what as black when a forum stores valid entries in different languages and the spam also arrives in different languages, often overlapping with the languages of the valid entries?
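One way to keep an eye on this would be to count the filter's mistakes separately per language. The following Python sketch assumes an operator has a list of manually checked entries. The function name `error_counts_by_language` and the tuple layout are hypothetical, and the sample rows are invented, not real measurements.

```python
from collections import defaultdict


def error_counts_by_language(results):
    """Count filter mistakes per language.

    results: iterable of (language, actually_spam, flagged_as_spam) tuples,
    where the two booleans come from a manual check and from the filter.
    """
    counts = defaultdict(
        lambda: {"total": 0, "false_positives": 0, "false_negatives": 0}
    )
    for language, actually_spam, flagged_as_spam in results:
        row = counts[language]
        row["total"] += 1
        if flagged_as_spam and not actually_spam:
            row["false_positives"] += 1  # valid entry wrongly flagged
        elif actually_spam and not flagged_as_spam:
            row["false_negatives"] += 1  # spam that slipped through
    return dict(counts)


# Invented sample rows for illustration only.
checked = [
    ("de", False, False),
    ("de", False, True),
    ("en", True, True),
    ("en", True, False),
    ("ru", True, True),
]
report = error_counts_by_language(checked)
for language, row in sorted(report.items()):
    print(language, row)
```

Tracking these counts over time would show whether a language with few valid entries produces noticeably more wrong decisions than the others.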

At first sight it's a nice feature.

How can we provide a set of training data for forum operators (bearing the different languages in mind), so that they don't have to start from zero?

Tschö, Auge

[1]: I copied them from my forum to the development forum.

Never split "Müll" (rubbish), for it has only one syllable!
