by Auge, Monday, February 11, 2019, 14:31


After playing around a bit and feeding the filter only two similar(!) spam entries [1], the filter recognized the second spam posting for what it is. A third spam entry, in Russian, was also detected as spam without any previous training.

I'm curious to see the number of false positives and false negatives, and how much time and how many words the filter needs to become stable. I think the different languages in particular are a challenge for the script and the forum operators. What is white and what is black when a forum stores valid entries in different languages and the spam is also written in different languages, often overlapping with the languages of the valid entries?

At first sight it's a nice feature.

How can we provide a set of training data for the forum operators (in light of the different languages), so that they don't have to start from zero?
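
To make the question a bit more concrete, here is a rough sketch of what such a starter dataset could look like. It assumes the filter keeps simple per-token counts for valid ("ham") and spam entries, and that a shared seed file groups those counts by language. The function name merge_seed, the layout of the seed data and the example tokens are all made up for illustration; this is not the filter's actual storage format.

```python
from collections import Counter

def merge_seed(local, seed, languages):
    """Merge seed token counts for the chosen languages into the local counts.

    local:     {"ham": {token: count}, "spam": {token: count}}   (hypothetical layout)
    seed:      {lang: {"ham": {...}, "spam": {...}}}             (hypothetical layout)
    languages: the languages the forum operator wants to import
    """
    merged = {
        "ham": Counter(local.get("ham", {})),
        "spam": Counter(local.get("spam", {})),
    }
    for lang in languages:
        for label in ("ham", "spam"):
            # Counter.update adds the counts instead of overwriting them
            merged[label].update(seed.get(lang, {}).get(label, {}))
    return merged

# Invented example data, only to show the shape of a seed dataset
seed = {
    "en": {"ham": {"forum": 3, "update": 2}, "spam": {"casino": 5, "pills": 4}},
    "ru": {"ham": {"форум": 2}, "spam": {"казино": 6}},
}
local = {"ham": {"forum": 1}, "spam": {}}

merged = merge_seed(local, seed, ["en", "ru"])
```

With a layout like this an operator could import only the languages that actually occur in the forum, which might reduce the overlap problem between valid entries and spam in other languages.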

