After playing around a bit and feeding the filter only two similar(!) spam entries, the filter recognised the second spam posting for what it is. A third spam entry, written in Russian, was also flagged as spam without any prior training.
I'm curious to see the number of false positives and false negatives, and how much time and how many words the filter needs before it becomes stable. I think the different languages in particular are a challenge for the script and for the forum operators. What is white and what is black when a forum stores valid entries in several languages and the spam also arrives in several languages, often overlapping with the languages of the valid entries?
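To make the false positive/negative question measurable, one could hand-check a batch of postings and tally the outcomes. The following is only a minimal sketch in Python, not the forum script's own code; the function name `tally_outcomes` and the sample data are hypothetical.

```python
from collections import Counter

def tally_outcomes(results):
    """Count filter outcomes from (is_spam, flagged_as_spam) pairs.

    `results` is a hypothetical list of hand-checked postings:
    is_spam is the human verdict, flagged_as_spam is the filter's verdict.
    """
    counts = Counter()
    for is_spam, flagged in results:
        if is_spam and flagged:
            counts["true_positive"] += 1
        elif is_spam and not flagged:
            counts["false_negative"] += 1   # spam that slipped through
        elif not is_spam and flagged:
            counts["false_positive"] += 1   # valid entry wrongly blocked
        else:
            counts["true_negative"] += 1
    return counts

# Made-up sample, for illustration only
sample = [(True, True), (True, False), (False, False), (False, True), (True, True)]
outcome = tally_outcomes(sample)
```

Repeating such a tally as the filter learns more words would show roughly when the numbers stop changing.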
At first sight it's a nice feature.
How can we provide a set of training data for the forum operators (taking the different languages into account), so that they don't have to start from zero?
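One possible shape for such a starter set, purely as a sketch: a JSON file with word counts grouped by language and by class, from which an operator loads only the languages used in the forum. This assumes the filter works on per-word counts, which I have not verified; the file layout, the `load_seed` function and all the words and numbers below are hypothetical.

```python
import json
from collections import Counter

def load_seed(seed_json, languages):
    """Merge the word counts of the chosen languages into one set per class.

    Assumed layout: {language: {"spam" | "ham": {word: count}}}.
    Languages missing from the seed file are skipped silently.
    """
    data = json.loads(seed_json)
    counts = {"spam": Counter(), "ham": Counter()}
    for lang in languages:
        for label, words in data.get(lang, {}).items():
            counts.setdefault(label, Counter()).update(words)
    return counts

# Made-up seed data, for illustration only
SEED = json.dumps({
    "en": {"spam": {"casino": 12, "pills": 9}, "ham": {"forum": 20, "update": 7}},
    "de": {"spam": {"casino": 4}, "ham": {"forum": 15, "danke": 11}},
})

counts = load_seed(SEED, ["en", "de"])
```

Keeping the languages separate in the file would let a German/English forum skip, say, Russian ham words that it only ever sees in spam.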
I copied them from my forum to the development forum.
Never separate trash, for it has only one syllable!