by Micha ⌂, Monday, February 11, 2019, 15:46 (1841 days ago) @ Auge


But here, for example, we write mainly in English but also in German. The spam is often in English, Russian, or Ukrainian.

If we never get/got SPAM in German, no German message will be flagged as SPAM because no German words are classified as SPAM.

Here it is not a problem because we use Akismet and we do not store the spam messages.

??? Akismet stored the messages. However, I will remove Akismet to protect data privacy (in my forum). For that reason, I need to store the training data, NOT the spam messages. Maybe this is a misunderstanding: you have to flag the entry. If it is SPAM, you can delete the entry after flagging.
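
To make the difference between training data and stored messages concrete, here is a minimal sketch in Python. It is not the forum's actual code; the class `TrainingData` and its method `flag` are hypothetical names chosen for illustration. The idea is that flagging only updates word counts, so the entry itself can be deleted afterwards.

```python
import re
from collections import Counter


class TrainingData:
    """Hypothetical store: keeps word counts per class, never the message text."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def flag(self, text, label):
        # Count each distinct word once per message.
        words = set(re.findall(r"\w+", text.lower()))
        self.counts[label].update(words)
        self.messages[label] += 1
        # The text is not kept; the entry can be deleted after this call.


store = TrainingData()
store.flag("Cheap pills, buy now", "spam")
store.flag("My roses need more water", "ham")
```

After these two calls the store holds only numbers per word and per class, which is all a Bayes filter needs later.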

At least at first glance, but it may benefit from examples of spam in English.

No, please read the theory about Bayes. A forum about e.g. flowers does not benefit from a training database where the HAM is derived from postings about e.g. animals. It's all about content. ;-) Of course, the SPAM may be the same, but the HAM isn't, and you need both of them. For that reason, it makes no sense to provide a database with general entries, whatever "general" may be.
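
A small sketch of why the HAM has to come from the same forum. This is a simplified naive Bayes score, assuming equal priors for SPAM and HAM and add-one smoothing; the function names and the example messages are made up for illustration and are not taken from any real filter.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Distinct lowercase words of a message.
    return set(re.findall(r"\w+", text.lower()))


def train(messages):
    # Returns (word -> number of messages containing it, number of messages).
    counts = Counter()
    for message in messages:
        counts.update(tokens(message))
    return counts, len(messages)


def spam_probability(text, spam, ham):
    spam_counts, n_spam = spam
    ham_counts, n_ham = ham
    log_odds = 0.0  # assumption: SPAM and HAM are equally likely a priori
    for word in tokens(text):
        p_spam = (spam_counts[word] + 1) / (n_spam + 2)  # add-one smoothing
        p_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


spam = train(["buy cheap pills now", "cheap watches buy now"])
flower_ham = train(["my roses need water", "when to prune roses"])
animal_ham = train(["my dog needs a walk", "cat food recommendations"])

post = "how often should roses get water"
with_flower_ham = spam_probability(post, spam, flower_ham)
with_animal_ham = spam_probability(post, spam, animal_ham)
```

With the flower HAM, the words "roses" and "water" pull the posting clearly towards HAM. With the animal HAM, none of its words are known to either side, so the score stays at 0.5 and the filter has nothing to decide with.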



