Banned Word List - Ban only Exact Word (Technics)

by SDN001, Thursday, February 26, 2015, 00:38 (3319 days ago)

I saw someone ask this once before, but there was no answer. Hopefully someone can help come up with a way to get around the problem.

I added POS to my bad language list on my forum. While there are legitimate uses of this acronym, the context it was being used in by my posters was an abbreviation for another word I had banned.

The problem I encountered is that this banned any use of POS. Words like "possible," "positive" and even "post" were flagged as inappropriate. I had to remove POS from the list because I was getting too many complaints.

Having looked through this forum, it looks like the bad language filter does a regex search on the entire post, so it finds even partial matches like the ones I have above.

How can I just ban "POS" without the variations? And if I can't, how can I search for just "POS" as a single word from the search bar?

Proposal for Solution: Banned Word List - Ban only Exact Word

by Micha ⌂, Thursday, February 26, 2015, 13:29 (3318 days ago) @ SDN001

Hi,

> Having looked through this forum, it looks like the bad language filter does a regex search on the entire post, so it finds even partial matches like the ones I have above.

At the moment, the function get_not_accepted_words works with strpos.

foreach($not_accepted_words as $not_accepted_word)
     {
      if($not_accepted_word!='' && my_strpos($string, my_strtolower($not_accepted_word, CHARSET), 0, CHARSET)!==false)
       {
        $found_not_accepted_words[] = $not_accepted_word;
       }
     }

Thus, the character sequence POS is also found in position, positive etc. To prevent this, a regular expression can be used (untested):

foreach($not_accepted_words as $not_accepted_word)
     {
      // preg_quote() escapes regex metacharacters in the list entry; \b restricts the match to whole words
      if($not_accepted_word!='' && preg_match("/\\b".preg_quote($not_accepted_word, "/")."\\b/i", $string))
      #if($not_accepted_word!='' && my_strpos($string, my_strtolower($not_accepted_word, CHARSET), 0, CHARSET)!==false)
       {
        $found_not_accepted_words[] = $not_accepted_word;
       }
     }

The \b matches a so-called word boundary. With it, the pattern matches POS only as a whole word (or pos, because of the i flag) and ignores words like position, positive etc.
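To illustrate the word boundary behaviour outside the forum code, here is a minimal standalone sketch. The function name find_whole_word_matches and the sample sentences are made up for this example and are not part of the forum software; only preg_match() and preg_quote() are real PHP functions.

```php
<?php
// Hypothetical helper (not part of the forum software): returns the banned
// words that occur in $text as whole words only.
function find_whole_word_matches(array $banned_words, string $text): array
{
    $found = [];
    foreach ($banned_words as $word) {
        if ($word === '') {
            continue; // skip empty list entries
        }
        // preg_quote() escapes regex metacharacters and the "/" delimiter,
        // \b anchors the match at word boundaries, i ignores case.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            $found[] = $word;
        }
    }
    return $found;
}

// "possible", "post" and "positive" only contain the sequence, so nothing is found.
$clean = find_whole_word_matches(['POS'], 'It is possible to post a positive reply.');
// Here "pos" stands alone as a word, so the entry POS is reported.
$flagged = find_whole_word_matches(['POS'], 'That car is a pos.');
```

Comparing against 1 instead of relying on a truthy result also keeps a failed match (preg_match() returning false) from being counted as a hit.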

regards
Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences

Proposal for Solution: Banned Word List - Ban only Exact Word

by Micha ⌂, Thursday, February 26, 2015, 19:11 (3318 days ago) @ Micha

Hi,

I thought about the problem, and maybe it is better to allow regular expressions in the bad word list. In that case, you can enter \bPOS\b to ban the exact word, but also entries like ass (without the \b word boundaries) to find words like asshole etc.

if($not_accepted_word!='' && preg_match("/".$not_accepted_word."/i",$string))
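A small standalone sketch of how such a list could behave, assuming every list entry is used as a regular expression fragment. The function name find_pattern_matches, the sample list and the sample sentences are hypothetical and only serve as illustration.

```php
<?php
// Hypothetical helper (not part of the forum software): treats every entry
// of the bad word list as a regular expression fragment.
function find_pattern_matches(array $entries, string $text): array
{
    $found = [];
    foreach ($entries as $entry) {
        if ($entry === '') {
            continue; // skip empty list entries
        }
        // The entry is inserted unescaped, so "\bPOS\b" keeps its word
        // boundaries. An invalid entry makes preg_match() return false
        // (the warning is suppressed with @) and is simply skipped.
        if (@preg_match('/' . $entry . '/i', $text) === 1) {
            $found[] = $entry;
        }
    }
    return $found;
}

$entries = ['\bPOS\b', 'ass'];
// Neither the whole word "pos" nor the sequence "ass" occurs here.
$clean = find_pattern_matches($entries, 'It is possible to post here.');
// "asshole" contains "ass" and "pos" stands alone, so both entries match.
$flagged = find_pattern_matches($entries, 'What an asshole, a real pos.');
```

The price of this flexibility is that whoever maintains the list has to write valid patterns: an entry containing an unescaped / breaks the pattern, and an unanchored entry such as ass also flags harmless words like class.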

regards
Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences
