Removal of control characters (ASCII < 32) from XML (RSS) (Technics)
Hello
Once, while MLF1 was being developed further towards version 1.8, Alfie mentioned the need to remove control characters (ASCII < 32) from postings when they are included in the RSS feed. We developed a function that replaces these characters with nothing (an empty string ""). Since then (Alfie reported the issue on June 30th, 2010), a similar function has been part of MLF2. The exceptions are the line breaks (\r\n, \r, \n), which are kept, and the TAB (in the code: char(9)), which is replaced with a space.
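For illustration, here is a minimal PHP sketch of such an array-based function, assuming the behaviour described above (characters 0 to 31 removed, \r and \n kept, TAB turned into a space). The function name strip_control_chars_array() is hypothetical and this is not the actual MLF2 code; it builds the search and replace arrays in a loop instead of listing every character by hand.

```php
<?php
// Hypothetical helper, a sketch of the array-based approach (not the MLF2 code).
function strip_control_chars_array(string $text): string
{
    $search = [];
    $replace = [];
    for ($code = 0; $code < 32; $code++) {
        if ($code === 10 || $code === 13) {
            continue; // keep LF and CR so \r\n, \r and \n survive
        }
        $search[] = chr($code);
        // TAB (9) becomes a space, every other control character vanishes
        $replace[] = ($code === 9) ? ' ' : '';
    }
    // str_replace() pairs $search[i] with $replace[i]
    return str_replace($search, $replace, $text);
}
```

Like the description above, this variant only covers ASCII < 32, so DEL (127) passes through unchanged.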
While checking my bookmarks for another forum, I found a posting there that proposed a different way to handle this issue: a regular expression. A reply contained a link to a Stack Overflow thread. Even with the exceptions mentioned above, this would be a much smaller solution, though at the cost of readability.
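// Remove control characters except TAB (\x09), LF (\x0A) and CR (\x0D); DEL (\x7F) is removed too
$output = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);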
The expression matches the characters 0 to 8 (\x00-\x08), 11 (\x0B), 12 (\x0C), 14 to 31 (\x0E-\x1F) and, additionally, 127 (\x7F) (is that necessary?). It thereby leaves out 9 (TAB), 10 (LF) and 13 (CR). By replacing char(9) with " " afterwards, we could handle the issue in only a few lines of code instead of the monster array we use now.
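A minimal sketch of the proposed replacement, again with a hypothetical function name: the regular expression from the Stack Overflow thread removes everything except TAB, LF and CR, and a second step turns the remaining TABs into spaces. Since the pattern has no /u modifier, it works byte by byte; this is safe for UTF-8 postings because bytes below 0x80 never occur inside a multibyte UTF-8 sequence.

```php
<?php
// Hypothetical helper, a sketch of the regex idea (not the MLF2 code).
function strip_control_chars_regex(string $text): string
{
    // Remove 0-8, 11, 12, 14-31 and 127 (DEL); TAB, LF and CR are not in the class.
    $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);

    // Afterwards turn the remaining TABs into spaces.
    return str_replace("\t", ' ', $text);
}
```

Regarding the question about 127: the Char production of XML 1.0 does allow #x7F, so removing it is not required for well-formedness; the specification only discourages it together with other control characters. Dropping \x7F from the class would keep DEL in the feed.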
Opinions?
Bye, Auge
--
Never separate "Müll" (garbage), because it has only one syllable!