removal of control characters (ASCII<32) from XML (RSS) (Technics)
Some time ago, during the development of MLF1 towards version 1.8, Alfie pointed out that control characters (ASCII < 32) have to be removed from postings before they are included in the RSS feed. We developed a function that replaces these characters with nothing (the empty string ""). A similar function has been part of MLF2 ever since (Alfie reported the issue on June 30th, 2010), with two exceptions: line breaks (\n) are kept, and the TAB (in the code: char(9)) is replaced with a whitespace.
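To illustrate the array-based idea described above, here is a minimal sketch. It is not the actual MLF code: the function name strip_control_chars_array is made up for this example, and I assume that only the line feed (\n) is kept and that the TAB is turned into a single space.

```php
<?php
// Hypothetical helper for illustration only, not the function shipped with MLF.
function strip_control_chars_array(string $input): string
{
    // Build the list of characters to drop: everything below ASCII 32 ...
    $search = [];
    for ($code = 0; $code < 32; $code++) {
        // ... except TAB (9, handled separately) and line feed (10, kept).
        if ($code === 9 || $code === 10) {
            continue;
        }
        $search[] = chr($code);
    }

    // First turn every TAB into a single space,
    // then replace the remaining control characters with the empty string.
    $output = str_replace(chr(9), ' ', $input);
    return str_replace($search, '', $output);
}
```

Writing the list out by hand instead of generating it in a loop gives the long array mentioned in this posting.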
While checking my bookmarks in another forum, I found a posting that proposed another way to handle this issue: a regular expression. A reply contained a link to a Stack Overflow thread. Even with the exceptions mentioned above, this would be a much smaller solution, at the cost of reduced readability.
preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
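Wrapped into a function together with the TAB replacement mentioned earlier, the expression could be used roughly as follows. This is only a sketch under my own assumptions: the name strip_control_chars_regex is hypothetical, and falling back to the unchanged input when preg_replace reports an error is my choice, not something taken from MLF.

```php
<?php
// Hypothetical helper for illustration only, not part of MLF.
function strip_control_chars_regex(string $input): string
{
    // Remove the characters 0-8, 11, 12, 14-31 and 127.
    // TAB (9), LF (10) and CR (13) are not matched and stay in the string.
    // preg_replace() returns null on error; in that case keep the input as is.
    $output = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input) ?? $input;

    // Turn the remaining TABs into a single space.
    return str_replace("\t", ' ', $output);
}
```

Because all matched values are below 128, the pattern cannot hit a byte inside a multi-byte UTF-8 sequence, so it should be usable on UTF-8 postings without the u modifier.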
The expression shown matches the characters 0 to 8 (\x00-\x08), 11 (\x0B), 12 (\x0C), 14 to 31 (\x0E-\x1F) and additionally 127 (\x7F) (is that necessary?). It leaves 9 (TAB), 10 (line feed) and 13 (carriage return) untouched. With a subsequent replacement of the TAB by " " we could handle the issue within only a few lines of code instead of the monster array currently in use.