
removal of control characters (ASCII<32) from XML (RSS) (Technics)
Hello
While checking my bookmarks I found a posting in another forum in which someone proposed a regular expression. A reply to it contained a link to a Stack Overflow thread.
preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
The expression shown matches the chars 0 to 8 (\x00-\x08), 11 (\x0B), 12 (\x0C), 14 to 31 (\x0E-\x1F) and additionally 127 (\x7F) (is that necessary?).
Char 127 (\x7F) is the DEL control character. I don't know whether it would break the RSS feed.
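To make the difference between the two variants visible, here is a small sketch that runs both expressions over the same sample string and prints the byte values that survive. The sample input is made up for illustration. As far as I read the XML 1.0 character rules, 9, 10 and 13 are the only characters below 32 that are allowed, while 127 is permitted (though discouraged), so leaving DEL in should not make the feed ill-formed.

```php
<?php
// Sample input built with chr() so every byte is explicit:
// NUL (0), BEL (7), TAB (9), LF (10), CR (13), ESC (27), DEL (127).
$input = 'a' . chr(0) . 'b' . chr(7) . 'c' . chr(9) . 'd' . chr(10)
       . 'e' . chr(13) . 'f' . chr(27) . 'g' . chr(127) . 'h';

// Variant from the linked thread: also strips DEL (\x7F).
$withDel = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);

// Variant without \x7F: DEL passes through unchanged.
$withoutDel = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $input);

// Print the remaining byte values, one line per variant.
echo implode(' ', array_map('ord', str_split($withDel))), PHP_EOL;
echo implode(' ', array_map('ord', str_split($withoutDel))), PHP_EOL;
```

Both variants keep TAB, LF and CR; the only difference in the output is the byte 127 in the second line.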
If not, the resulting function (based on the solution for MLF2, located in includes/functions.inc.php) would look like this:
/**
 * filters control characters
 *
 * @param string $string
 * @return string
 */
function filter_control_characters($string) {
  # remove the specified control chars (0-8, 11, 12, 14-31) from the string
  $string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $string);
  # replace the control char 9 (TAB) with a space
  $string = str_replace(chr(9), ' ', $string);
  # control chars 10 and 13 (\n, \r) remain untouched
  return $string;
}
For Alfie's special version of the MLF1 forum (functions/funcs.output.php) it would be:
/**
 * Strips all control characters from output in case of XML output
 *
 * @param string $string
 * @return string $string
 */
function outputXMLclearedString($string) {
  # remove the specified control chars (0-8, 11, 12, 14-31) from the string
  $string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $string);
  # replace the control char 9 (TAB) with a space
  $string = str_replace(chr(9), ' ', $string);
  # control chars 10 and 13 (\n, \r) remain untouched
  return $string;
} # End: outputXMLclearedString
So the only substantial difference is the name of the function.
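A quick way to see what the function buys us is to feed the same text to an XML parser once unfiltered and once filtered. The following is only a sketch: the sample text and the helper wrap_in_item() are hypothetical and not part of either forum version, and it assumes the SimpleXML extension is available. The function body is repeated so the snippet runs on its own.

```php
<?php
// Same body as the proposed function, repeated so this sketch is self-contained.
function filter_control_characters($string)
{
    $string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $string);
    $string = str_replace(chr(9), ' ', $string);
    return $string;
}

// Hypothetical helper (not in MLF): wraps escaped text in a minimal RSS-like element.
function wrap_in_item($text)
{
    return '<item><description>'
        . htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8')
        . '</description></item>';
}

// Made-up posting text containing BEL (7), TAB (9), LF (10) and VT (11).
$raw = 'Line' . chr(7) . ' one' . chr(9) . 'tabbed' . chr(10) . 'Line' . chr(11) . ' two';
$clean = filter_control_characters($raw);

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// simplexml_load_string() returns false when the document is not well-formed.
$rawParses   = simplexml_load_string(wrap_in_item($raw)) !== false;
$cleanParses = simplexml_load_string(wrap_in_item($clean)) !== false;

var_dump($rawParses, $cleanParses);
```

The unfiltered text should be rejected by the parser because of the BEL and VT bytes, while the filtered text parses; the TAB has become a space and the line feed is still there.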
Looks good! Add a comment to the source explaining what this regex does and why, then I like it. I don't see a case where 127 could turn up.
Char 127 (DEL) was never part of our solutions, so I removed it from the regex.
Bye, Auge
--
Never separate "Müll" (German for waste), for it has only one syllable!