removal of control characters (ASCII<32) from XML (RSS) (Technics)
Hi Auge,
While checking my bookmarks in another forum, I found a posting that proposed another way to handle this issue: a regular expression. A reply contained a link to a Stack Overflow thread. Even with the exceptions mentioned above, this would be a much shorter solution, at the cost of some readability.
preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
The shown expression addresses the chars 0 to 8 (\x00-\x08), 11 (\x0B), 12 (\x0C), 14 to 31 (\x0E-\x1F) and additionally 127 (\x7F) (is that necessary?).
Looks good! Add a comment to the source explaining what this regex does and why, then I like it. I don’t see a case where 127 could turn up.
For readers who don’t know the background: Such characters might be embedded in a PDF. If one copies text from a PDF into a post, they are not a problem in the forum (since these characters are invisible), but they screw up the feed.
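A commented version could look roughly like the sketch below. The function name strip_control_chars() is only an assumption for illustration and not taken from the forum's source; the expression itself is the one quoted above. Tab (9), line feed (10) and carriage return (13) are deliberately left alone because XML 1.0 permits them.

```php
<?php
/**
 * Hypothetical helper (the name is an assumption, not part of the forum code).
 *
 * Strips control characters that are not allowed in XML 1.0, so that text
 * pasted from e.g. a PDF cannot break the RSS feed.
 *
 * Removed: 0-8, 11, 12, 14-31 and, as a precaution, 127 (DEL).
 * Kept:    9 (tab), 10 (line feed), 13 (carriage return).
 */
function strip_control_chars(string $input): string
{
    // preg_replace() returns null on error; in that case hand back the input unchanged.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input) ?? $input;
}

// Usage: a vertical tab (\x0B) and a NUL byte (\x00) are dropped, tab and newline survive.
echo strip_control_chars("Price:\x0B 10\x00 EUR\tnet\n");
```

The pattern works byte-wise (no u modifier). That should be safe for UTF-8 input, since the bytes 0 to 31 and 127 never occur inside a multi-byte UTF-8 sequence.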
--
Cheers,
Alfie (Helmut Schütz)
BEBA-Forum (v1.8β)