removal of control characters (ASCII<32) from XML (RSS) (Technics)

by Alfie, Vienna, Austria, Thursday, April 25, 2019, 21:06 @ Auge

Hi Auge,

> While checking my bookmarks in another forum, I found a posting in which someone proposed a different way to handle this issue: a regular expression. A reply contained a link to a Stack Overflow thread. Even with the exceptions mentioned above, this would be a much smaller solution, though at the cost of readability.
>
> $output = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
>
> The expression matches the characters 0 to 8 (\x00-\x08), 11 (\x0B), 12 (\x0C), 14 to 31 (\x0E-\x1F) and, additionally, 127 (\x7F) (is that necessary?).

Looks good! Add a comment to the source explaining what this regex does and why – then I'm happy with it. I don't see a case where 127 could turn up.
For readers who don't know the background: such characters can be embedded in a PDF. If one copies text from a PDF into a post, this is not a problem in the forum (since these characters are invisible), but they screw up the feed.
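A commented version of the one-liner might look like the following. This is a minimal sketch, assuming the feed generator passes each post's text through a helper before writing it into the XML; the function name `strip_xml_control_chars()` is hypothetical and not taken from the forum's source. The comments list which code points are removed and why TAB, LF and CR are kept: XML 1.0 allows only these three below 0x20.

```php
<?php
/**
 * Hypothetical helper (not part of the forum code): remove characters
 * that are not allowed in XML 1.0 and would therefore break the RSS feed.
 *
 * Below 0x20, XML 1.0 only permits TAB (0x09), LF (0x0A) and CR (0x0D),
 * so the character class drops everything else in that range:
 *   \x00-\x08  NUL ... BACKSPACE
 *   \x0B       vertical tab
 *   \x0C       form feed (e.g. page breaks in text copied from a PDF)
 *   \x0E-\x1F  SHIFT OUT ... UNIT SEPARATOR
 *   \x7F       DEL (allowed by XML 1.0, removed only as a precaution)
 */
function strip_xml_control_chars(string $input): string
{
    $output = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
    if ($output === null) {
        // preg_replace() returns null on a PCRE error.
        throw new RuntimeException('preg_replace() failed, PCRE error ' . preg_last_error());
    }
    return $output;
}

// Usage: form feed and DEL are removed; tab, CR and LF are kept.
$text = "Copied from PDF:\x0Cpage two\tstill\r\nhere\x7F";
echo strip_xml_control_chars($text);
```

Because the pattern has no `u` modifier, it works byte by byte. That is still safe for UTF-8 text: every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so characters like the ü in Schütz are never touched. As for 127 (DEL), XML 1.0 does allow it, so removing it is a precaution rather than a requirement.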

Alfie (Helmut Schütz)
BEBA-Forum (v1.8β)
