Tip: XML doesn’t like control characters (\x00-\x1F)
https://marco.org/2008/06/04/tip-xml-doesnt-like-control-characters-x00-x1f
Most control characters in the range x00-x1F aren’t allowed in XML 1.0. Only tab (x09), line feed (x0A), and carriage return (x0D) are permitted, and most parsers will complain or fail if any of the others are present.
But they are valid in UTF-8. I’d been assuming that this was enough to make sure text going into XML is valid (when the text is already supposed to be UTF-8):
$text = iconv('UTF-8', 'UTF-8//IGNORE', $text);
$node->appendChild($dom->createTextNode($text));
That’s not enough. Control characters have to be removed, too:
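// Replace the control characters XML 1.0 forbids; tab, LF, and CR are legal and left alone.
$text = preg_replace("#[\\x00-\\x08\\x0b\\x0c\\x0e-\\x1f]#", ' ', $text);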
$text = iconv('UTF-8', 'UTF-8//IGNORE', $text);
$node->appendChild($dom->createTextNode($text));
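Here’s a minimal sketch of the same steps wrapped in one helper and run end to end. The function name xml_safe_text and the element name item are made up for illustration, and it assumes XML 1.0, where tab, line feed, and carriage return are the only legal characters below x20. It also assumes the iconv and DOM extensions are loaded.

```php
<?php
// Hypothetical helper (not a built-in): make a string usable as an XML 1.0 text node.
function xml_safe_text(string $text): string
{
    // Drop byte sequences that are not valid UTF-8.
    // Some iconv builds return false on bad input even with //IGNORE,
    // so this sketch falls back to an empty string in that case.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $text);
    if ($clean === false) {
        $clean = '';
    }

    // Replace control characters that XML 1.0 forbids with a space.
    // Tab (x09), LF (x0A), and CR (x0D) are legal, so they are left alone.
    // Bytes below x20 never occur inside multi-byte UTF-8 sequences,
    // so a byte-wise pattern is safe here.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', ' ', $clean);
}

// Build a small document with text that contains a bell character (x07).
$dom  = new DOMDocument('1.0', 'UTF-8');
$node = $dom->appendChild($dom->createElement('item'));
$node->appendChild($dom->createTextNode(xml_safe_text("bell\x07 and a tab\there")));

// Serialize, then parse the result again to confirm it is well-formed.
$xml   = $dom->saveXML();
$check = new DOMDocument();
$ok    = $check->loadXML($xml);
```

Replacing with a space instead of deleting keeps words from running together when a stray control character was standing in for whitespace. If that doesn’t matter for your data, an empty replacement string works just as well.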
I learn some obscure new knowledge every day…