Tonight’s goal: Make a simple PHP class.
- Input: a URL pointing to an HTML document.
- Output: a UTF-8 version, regardless of what encoding it’s really in.
Sounds easy, right?
Nope. Because some pages specify encoding via HTTP header, some specify via meta tag, some specify both but they disagree, and some don’t specify at all. Sometimes, the encoding is specified with an unusual variant of its name (e.g. X-GBK, MS939). And often, the specified encoding is wrong.
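To make those failure modes concrete, here is a rough sketch of one way to handle them. It is not the class described in this post, just an illustration that assumes the mbstring extension is available. The names toUtf8, declaredCharset, bytesFit and CHARSET_ALIASES are all made up for this example, the alias table is a tiny placeholder, and the ISO-8859-1 fallback is an arbitrary choice. The idea: trust bytes that are already valid UTF-8, otherwise try the header's charset, then the meta tag's, and accept a declared charset only if the bytes actually fit it.

```php
<?php
// Hypothetical alias table: maps odd charset labels to names mbstring knows.
// A real one would be much longer.
const CHARSET_ALIASES = ['x-gbk' => 'CP936', 'utf8' => 'UTF-8'];

// Pull a charset label out of $text with $pattern, then normalize it.
function declaredCharset(string $pattern, ?string $text): ?string
{
    if ($text === null || !preg_match($pattern, $text, $m)) {
        return null;
    }
    $label = strtolower($m[1]);
    return CHARSET_ALIASES[$label] ?? $label;
}

// True if mbstring knows $encoding and $bytes are valid in it.
// Unknown names warn on PHP 7 and throw on PHP 8; both count as "no".
function bytesFit(string $bytes, string $encoding): bool
{
    try {
        return @mb_check_encoding($bytes, $encoding);
    } catch (\Throwable $e) {
        return false;
    }
}

function toUtf8(string $body, ?string $contentType = null): string
{
    // Heuristic: bytes that are already valid UTF-8 are left alone,
    // whatever the page claims about itself.
    if (mb_check_encoding($body, 'UTF-8')) {
        return $body;
    }

    // Candidates in order: HTTP header first, then the meta tag.
    $candidates = [
        declaredCharset('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType),
        declaredCharset('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', substr($body, 0, 4096)),
    ];
    foreach ($candidates as $encoding) {
        if ($encoding !== null && bytesFit($body, $encoding)) {
            return mb_convert_encoding($body, 'UTF-8', $encoding);
        }
    }

    // Nothing usable was declared: fall back to a single-byte guess (an assumption).
    return mb_convert_encoding($body, 'UTF-8', 'ISO-8859-1');
}
```

This sketch takes the body and the Content-Type header as arguments, so fetching the URL is left out. Note that the validity check is weak for single-byte encodings such as ISO-8859-1, where nearly any byte sequence passes, so a wrong single-byte label will still slip through.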
But I think I got it, finally.
This is so useful, albeit to a relatively narrow range of programmers, that I feel bad not releasing it to the world. Then again, I assume someone else has already done this and I just didn’t bother looking for it. (My experiences with PHP-community code have not been good, so I almost always roll my own.) Any interest?