On Wed, 21 Sep 2011 11:36:32 PDT, Andrew Farmer said:

> Not true - the multibyte sequences in UTF-8 text consist entirely of
> high-bit characters (0xC2 - 0xF4 initial, 0x80 - 0xBF continuation). All
> characters below 0x80, including ASCII control characters, are always
> mapped directly to the corresponding codepoints.

Well, if you want to be pedantic about it. ;)

OK, they're "nonprintable characters" - which *still* should be filtered out if you're filtering out control characters (if you're tossing a hex 0x17 because it may give software indigestion, you probably should be tossing a 0x97 as well).
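For anyone who wants to see what that filtering might look like, here is a rough Python sketch, not anything taken from a real filter. It assumes the input has already been decoded from UTF-8, since U+0097 arrives on the wire as the two bytes 0xC2 0x97 rather than a bare 0x97, and it assumes that "control character" means Unicode general category Cc (the C0 range, DEL, and the C1 range 0x80 - 0x9F). The function name strip_control_chars and the keep whitelist are made up for illustration.

```python
import unicodedata


def strip_control_chars(text, keep="\t\n\r"):
    """Drop every code point in Unicode category Cc (C0, DEL and C1 controls).

    Characters listed in `keep` survive even though they are controls;
    the default whitelist (tab, LF, CR) is an assumption, adjust to taste.
    """
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


# Hypothetical input: 0x17 (C0), then U+0097 (C1) encoded as 0xC2 0x97,
# then an ordinary two-byte character (U+00E9) that must be left alone.
raw = b"ok\x17\xc2\x97\xc3\xa9!\n"
cleaned = strip_control_chars(raw.decode("utf-8"))
```

Working on decoded code points rather than raw bytes is the point here: stripping every byte in 0x80 - 0x9F from the byte stream would also chew up the continuation bytes of legitimate multibyte characters.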
_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/