Hi Yasuo,

Yasuo Ohgaki wrote:

> utf8_decode() and utf8_encode() are not needed and are causing more
> problems than they solve.
>
> https://wiki.php.net/rfc/remove_utf_8_decode_encode
>
> Proposal:
>  - Document their deprecation now
>  - Remove them in 7.2
>
> I think only a few users are using them, and they shouldn't have a
> problem using mbstring/iconv/intl functions instead.
>
> Any comments?

I don't agree with this. utf8_decode() and utf8_encode() are functions
which you probably ought not to use in modern code, and the names are
maybe unhelpful (decode to what? encode from what?). But the job they do
is sometimes needed (if you're dealing with ISO-8859-1, the specific
legacy encoding they handle), and I believe they work correctly. Plus, a
lot of existing code uses them. For these reasons, deprecating them
seems needless.

I would propose something else: remove them from the XML extension, and
move them somewhere more fitting, like ext/intl, ext/mbstring or maybe
ext/standard. These are generic functions which work on any text, not
just XML, and neither task requires the other: if you're decoding XML,
you don't necessarily need to convert text to/from UTF-8, and if you're
converting text to/from UTF-8, you don't necessarily need to deal with
XML. Plus, given the names alone, you'd have no idea they're part of the
XML extension.

Also, to avoid confusion, maybe they could be renamed to
iso88591_to_utf8() and utf8_to_iso88591(), with the old names kept as
aliases. I got this idea from this comment:

http://php.net/manual/en/function.utf8-encode.php#104906
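
To make the renaming idea concrete, here is a rough userland sketch of
what the two names would do. It is only an illustration: it assumes
ext/mbstring is loaded and leans on mb_convert_encoding(), it is not how
the XML extension implements these functions, and its handling of
invalid or unrepresentable input may not match utf8_decode() exactly.

```php
<?php
// Hypothetical userland stand-ins for the proposed names.
// Assumption: ext/mbstring is available.

// Treat every input byte as an ISO-8859-1 character and emit UTF-8.
function iso88591_to_utf8(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

// Convert UTF-8 to ISO-8859-1. Code points above U+00FF have no
// ISO-8859-1 byte, so mbstring substitutes its replacement character.
function utf8_to_iso88591(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$latin1 = "caf\xE9";                  // "café" in ISO-8859-1 (4 bytes)
$utf8   = iso88591_to_utf8($latin1);  // "café" in UTF-8 (5 bytes)

echo bin2hex($utf8), "\n";                   // 636166c3a9
echo bin2hex(utf8_to_iso88591($utf8)), "\n"; // 636166e9
```

With names like these, the direction and the encodings involved are
obvious at the call site, which is the whole point of the rename.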

Another thing to consider is that the manual perhaps ought to warn the
user that ISO-8859-1 is not Windows-1252. A lot of text on the Internet
marked as the former is actually the latter (thanks to the widespread
use of Windows), and browsers treat text labelled ISO-8859-1 as
Windows-1252 for that reason. Windows-1252 contains some extra printable
characters where ISO-8859-1 has control characters, such as the Euro
sign, curly quotes, the trademark sign, and the en and em dashes. So,
interpreting Windows-1252 text as ISO-8859-1 will garble such
characters.
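
A small sketch of that garbling, in case it helps the manual wording. It
assumes ext/mbstring is loaded; the sample bytes are my own choice, not
taken from any real page. The same four bytes are converted to UTF-8
twice, once under each label:

```php
<?php
// Assumption: ext/mbstring is available.
// In Windows-1252 these bytes are: € “ ” ™
$bytes = "\x80\x93\x94\x99";

// Labelled ISO-8859-1: each byte maps to a C1 control character.
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// Labelled Windows-1252: each byte maps to the intended printable character.
$asCp1252 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c280c293c294c299 (U+0080, U+0093, U+0094, U+0099)
echo bin2hex($asCp1252), "\n"; // e282ace2809ce2809de284a2 (U+20AC, U+201C, U+201D, U+2122)
```

The first result is valid UTF-8, so nothing fails loudly; the text just
ends up containing invisible control characters where the Euro sign and
quotes should be.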

Thanks.

--
Andrea Faulds
https://ajf.me/