Wednesday, April 04, 2012

What if "character != its utf8 encoding" is overengineering?

"You shall not assume anything about the internal representation of characters in Perl" - this is a mantra that the Perl pundits have repeated over and over for something like a decade. But there are still people who refuse to take that advice and want to peek into the internal representation of characters. What if our sophisticated approach of separating the 'idea of a character' from its representation is a case of overengineering? People often overreact to past traumas - programming is no exception - and the conversion from many national 'charsets' to Unicode was a big event. Maybe expecting another conversion soon is such an overreaction?
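To make the distinction concrete, here is a minimal sketch of 'a character is not its encoding'. It is written in Python rather than Perl and says nothing about how Perl actually stores strings internally; the variable names are made up for the illustration.

```python
# Illustration only: Python, not Perl, and not a model of Perl's internals.
text = "caf\u00e9"  # four characters; the last one is U+00E9 (e with acute)

latin1_bytes = text.encode("latin-1")  # one byte per character for this string
utf8_bytes = text.encode("utf-8")      # U+00E9 needs two bytes in UTF-8

print(len(text))          # number of characters
print(len(latin1_bytes))  # number of bytes in the Latin-1 encoding
print(len(utf8_bytes))    # number of bytes in the UTF-8 encoding

# The two byte sequences differ, yet both decode to the same characters.
same = latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8") == text
print(same)
```

The character count stays the same while the byte count depends on the encoding chosen, which is the gap the mantra asks programmers to keep in mind.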

Getting rid of the Latin1 internal encoding does not look like a big price to pay for more simplicity and for getting rid of all these subtle mistakes. I think it is important that the language is understood by its users, and if it is not, then maybe, instead of blaming the programmers, we could make it easier to understand? Sure, it is nice to have the possibility of changing the internal encoding from UTF8 to UTF16, or maybe to something completely different, in the future - but I have the feeling that this might be a case of architecture astronautics.
