
Why doesn't EPIC support utf8?

A question that comes up frequently is whether epic supports utf8, and if it does not, when it will. The simple answer is that epic does not support utf8 because nobody in the epic community has experience converting programs to use utf8. Interested contributors are therefore having to learn the unicode way of doing things as they go along, which is much slower than if someone who had done this before stepped in and helped us write the code to implement the many design changes.

Converting from ascii to unicode is very invasive to a program, and there are important questions to consider when you ask what it really means to support utf8. This is not an exhaustive list but gives you an idea of the size of the effort.

Column Counting

UTF8 breaks from the longstanding tradition that one byte equals one glyph equals one column on the screen: a single character can occupy one to four bytes, and a glyph can occupy zero, one, or two columns. This affects things like column counting, which is important for the input line and for line wrapping. Much code has to be rewritten for this.
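To make the problem concrete, here is a minimal sketch in Python (not epic's own code, which is written in C) that compares the byte count, the code point count, and an approximate column count for a few strings. The helper name display_columns is made up for this illustration, and the width rule is an assumption: it treats combining marks and control characters as zero columns and East Asian wide characters as two columns, which is only a rough stand-in for what a real terminal does.

```python
import unicodedata

def display_columns(text: str) -> int:
    """Approximate the number of terminal columns needed to show text.

    Hypothetical helper for illustration only; real terminals may
    disagree, especially for ambiguous-width characters.
    """
    columns = 0
    for ch in text:
        # Combining marks, format characters and control characters
        # do not advance the cursor.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf", "Cc"):
            continue
        # East Asian "wide" and "fullwidth" characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns

samples = [
    "hello",                # plain ascii
    "na\u00efve",           # one precomposed accented letter
    "e\u0301",              # 'e' followed by a combining acute accent
    "\u65e5\u672c\u8a9e",   # three wide CJK characters
]

for s in samples:
    byte_count = len(s.encode("utf-8"))
    print(f"{s!r}: {byte_count} bytes, {len(s)} code points, "
          f"{display_columns(s)} columns")
```

For the ascii string all three numbers agree. For the others they do not, which is why code that moves the cursor by counting bytes has to be rewritten.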

Talking to people who can't do utf8

The historical way of handling national character sets is to use code pages, which map 128 glyphs into code points 128-255. Normally this is handled by the user's terminal emulator, so epic has never had to worry about the details. There will always be irc users who aren't using utf8 clients, so the client will always need to support a remote target (channel or user) who can't do utf8. If you exchange messages and you're using utf8 and the other person isn't, then every non-ascii character will be garbled. To really support utf8, the client must be able to convert FROM utf8 TO any other encoding, and vice versa.
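The following Python sketch shows both the garbling and one possible shape for a fix. It is not how epic does or will do it. The table TARGET_ENCODINGS, the channel names in it, and the two helper functions are all hypothetical; the only assumption is that the client somehow knows which encoding each target expects.

```python
# Hypothetical table of which encoding each remote target expects.
TARGET_ENCODINGS = {
    "#utf8chan": "utf-8",
    "#latin1chan": "iso-8859-1",
}

def encode_outgoing(target: str, text: str) -> bytes:
    """Convert internal text to the bytes the target expects."""
    encoding = TARGET_ENCODINGS.get(target, "utf-8")
    # Characters the target encoding cannot represent become '?'.
    return text.encode(encoding, errors="replace")

def decode_incoming(target: str, raw: bytes) -> str:
    """Convert bytes received from the target to internal text."""
    encoding = TARGET_ENCODINGS.get(target, "utf-8")
    # Invalid byte sequences become U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")

# What goes wrong without conversion: utf8 bytes read as latin-1.
garbled = "caf\u00e9".encode("utf-8").decode("iso-8859-1")
print(garbled)

# With a per-target encoding, the same word survives in both directions.
wire = encode_outgoing("#latin1chan", "caf\u00e9")
print(wire)
print(decode_incoming("#latin1chan", wire))
```

The hard part is not the conversion call itself but knowing the right encoding for every target, since irc gives the client no way to ask.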

Using utf8 when you don't have a utf8 terminal emulator

Additionally, there will always be epic users who aren't using utf8 terminal emulators. But these users would like to be able to join utf8 channels and have everything Just Work. It is necessary for epic to be able to convert FROM any input encoding TO utf8 and back again for these users.
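As a rough illustration of the two conversions this requires, here is a Python sketch for a user whose terminal emulator speaks koi8-r while the channel speaks utf8. The function names are invented for this example, and koi8-r is just one sample terminal encoding; nothing here reflects actual epic code.

```python
def terminal_to_wire(typed: bytes, terminal_encoding: str) -> bytes:
    """Convert what the user typed into utf8 for a utf8 channel."""
    return typed.decode(terminal_encoding).encode("utf-8")

def wire_to_terminal(received: bytes, terminal_encoding: str) -> bytes:
    """Convert a utf8 message into bytes the terminal can display.

    Characters the terminal encoding lacks are replaced with '?',
    so some information is unavoidably lost.
    """
    text = received.decode("utf-8", errors="replace")
    return text.encode(terminal_encoding, errors="replace")

greeting = "\u043f\u0440\u0438\u0432\u0435\u0442"  # Russian "privet"

# Input direction: koi8-r keystrokes become utf8 on the wire.
typed = greeting.encode("koi8_r")
print(terminal_to_wire(typed, "koi8_r"))

# Output direction: a utf8 message with characters koi8-r cannot show.
message = (greeting + " \u65e5\u672c").encode("utf-8")
print(wire_to_terminal(message, "koi8_r"))
```

The output direction is lossy by nature: a koi8-r terminal cannot show Japanese, so the best the client can do is substitute a placeholder.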

Scripts, /echo, and backwards compatibility

Finally, once you open the door to unicode, you're talking about being able to support any encoding. How will this impact things like scripts? How will the /echos in your script display if you encode the script in utf8 but the person who uses it doesn't use a utf8 terminal emulator? We see this problem today when people write scripts using the default vga code page for the linux console, and those scripts look all weird when someone else uses them with a latin-1 font. So there needs to be some way for scripts to convert between encodings.
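The code page problem described above is easy to reproduce. In this Python sketch the same four bytes are a box-drawing border in the vga code page (cp437) and accented letters in latin-1. The function transcode_script_text is hypothetical, and it assumes the script's encoding is known, for example because the script declares it; epic has no such mechanism described here.

```python
def transcode_script_text(raw: bytes, script_encoding: str,
                          display_encoding: str) -> bytes:
    """Re-encode script text for the user's display encoding.

    Hypothetical helper: assumes the script's own encoding is known.
    Unrepresentable characters are replaced with '?'.
    """
    text = raw.decode(script_encoding)
    return text.encode(display_encoding, errors="replace")

# Four bytes a script author typed on a cp437 linux console.
border = b"\xc9\xcd\xcd\xbb"

print(border.decode("cp437"))       # what the author saw
print(border.decode("iso-8859-1"))  # what a latin-1 user sees

# Knowing the source encoding, the text can be converted to utf8 intact.
print(transcode_script_text(border, "cp437", "utf-8").decode("utf-8"))

# Converting to latin-1 loses the glyphs, but at least predictably.
print(transcode_script_text(border, "cp437", "iso-8859-1"))
```

Without some record of the script's encoding, the client cannot tell which of the two interpretations the author meant.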

Summary

I'm not trying to make you think that epic will never have proper unicode support. I want to help you understand that this is not a simple matter, and that without outside assistance the work will be slow and steady, because there is a large amount of code to be written. Eventually it will happen, but the only way to make it happen sooner is to help us write the code or recruit someone who will.

why_no_utf8.txt · Last modified: 2008/12/10 19:06 (external edit)