encodings_and_locales
— | encodings_and_locales [2014/04/05 13:28] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
+ | ======Encodings and Locales====== | ||
+ | |||
+ | =====Encodings===== | ||
+ | Character encoding is how computer software translates the keys you press into data that is sent between computer programs, and how data from the program is converted back to what you see on the screen. | ||
+ | |||
+ | ====Old Style==== | ||
+ | The "old style" used a model where 1 key press == 1 byte == 1 column of data. But this limits you to 256 values, and so there were many overlapping (incompatible) "code pages", | ||
+ | |||
+ | Example of "old style" encodings: | ||
+ | * CP437 (very common for scripts) | ||
+ | * ISO-8859-1 | ||
+ | * KOI8-R | ||
+ | |||
+ | So a person who wanted to use KOI8-R to talk to Russian friends could not use a script that was encoded in CP437 with box drawing characters -- because those encodings overlapped and were incompatible. | ||
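The overlap can be seen by decoding one byte value under each of the three encodings listed above. This is a minimal sketch using Python's built-in codecs (`cp437`, `koi8_r`, `latin_1`), not anything EPIC itself does; the byte `0xC4` is just an illustrative choice.

```python
# One byte value, three "old style" interpretations.
raw = bytes([0xC4])

as_cp437 = raw.decode("cp437")      # a box drawing character in CP437
as_koi8r = raw.decode("koi8_r")     # a Cyrillic letter in KOI8-R
as_latin1 = raw.decode("latin_1")   # an accented Latin letter in ISO-8859-1

# Print code points rather than glyphs so the output does not depend
# on the terminal's own encoding.
for name, text in [("CP437", as_cp437), ("KOI8-R", as_koi8r), ("ISO-8859-1", as_latin1)]:
    print(f"{name:<11} 0xC4 -> U+{ord(text):04X}")
```

The same byte yields three unrelated characters, which is why text written for one code page looks wrong when read with another.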
+ | |||
+ | ====New Style==== | ||
+ | The "new style" uses a model where every character maps to an integer code point (Unicode), which is then encoded as 1 to 4 bytes (UTF-8). ASCII characters keep their single-byte values, which maximizes compatibility with the "old style". | ||
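As a rough illustration of the variable-length encoding, the sketch below encodes a few sample characters as UTF-8 and counts the resulting bytes. The sample characters are arbitrary choices, and the code uses only Python's built-in `str.encode`.

```python
# Sample characters from different Unicode ranges (arbitrary picks).
samples = ["A", "\u00e4", "\u2500", "\U0001F600"]

# Map each character to the number of bytes its UTF-8 form needs.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")

# Plain ASCII text produces identical bytes in ASCII and in UTF-8.
ascii_text = "EPIC"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
```

Higher code points need more bytes, while ASCII text is byte-for-byte unchanged.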
+ | |||
+ | =====Encodings and Iconv===== | ||
+ | EPIC uses the iconv system to convert between encodings. | ||
+ | |||
+ | iconv --list | ||
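Converting between encodings amounts to decoding bytes from the source encoding and re-encoding them in the target encoding. The sketch below shows that idea with Python's built-in codecs; it is not EPIC's code and does not call iconv, and `recode` is a hypothetical helper name.

```python
def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from `source`, then re-encode them as `target`."""
    return data.decode(source).encode(target)


# A six-letter Russian word, written with escapes so that the example
# does not depend on how this file itself is encoded.
word = "\u043f\u0440\u0438\u0432\u0435\u0442"

koi8_bytes = word.encode("koi8_r")                   # 1 byte per letter
utf8_bytes = recode(koi8_bytes, "koi8_r", "utf-8")   # 2 bytes per letter

print(len(koi8_bytes), len(utf8_bytes))
```

The text is the same before and after; only the bytes that represent it change.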
+ | |||
+ | |||
+ | =====Locales===== | ||
+ | Now maybe you understand what encoding you are using. | ||
+ | In Unix, this is done with " | ||
+ | |||
+ | You can see the list of locales available on your system with the | ||
+ | locale -a | ||
+ | command. | ||
+ | |||
+ | A locale looks like // | ||
+ | |||
+ | Some examples: | ||
+ | |||
+ | ^ Locale Name ^ Locale Explanation ^ | ||
+ | | en_US.ISO8859-15 | English - US - ISO-8859-15 (old style, no box drawing) | | ||
+ | | en_US.UTF-8 | English - US - UTF-8 (new style) | | ||
+ | | fi_FI.ISO8859-1 | Finnish - Finland - ISO-8859-1 (old style) | | ||
+ | | fi_FI.UTF-8 | Finnish - Finland - UTF-8 (new style) | | ||
+ | | ja_JP.SJIS | Japanese - Japan - Shift-JIS (old style) | | ||
+ | | ru_RU.KOI8-R | Russian - Russia - KOI8-R (old style, unix) | | ||
+ | | ru_RU.CP1251 | Russian - Russia - CP1251 (old style, windows) | | ||
+ | | ru_RU.UTF-8 | Russian - Russia - UTF-8 (new style) | | ||
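The names in the table can be split into their language, country, and encoding parts mechanically. The following is a minimal sketch, assuming names shaped like the ones in the table; `parse_locale` and `LocaleParts` are hypothetical names, and modifiers such as `@euro` are not handled.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    country: Optional[str]
    encoding: Optional[str]


def parse_locale(name: str) -> LocaleParts:
    """Split a name such as 'ru_RU.KOI8-R' into its three parts."""
    # Everything after the first '.' is the encoding.
    base, _, encoding = name.partition(".")
    # Everything after the first '_' in the remainder is the country.
    language, _, country = base.partition("_")
    return LocaleParts(language, country or None, encoding or None)


print(parse_locale("ru_RU.KOI8-R"))
print(parse_locale("fi_FI.UTF-8"))
```

Names without a country or encoding, such as `C`, come back with those fields set to `None`.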
+ | |||
+ | You can set the locale you are using with the LC_ALL environment variable. | ||
+ | |||
+ | | ||
+ | |||
+ | Then every program I run knows that I am using UTF-8 as my character encoding. | ||
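A program can learn the locale by reading the environment. The sketch below follows the usual POSIX precedence for the character-type category (LC_ALL first, then LC_CTYPE, then LANG); `effective_ctype_locale` is a hypothetical helper, and this is not a description of how EPIC itself reads these variables.

```python
import os
from typing import Mapping, Optional


def effective_ctype_locale(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty value among LC_ALL, LC_CTYPE, LANG."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return None  # nothing set: programs fall back to the "C" locale


# Inspect the real environment of the current process.
print(effective_ctype_locale(os.environ))
```

Because LC_ALL is checked first, setting it overrides any other locale variable that may already be present.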
+ | |||
+ | Some programs, like GNU Screen, have problems with UTF-8. | ||
+ | |||
+ | Some programs, like XTerm, can support either the old or new style, based on menu options. | ||
+ | |||
+ | Your font also plays a role. It's one thing for the software to know what encoding you're using, but if you use an incorrect font for that encoding, you still won't see what you expect. | ||
+ | |||
+ | =====See also===== | ||
+ | This page deals with how YOU tell EPIC what YOU are using. | ||
encodings_and_locales.txt · Last modified: 2014/04/05 13:28 by 127.0.0.1