您的位置:首页 > 编程语言 > Python开发

4.2.3 标准编码

2015-11-08 08:40 621 查看
Python内置了很多编码的字符集处理,有些是使用C语言实现,有些是使用字典映射方式实现。下表按名称排序的字符集表,有些名称是可以别的名称的,比如utf-8也可以使用名称utf_8来表查找。CPython实现与其它实现有一些差别,针对一些编码字符集作了优化,如果使用这些字符集之外的字符集可能速度比较慢。优化的字符集:utf-8, utf8, latin-1, latin1, iso-8859-1, mbcs (Windows only), ascii, utf-16, and utf-32。有些字符集支持不同的语言,也有一些独立的字符集。

Codec
Aliases
Languages
ascii
646, us-ascii
English
big5
big5-tw, csbig5
Traditional Chinese
big5hkscs
big5-hkscs, hkscs
Traditional Chinese
cp037
IBM037, IBM039
English
cp273
273, IBM273, csIBM273
German
New in version 3.4.
cp424
EBCDIC-CP-HE, IBM424
Hebrew
cp437
437, IBM437
English
cp500
EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500
Western Europe
cp720
 
Arabic
cp737
 
Greek
cp775
IBM775
Baltic languages
cp850
850, IBM850
Western Europe
cp852
852, IBM852
Central and Eastern Europe
cp855
855, IBM855
Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp856
 
Hebrew
cp857
857, IBM857
Turkish
cp858
858, IBM858
Western Europe
cp860
860, IBM860
Portuguese
cp861
861, CP-IS, IBM861
Icelandic
cp862
862, IBM862
Hebrew
cp863
863, IBM863
Canadian
cp864
IBM864
Arabic
cp865
865, IBM865
Danish, Norwegian
cp866
866, IBM866
Russian
cp869
869, CP-GR, IBM869
Greek
cp874
 
Thai
cp875
 
Greek
cp932
932, ms932, mskanji, ms-kanji
Japanese
cp949
949, ms949, uhc
Korean
cp950
950, ms950
Traditional Chinese
cp1006
 
Urdu
cp1026
ibm1026
Turkish
cp1125
1125, ibm1125, cp866u, ruscii
Ukrainian
New in version 3.4.
cp1140
ibm1140
Western Europe
cp1250
windows-1250
Central and Eastern Europe
cp1251
windows-1251
Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp1252
windows-1252
Western Europe
cp1253
windows-1253
Greek
cp1254
windows-1254
Turkish
cp1255
windows-1255
Hebrew
cp1256
windows-1256
Arabic
cp1257
windows-1257
Baltic languages
cp1258
windows-1258
Vietnamese
cp65001
 
Windows only: Windows UTF-8 (CP_UTF8)
New in version 3.3.
euc_jp
eucjp, ujis, u-jis
Japanese
euc_jis_2004
jisx0213, eucjis2004
Japanese
euc_jisx0213
eucjisx0213
Japanese
euc_kr
euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001
Korean
gb2312
chinese, csiso58gb231280, euc- cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso- ir-58
Simplified Chinese
gbk
936, cp936, ms936
Unified Chinese
gb18030
gb18030-2000
Unified Chinese
hz
hzgb, hz-gb, hz-gb-2312
Simplified Chinese
iso2022_jp
csiso2022jp, iso2022jp, iso-2022-jp
Japanese
iso2022_jp_1
iso2022jp-1, iso-2022-jp-1
Japanese
iso2022_jp_2
iso2022jp-2, iso-2022-jp-2
Japanese, Korean, Simplified Chinese, Western Europe, Greek
iso2022_jp_2004
iso2022jp-2004, iso-2022-jp-2004
Japanese
iso2022_jp_3
iso2022jp-3, iso-2022-jp-3
Japanese
iso2022_jp_ext
iso2022jp-ext, iso-2022-jp-ext
Japanese
iso2022_kr
csiso2022kr, iso2022kr, iso-2022-kr
Korean
latin_1
iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1
West Europe
iso8859_2
iso-8859-2, latin2, L2
Central and Eastern Europe
iso8859_3
iso-8859-3, latin3, L3
Esperanto, Maltese
iso8859_4
iso-8859-4, latin4, L4
Baltic languages
iso8859_5
iso-8859-5, cyrillic
Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6
iso-8859-6, arabic
Arabic
iso8859_7
iso-8859-7, greek, greek8
Greek
iso8859_8
iso-8859-8, hebrew
Hebrew
iso8859_9
iso-8859-9, latin5, L5
Turkish
iso8859_10
iso-8859-10, latin6, L6
Nordic languages
iso8859_13
iso-8859-13, latin7, L7
Baltic languages
iso8859_14
iso-8859-14, latin8, L8
Celtic languages
iso8859_15
iso-8859-15, latin9, L9
Western Europe
iso8859_16
iso-8859-16, latin10, L10
South-Eastern Europe
johab
cp1361, ms1361
Korean
koi8_r
 
Russian
koi8_u
 
Ukrainian
mac_cyrillic
maccyrillic
Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek
macgreek
Greek
mac_iceland
maciceland
Icelandic
mac_latin2
maclatin2, maccentraleurope
Central and Eastern Europe
mac_roman
macroman, macintosh
Western Europe
mac_turkish
macturkish
Turkish
ptcp154
csptcp154, pt154, cp154, cyrillic-asian
Kazakh
shift_jis
csshiftjis, shiftjis, sjis, s_jis
Japanese
shift_jis_2004
shiftjis2004, sjis_2004, sjis2004
Japanese
shift_jisx0213
shiftjisx0213, sjisx0213, s_jisx0213
Japanese
utf_32
U32, utf32
all languages
utf_32_be
UTF-32BE
all languages
utf_32_le
UTF-32LE
all languages
utf_16
U16, utf16
all languages
utf_16_be
UTF-16BE
all languages
utf_16_le
UTF-16LE
all languages
utf_7
U7, unicode-1-1-utf-7
all languages
utf_8
U8, UTF, utf8
all languages
utf_8_sig
 
all languages
蔡军生 QQ:9073204 深圳
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  python milang unicode 编码