A Summary of Python Encoding Issues

2016-10-28 23:04

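1. str and unicode

In short, Python 2.x has two string types: str and unicode.

Going from the former to the latter requires decode, and going from the latter to the former requires encode. Taking 'utf-8' as an example:
str.decode('utf-8') -> unicode

str <- unicode.encode('utf-8')

Summary: unicode acts as a bridge. A str holds bytes in one concrete encoding, such as UTF-8 or GBK, and any of these can be decoded into unicode; unicode can in turn be encoded into UTF-8, GBK, or another encoding. So there are really two kinds of string data: unicode text, and byte strings in a specific encoding. To convert between UTF-8 and GBK, go through unicode: first decode into unicode, then encode into the target encoding.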
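The bridge idea above can be sketched as a small helper. This is a minimal sketch, not code from the original post: the function name `transcode` is hypothetical, and the code is written so that it should run on both Python 2.7 and Python 3, where the byte string type is `str` and `bytes` respectively.

```python
# -*- coding: utf-8 -*-

def transcode(data, from_encoding, to_encoding):
    """Re-encode a byte string by going through unicode (hypothetical helper)."""
    text = data.decode(from_encoding)   # bytes -> unicode text
    return text.encode(to_encoding)     # unicode text -> bytes


gbk_bytes = u'中文'.encode('gbk')              # GBK uses 2 bytes per character here
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')

print(len(gbk_bytes))                          # 4
print(len(utf8_bytes))                         # 6
print(utf8_bytes.decode('utf-8') == u'中文')   # True
```

The two byte strings differ in length and content, but both decode to the same unicode text, which is why unicode works as the common intermediate form.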

# coding: utf-8
# The declaration above is needed because this file contains non-ASCII literals.
print "Type of '中文' is %s" % type('中文')
print "Type of '中文'.decode('utf-8') is %s" % type('中文'.decode('utf-8'))
print "Type of u'中文' is %s" % type(u'中文')
print "Type of u'中文'.encode('utf-8') is %s" % type(u'中文'.encode('utf-8'))
Output:

Type of '中文' is <type 'str'>

Type of '中文'.decode('utf-8') is <type 'unicode'>

Type of u'中文' is <type 'unicode'>

Type of u'中文'.encode('utf-8') is <type 'str'>
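2. Avoiding encoding problems

Recommendation 1: use a character encoding declaration, and use the same declaration in every source file of a project.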

#encoding=utf-8

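Note: the #encoding=utf-8 declaration at the top of a .py file tells the interpreter which encoding the source file is saved in, so that non-ASCII literals such as u'汉字' are decoded correctly. It does not change how print encodes its output; when printing unicode, print uses the encoding of the output stream (sys.stdout.encoding).

Without the declaration:

test2 = u'汉字'
print test2

With the declaration:

#encoding=utf-8
test2 = u'汉字'
print test2

Note: with the declaration the script runs without error. Without it, Python 2 refuses to compile the file and raises a SyntaxError about a non-ASCII character with no encoding declared. Garbled output appears instead when the declared encoding does not match the encoding the file was actually saved in.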
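To see that the declaration only controls how the bytes of the source file are decoded, the sketch below compiles the same GBK-encoded source twice under two different declarations. It is an illustration rather than part of the original post: the names `body` and `run_with_declaration` are made up, and it assumes Python 3, where `compile()` accepts a byte string and honours a coding declaration on its first line.

```python
# -*- coding: utf-8 -*-

# The same statement, stored as GBK bytes (as an editor saving in GBK would).
body = u"text = u'汉字'\n".encode('gbk')


def run_with_declaration(declared):
    """Compile the GBK bytes under the given declaration and return `text`."""
    source = ('# coding: %s\n' % declared).encode('ascii') + body
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['text']


matching = run_with_declaration('gbk')         # declaration matches the bytes
mismatched = run_with_declaration('latin-1')   # declaration does not match

print(matching == u'汉字')   # True
print(len(mismatched))       # 4: each GBK byte became one wrong character
```

The bytes on disk are identical in both runs; only the declaration changes, and with it the text the interpreter sees.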
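3. Reading and writing files

Read from the source file, decode the contents into unicode, encode them as UTF-8, and then write them back to the file.

When a file is opened with the built-in open(), read() returns a str, which may hold GBK, UTF-8, or other encoded bytes; after reading, decode() it with the correct encoding. When calling write(): if the argument is unicode, encode() it with the encoding you want in the file; if it is a str in some other encoding, first decode() it with that str's own encoding to get unicode, then encode() it with the target encoding. If a unicode object is passed to write() directly, Python 2 implicitly encodes it with the default encoding (sys.getdefaultencoding(), normally 'ascii'), not with the encoding declared in the source file, so non-ASCII text raises UnicodeEncodeError.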

# coding: UTF-8

f = open('test.txt')
s = f.read()
f.close()
print type(s) # <type 'str'>
# The file is known to be GBK-encoded; decode it into unicode
u = s.decode('GBK')

f = open('test.txt', 'w')
# Encode into a UTF-8 str
s = u.encode('UTF-8')
f.write(s)
f.close()
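
The read, decode, encode, write steps above can be wrapped in one function. This is a sketch under stated assumptions rather than the author's code: `convert_file` and its parameters are hypothetical names, the files are opened in binary mode so that read() returns raw bytes on both Python 2.7 and Python 3, and the whole file is assumed to fit in memory.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def convert_file(src_path, dst_path, from_encoding, to_encoding):
    """Rewrite src_path into dst_path in another encoding (hypothetical helper)."""
    with open(src_path, 'rb') as f:
        raw = f.read()                      # raw bytes, no decoding yet
    text = raw.decode(from_encoding)        # bytes -> unicode
    with open(dst_path, 'wb') as f:
        f.write(text.encode(to_encoding))   # unicode -> bytes in target encoding


# Usage: create a small GBK file in a temporary directory, then convert it.
workdir = tempfile.mkdtemp()
gbk_path = os.path.join(workdir, 'gbk.txt')
utf8_path = os.path.join(workdir, 'utf8.txt')

with open(gbk_path, 'wb') as f:
    f.write(u'汉字'.encode('gbk'))

convert_file(gbk_path, utf8_path, 'gbk', 'utf-8')

with open(utf8_path, 'rb') as f:
    result = f.read()
print(result.decode('utf-8') == u'汉字')   # True
```

Writing to a separate destination path keeps the original file intact if the source encoding turns out to be wrong and decode() raises.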
