stringWithContentsOfURL encoding problem

2012-10-01 21:14
The goal is simple: fetch the page content of www.baidu.com remotely, that is, the same text you see under View > Source in a browser.

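The initial code:

[code listing not preserved]
It compiled and ran, and the page content was downloaded, but it displayed as garbled text in both the console and the text view.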

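Since the text was garbled, the obvious fix was to change the encoding. First I changed it to:

[code listing not preserved]
Result: nil page source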
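
One plausible explanation for the nil result is that the page bytes were not valid UTF-8, in which case `NSString` returns nil instead of a partial string. The sketch below illustrates that behavior on in-memory bytes rather than a live download. It assumes ARC, and it assumes the page was served in a GB-family encoding, which the post does not state directly but which matches the fix it arrives at. `DecodeBytes`, `SampleGBKBytes`, and `DemonstrateDecoding` are hypothetical helper names, and the four sample bytes are the GBK encoding of "百度".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode raw bytes with the given encoding.
// Returns nil when the bytes are not valid in that encoding.
static NSString *DecodeBytes(NSData *data, NSStringEncoding encoding) {
    return [[NSString alloc] initWithData:data encoding:encoding];
}

// Assumed sample input: the GBK bytes for the two characters "百度".
static NSData *SampleGBKBytes(void) {
    const unsigned char bytes[] = {0xB0, 0xD9, 0xB6, 0xC8};
    return [NSData dataWithBytes:bytes length:sizeof(bytes)];
}

static void DemonstrateDecoding(void) {
    NSData *data = SampleGBKBytes();

    // 0xB0 cannot start a UTF-8 sequence, so this decode yields nil.
    NSString *asUTF8 = DecodeBytes(data, NSUTF8StringEncoding);
    NSLog(@"UTF-8: %@", asUTF8);

    // Latin-1 maps every byte to some character, so the decode "succeeds"
    // but produces four unrelated characters, i.e. garbled text.
    NSString *asLatin1 = DecodeBytes(data, NSISOLatin1StringEncoding);
    NSLog(@"Latin-1: %@ (length %lu)", asLatin1, (unsigned long)asLatin1.length);
}
```

A byte-per-character encoding such as Latin-1 never fails, which may be why the first attempt produced garbage rather than nil. That is a guess, since the initial listing is missing.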

I then tried the other encodings defined alongside NSUTF8StringEncoding, without success.

So I printed the list of available encodings, which takes only a few lines of code:

[code listing not preserved]
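
The listing is missing from this copy, so the following is only a sketch of code that would print lines in the `index: name == 0xvalue` format. It relies on two `NSString` class methods, `+availableStringEncodings` (a zero-terminated C array) and `+localizedNameOfStringEncoding:`. The function name `LogAvailableStringEncodings` is hypothetical, and the exact set and order of encodings depends on the OS version.

```objc
#import <Foundation/Foundation.h>

// Log every string encoding Foundation reports, with its index,
// localized name, and numeric NSStringEncoding value.
// Returns the number of encodings listed.
static NSUInteger LogAvailableStringEncodings(void) {
    // The returned array is terminated by a 0 entry.
    const NSStringEncoding *encoding = [NSString availableStringEncodings];
    NSUInteger index = 0;
    for (; *encoding != 0; encoding++, index++) {
        NSLog(@"%lu: %@ == 0x%lx",
              (unsigned long)index,
              [NSString localizedNameOfStringEncoding:*encoding],
              (unsigned long)*encoding);
    }
    return index;
}
```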
The output:

2009-06-08 23:12:44.420 MoviePre[1243:20b] 0: Western (Mac OS Roman) == 0x1e

2009-06-08 23:12:44.421 MoviePre[1243:20b] 1: Japanese (Mac OS) == 0x80000001

2009-06-08 23:12:44.421 MoviePre[1243:20b] 2: Traditional Chinese (Mac OS) == 0x80000002

2009-06-08 23:12:44.422 MoviePre[1243:20b] 3: Korean (Mac OS) == 0x80000003

2009-06-08 23:12:44.422 MoviePre[1243:20b] 4: Arabic (Mac OS) == 0x80000004

2009-06-08 23:12:44.432 MoviePre[1243:20b] 5: Hebrew (Mac OS) == 0x80000005

2009-06-08 23:12:44.433 MoviePre[1243:20b] 6: Greek (Mac OS) == 0x80000006

2009-06-08 23:12:44.433 MoviePre[1243:20b] 7: Cyrillic (Mac OS) == 0x80000007

2009-06-08 23:12:44.436 MoviePre[1243:20b] 8: Devanagari (Mac OS) == 0x80000009

2009-06-08 23:12:44.447 MoviePre[1243:20b] 9: Gurmukhi (Mac OS) == 0x8000000a

2009-06-08 23:12:44.447 MoviePre[1243:20b] 10: Gujarati (Mac OS) == 0x8000000b

2009-06-08 23:12:44.447 MoviePre[1243:20b] 11: Thai (Mac OS) == 0x80000015

2009-06-08 23:12:44.448 MoviePre[1243:20b] 12: Simplified Chinese (Mac OS) == 0x80000019

2009-06-08 23:12:44.448 MoviePre[1243:20b] 13: Tibetan (Mac OS) == 0x8000001a

2009-06-08 23:12:44.452 MoviePre[1243:20b] 14: Central European (Mac OS) == 0x8000001d

2009-06-08 23:12:44.453 MoviePre[1243:20b] 15: Symbol (Mac OS) == 0x6

2009-06-08 23:12:44.455 MoviePre[1243:20b] 16: Dingbats (Mac OS) == 0x80000022

2009-06-08 23:12:44.455 MoviePre[1243:20b] 17: Turkish (Mac OS) == 0x80000023

2009-06-08 23:12:44.456 MoviePre[1243:20b] 18: Croatian (Mac OS) == 0x80000024

2009-06-08 23:12:44.464 MoviePre[1243:20b] 19: Icelandic (Mac OS) == 0x80000025

2009-06-08 23:12:44.467 MoviePre[1243:20b] 20: Romanian (Mac OS) == 0x80000026

2009-06-08 23:12:44.467 MoviePre[1243:20b] 21: Celtic (Mac OS) == 0x80000027

2009-06-08 23:12:44.468 MoviePre[1243:20b] 22: Gaelic (Mac OS) == 0x80000028

2009-06-08 23:12:44.469 MoviePre[1243:20b] 23: Keyboard Symbols (Mac OS) == 0x80000029

2009-06-08 23:12:44.469 MoviePre[1243:20b] 24: Farsi (Mac OS) == 0x8000008c

2009-06-08 23:12:44.470 MoviePre[1243:20b] 25: Cyrillic (Mac OS Ukrainian) == 0x80000098

2009-06-08 23:12:44.470 MoviePre[1243:20b] 26: Inuit (Mac OS) == 0x800000ec

2009-06-08 23:12:44.471 MoviePre[1243:20b] 27: Unicode (UTF-32LE) == 0x9c000100

2009-06-08 23:12:44.471 MoviePre[1243:20b] 28: Unicode (UTF-8) == 0x4

2009-06-08 23:12:44.472 MoviePre[1243:20b] 29: Unicode (UTF-16) == 0xa

2009-06-08 23:12:44.473 MoviePre[1243:20b] 30: Unicode (UTF-16BE) == 0x90000100

2009-06-08 23:12:44.473 MoviePre[1243:20b] 31: Unicode (UTF-16LE) == 0x94000100

2009-06-08 23:12:44.480 MoviePre[1243:20b] 32: Unicode (UTF-32) == 0x8c000100

2009-06-08 23:12:44.480 MoviePre[1243:20b] 33: Unicode (UTF-32BE) == 0x98000100

2009-06-08 23:12:44.481 MoviePre[1243:20b] 34: Western (ISO Latin 1) == 0x5

2009-06-08 23:12:44.481 MoviePre[1243:20b] 35: Central European (ISO Latin 2) == 0x9

2009-06-08 23:12:44.481 MoviePre[1243:20b] 36: Western (ISO Latin 3) == 0x80000203

2009-06-08 23:12:44.482 MoviePre[1243:20b] 37: Central European (ISO Latin 4) == 0x80000204

2009-06-08 23:12:44.493 MoviePre[1243:20b] 38: Cyrillic (ISO 8859-5) == 0x80000205

2009-06-08 23:12:44.493 MoviePre[1243:20b] 39: Arabic (ISO 8859-6) == 0x80000206

2009-06-08 23:12:44.494 MoviePre[1243:20b] 40: Greek (ISO 8859-7) == 0x80000207

2009-06-08 23:12:44.494 MoviePre[1243:20b] 41: Hebrew (ISO 8859-8) == 0x80000208

2009-06-08 23:12:44.495 MoviePre[1243:20b] 42: Turkish (ISO Latin 5) == 0x80000209

2009-06-08 23:12:44.495 MoviePre[1243:20b] 43: Nordic (ISO Latin 6) == 0x8000020a

2009-06-08 23:12:44.506 MoviePre[1243:20b] 44: Thai (ISO 8859-11) == 0x8000020b

2009-06-08 23:12:44.507 MoviePre[1243:20b] 45: Baltic Rim (ISO Latin 7) == 0x8000020d

2009-06-08 23:12:44.510 MoviePre[1243:20b] 46: Celtic (ISO Latin 8) == 0x8000020e

2009-06-08 23:12:44.511 MoviePre[1243:20b] 47: Western (ISO Latin 9) == 0x8000020f

2009-06-08 23:12:44.511 MoviePre[1243:20b] 48: Romanian (ISO Latin 10) == 0x80000210

2009-06-08 23:12:44.512 MoviePre[1243:20b] 49: Latin-US (DOS) == 0x80000400

2009-06-08 23:12:44.512 MoviePre[1243:20b] 50: Greek (DOS) == 0x80000405

2009-06-08 23:12:44.513 MoviePre[1243:20b] 51: Baltic Rim (DOS) == 0x80000406

2009-06-08 23:12:44.513 MoviePre[1243:20b] 52: Western (DOS Latin 1) == 0x80000410

2009-06-08 23:12:44.513 MoviePre[1243:20b] 53: Greek (DOS Greek 1) == 0x80000411

2009-06-08 23:12:44.514 MoviePre[1243:20b] 54: Central European (DOS Latin 2) == 0x80000412

2009-06-08 23:12:44.514 MoviePre[1243:20b] 55: Cyrillic (DOS) == 0x80000413

2009-06-08 23:12:44.514 MoviePre[1243:20b] 56: Turkish (DOS) == 0x80000414

2009-06-08 23:12:44.515 MoviePre[1243:20b] 57: Portuguese (DOS) == 0x80000415

2009-06-08 23:12:44.516 MoviePre[1243:20b] 58: Icelandic (DOS) == 0x80000416

2009-06-08 23:12:44.517 MoviePre[1243:20b] 59: Hebrew (DOS) == 0x80000417

2009-06-08 23:12:44.517 MoviePre[1243:20b] 60: Canadian French (DOS) == 0x80000418

2009-06-08 23:12:44.517 MoviePre[1243:20b] 61: Arabic (DOS) == 0x80000419

2009-06-08 23:12:44.518 MoviePre[1243:20b] 62: Nordic (DOS) == 0x8000041a

2009-06-08 23:12:44.518 MoviePre[1243:20b] 63: Russian (DOS) == 0x8000041b

2009-06-08 23:12:44.519 MoviePre[1243:20b] 64: Greek (DOS Greek 2) == 0x8000041c

2009-06-08 23:12:44.519 MoviePre[1243:20b] 65: Thai (Windows, DOS) == 0x8000041d

2009-06-08 23:12:44.520 MoviePre[1243:20b] 66: Japanese (Windows, DOS) == 0x8

2009-06-08 23:12:44.522 MoviePre[1243:20b] 67: Simplified Chinese (Windows, DOS) == 0x80000421

2009-06-08 23:12:44.522 MoviePre[1243:20b] 68: Korean (Windows, DOS) == 0x80000422

2009-06-08 23:12:44.524 MoviePre[1243:20b] 69: Traditional Chinese (Windows, DOS) == 0x80000423

2009-06-08 23:12:44.524 MoviePre[1243:20b] 70: Western (Windows Latin 1) == 0xc

2009-06-08 23:12:44.525 MoviePre[1243:20b] 71: Central European (Windows Latin 2) == 0xf

2009-06-08 23:12:44.525 MoviePre[1243:20b] 72: Cyrillic (Windows) == 0xb

2009-06-08 23:12:44.525 MoviePre[1243:20b] 73: Greek (Windows) == 0xd

2009-06-08 23:12:44.526 MoviePre[1243:20b] 74: Turkish (Windows Latin 5) == 0xe

2009-06-08 23:12:44.526 MoviePre[1243:20b] 75: Hebrew (Windows) == 0x80000505

2009-06-08 23:12:44.526 MoviePre[1243:20b] 76: Arabic (Windows) == 0x80000506

2009-06-08 23:12:44.527 MoviePre[1243:20b] 77: Baltic Rim (Windows) == 0x80000507

2009-06-08 23:12:44.529 MoviePre[1243:20b] 78: Vietnamese (Windows) == 0x80000508

2009-06-08 23:12:44.531 MoviePre[1243:20b] 79: Western (ASCII) == 0x1

2009-06-08 23:12:44.532 MoviePre[1243:20b] 80: Japanese (Shift JIS X0213) == 0x80000628

2009-06-08 23:12:44.533 MoviePre[1243:20b] 81: Chinese (GBK) == 0x80000631

2009-06-08 23:12:44.534 MoviePre[1243:20b] 82: Chinese (GB 18030) == 0x80000632

2009-06-08 23:12:44.534 MoviePre[1243:20b] 83: Japanese (ISO 2022-JP) == 0x15

2009-06-08 23:12:44.535 MoviePre[1243:20b] 84: Korean (ISO 2022-KR) == 0x80000840

2009-06-08 23:12:44.536 MoviePre[1243:20b] 85: Japanese (EUC) == 0x3

2009-06-08 23:12:44.536 MoviePre[1243:20b] 86: Simplified Chinese (EUC) == 0x80000930

2009-06-08 23:12:44.538 MoviePre[1243:20b] 87: Traditional Chinese (EUC) == 0x80000931

2009-06-08 23:12:44.539 MoviePre[1243:20b] 88: Korean (EUC) == 0x80000940

2009-06-08 23:12:44.540 MoviePre[1243:20b] 89: Japanese (Shift JIS) == 0x80000a01

2009-06-08 23:12:44.541 MoviePre[1243:20b] 90: Cyrillic (KOI8-R) == 0x80000a02

2009-06-08 23:12:44.541 MoviePre[1243:20b] 91: Traditional Chinese (Big 5) == 0x80000a03

2009-06-08 23:12:44.541 MoviePre[1243:20b] 92: Western (Mac Mail) == 0x80000a04

2009-06-08 23:12:44.542 MoviePre[1243:20b] 93: Simplified Chinese (HZ GB 2312) == 0x80000a05

2009-06-08 23:12:44.542 MoviePre[1243:20b] 94: Traditional Chinese (Big 5 HKSCS) == 0x80000a06

2009-06-08 23:12:44.542 MoviePre[1243:20b] 95: Ukrainian (KOI8-U) == 0x80000a08

2009-06-08 23:12:44.546 MoviePre[1243:20b] 96: Traditional Chinese (Big 5-E) == 0x80000a09

2009-06-08 23:12:44.547 MoviePre[1243:20b] 97: Western (NextStep) == 0x2

2009-06-08 23:12:44.547 MoviePre[1243:20b] 98: Non-lossy ASCII == 0x7

2009-06-08 23:12:44.548 MoviePre[1243:20b] 99: Western (EBCDIC US) == 0x80000c01

2009-06-08 23:12:44.548 MoviePre[1243:20b] 100: Western (EBCDIC Latin 1) == 0x80000c02

There they are. Now try the Chinese encodings one by one.
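
Trying encodings by hand can also be automated. The sketch below walks a short candidate list and returns the first encoding that decodes the data. It assumes ARC, the candidate list is my own choice and not the author's, and `FirstWorkingEncoding` is a hypothetical name. UTF-8 goes first because it is strict. The GB and Big5 decoders accept many byte sequences, so a successful decode with them is a hint, not proof.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: return the first candidate encoding that decodes
// `data`, storing the decoded text in *outString when it is non-NULL.
// Returns 0 (not a valid NSStringEncoding) when no candidate works.
static NSStringEncoding FirstWorkingEncoding(NSData *data, NSString **outString) {
    const NSStringEncoding candidates[] = {
        NSUTF8StringEncoding, // strict, so try it first
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95),
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000),
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingBig5),
    };
    const size_t count = sizeof(candidates) / sizeof(candidates[0]);

    for (size_t i = 0; i < count; i++) {
        NSString *decoded = [[NSString alloc] initWithData:data
                                                  encoding:candidates[i]];
        if (decoded != nil) {
            if (outString != NULL) {
                *outString = decoded;
            }
            return candidates[i];
        }
    }
    return 0;
}
```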

In the end I used entry 81, Chinese (GBK), 0x80000631. The code:

[code listing not preserved]
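
Since the final listing is missing, here is a minimal sketch of what fetching with entry 81 could look like. It uses `+stringWithContentsOfURL:encoding:error:` and obtains the GBK value through `CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95)`, which gives the same 0x80000631 without a magic number. Whether the author used the raw value or the constant is unknown, and `FetchPageAsGBK` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: load the contents of `url` and decode them as GBK.
// Returns nil and fills *error when loading or decoding fails.
// The call is synchronous, so a real app should keep it off the main thread.
static NSString *FetchPageAsGBK(NSURL *url, NSError **error) {
    // Same value as entry 81 in the printed list (0x80000631).
    NSStringEncoding gbk =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95);
    return [NSString stringWithContentsOfURL:url encoding:gbk error:error];
}
```

A call might look like `FetchPageAsGBK([NSURL URLWithString:@"http://www.baidu.com"], &error)`. The right encoding depends on what the server actually sends, so it is worth checking the charset the page declares before hard-coding one.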
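The text now displays correctly in both the log and the simulator.

I have not tested it on a real device yet.

Once you have the page content, a few regular expressions are enough to pull out whatever you need :)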
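
As an illustration of that last step, the sketch below pulls the text of the first `<title>` element out of a decoded page with `NSRegularExpression`. The post does not say which regex library or patterns were used, so both the class choice and the pattern are assumptions, and `ExtractTitle` is a hypothetical name. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: return the text of the first <title> element,
// or nil when the input is empty or contains no title.
static NSString *ExtractTitle(NSString *html) {
    if (html.length == 0) {
        return nil;
    }
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<title[^>]*>(.*?)</title>"
                                                  options:NSRegularExpressionCaseInsensitive |
                                                          NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        return nil;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }
    // Capture group 1 is the text between the opening and closing tags.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

Regular expressions are fine for quick extraction like this, though they become fragile on irregular markup.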

From: http://www.cocoachina.com/bbs/read.php?tid-4948.html