Detecting a File's Encoding
2016-02-09 22:30
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static String getCharset(File file) {
    String charset = "GBK"; // fallback when nothing else matches
    byte[] first3Bytes = new byte[3];
    // try-with-resources closes the stream on every path, including early return
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        boolean checked = false;
        bis.mark(3); // read limit must cover the 3 bytes read before reset()
        int read = bis.read(first3Bytes, 0, 3);
        if (read == -1) return charset; // empty file
        // Step 1: look for a byte order mark (BOM).
        if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
            charset = "UTF-16LE";
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF) {
            charset = "UTF-16BE";
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xEF && first3Bytes[1] == (byte) 0xBB
                && first3Bytes[2] == (byte) 0xBF) {
            charset = "UTF-8";
            checked = true;
        }
        bis.reset();
        // Step 2: no BOM, so scan for a well-formed 3-byte UTF-8 sequence.
        if (!checked) {
            while ((read = bis.read()) != -1) {
                if (read >= 0xF0) break; // 4-byte lead or invalid byte: give up
                if (0x80 <= read && read <= 0xBF) break; // stray continuation byte
                if (0xC0 <= read && read <= 0xDF) {
                    // 2-byte lead: also plausible in GBK, so keep scanning
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) continue;
                    else break;
                } else if (0xE0 <= read && read <= 0xEF) {
                    // 3-byte lead: two continuation bytes are taken as UTF-8
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        read = bis.read();
                        if (0x80 <= read && read <= 0xBF) charset = "UTF-8";
                    }
                    break;
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return charset;
}
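The scan above stops at the first 3-byte sequence that looks like UTF-8, so a file that merely starts with such bytes is still reported as UTF-8. As a rough sketch of a stricter check, the whole content can be run through a `CharsetDecoder` configured to report malformed input. The class `CharsetSniffer` and its methods `isValidUtf8` and `detect` are hypothetical names introduced here for illustration, not part of the original method, and the GBK fallback is carried over from it as an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original post.
final class CharsetSniffer {

    // Returns true only if the entire array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false; // some byte sequence is not valid UTF-8
        }
    }

    // Same decision order as getCharset: UTF-16 BOMs first, then UTF-8, else GBK.
    static String detect(byte[] data) {
        if (data.length >= 2 && data[0] == (byte) 0xFF && data[1] == (byte) 0xFE) {
            return "UTF-16LE";
        }
        if (data.length >= 2 && data[0] == (byte) 0xFE && data[1] == (byte) 0xFF) {
            return "UTF-16BE";
        }
        // Assumption: anything that is not valid UTF-8 is treated as GBK.
        return isValidUtf8(data) ? "UTF-8" : "GBK";
    }
}
```

For small files, the bytes could be loaded with `Files.readAllBytes(file.toPath())` and passed to `detect`. Unlike `getCharset`, this sketch reports ASCII-only content as UTF-8, which is harmless because ASCII bytes decode identically in both encodings.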