Detecting garbled Chinese text in Java strings
2016-02-02 11:40
1. Detect whether a string is garbled
2. Check whether a character is Chinese
3. Re-decode garbled Chinese text
Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS: 4E00-9FFF, CJK Unified Ideographs
Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS: F900-FAFF, CJK Compatibility Ideographs
Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A: 3400-4DBF, CJK Unified Ideographs Extension A
CJK is short for "Chinese, Japanese, Korean". These blocks hold the Unicode code points of the ideographs shared by the three languages.
Character.UnicodeBlock.GENERAL_PUNCTUATION: 2000-206F, General Punctuation
Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION: 3000-303F, CJK Symbols and Punctuation
Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS: FF00-FFEF, Halfwidth and Fullwidth Forms
Character.isLetter(c): tests whether a character is a letter (CJK ideographs count as letters)
Character.isDigit(c): tests whether a character is a digit
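To see how the blocks listed above map onto real characters, here is a minimal sketch. The class name UnicodeBlockDemo and the helper blockOf are made up for this illustration; only Character.UnicodeBlock.of and Character.isLetter come from the JDK.

```java
class UnicodeBlockDemo {
    // Hypothetical helper: returns the Unicode block that contains the given char.
    static Character.UnicodeBlock blockOf(char c) {
        return Character.UnicodeBlock.of(c);
    }

    public static void main(String[] args) {
        // Samples written as escapes so the file compiles under any source encoding:
        // U+4E2D (a Chinese ideograph), U+3001 (ideographic comma),
        // U+FF0C (fullwidth comma), U+2014 (em dash), and plain 'A'.
        char[] samples = {'\u4e2d', '\u3001', '\uff0c', '\u2014', 'A'};
        for (char c : samples) {
            System.out.printf("U+%04X block=%s letter=%b%n",
                    (int) c, blockOf(c), Character.isLetter(c));
        }
    }
}
```

Note that the ideograph reports letter=true, which matters when reading the detection code: any check based on Character.isLetterOrDigit already accepts Chinese ideographs.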
1. Detect whether a string is garbled
public static boolean isMessyCode(String strName) {
    // Strip whitespace and punctuation so only letters, digits and symbols remain.
    String cleaned = strName.replaceAll("\\s+", "").replaceAll("\\p{P}", "");
    int symbols = 0;     // characters that are neither letters nor digits
    int suspicious = 0;  // symbols that also fall outside the blocks accepted by isChinese
    for (char c : cleaned.toCharArray()) {
        if (!Character.isLetterOrDigit(c)) {
            if (!isChinese(c)) {
                suspicious++;
            }
            symbols++;
        }
    }
    // No symbols at all: nothing suggests garbling (the original relied on 0/0 being NaN).
    if (symbols == 0) {
        return false;
    }
    // Treat the string as garbled when more than 40% of its symbols are suspicious.
    return (float) suspicious / symbols > 0.4;
}
2. Check whether a character is Chinese
private static boolean isChinese(char c) {
    Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
    // CJK ideograph blocks plus the punctuation blocks that are common in Chinese text.
    return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
            || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
            || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
            || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
            || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
}
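The block comparison above works on a single char, so it cannot see ideographs outside the Basic Multilingual Plane (for example CJK Extension B). A possible alternative, not the approach used in this post, is to test the Unicode script of each code point. The class HanScriptDemo and its methods are hypothetical names for this sketch; Character.UnicodeScript and String.codePoints() are standard JDK APIs (Java 7 and Java 8 respectively).

```java
class HanScriptDemo {
    // True if the code point belongs to the Han script, whichever block it sits in.
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // Counts Han code points, iterating by code point rather than by char
    // so that surrogate pairs are handled as one character.
    static long countHan(String s) {
        return s.codePoints().filter(HanScriptDemo::isHan).count();
    }

    public static void main(String[] args) {
        // "a", U+4E2D, U+6587, "b": two ideographs between two Latin letters.
        String mixed = "a\u4e2d\u6587b";
        System.out.println(countHan(mixed));
        // U+20000 is the first code point of CJK Extension B.
        System.out.println(isHan(0x20000));
    }
}
```

Unlike isChinese, this check is about ideographs only and does not accept punctuation, so it is not a drop-in replacement inside isMessyCode.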
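3. Re-decode garbled Chinese text
import java.nio.charset.StandardCharsets;

public static String toChinese(String msg) {
    if (isMessyCode(msg)) {
        // Recover the raw bytes via ISO-8859-1, then decode them as UTF-8.
        return new String(msg.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
    return msg;
}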
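The conversion above only helps for one specific kind of garbling: UTF-8 bytes that were decoded as ISO-8859-1 somewhere upstream. The following sketch reproduces that mistake and then reverses it. MojibakeDemo, garble and repair are illustrative names, not part of the original code.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates the bug: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: ISO-8859-1 maps every byte to one char, so the original
    // bytes can be recovered and then decoded as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+4E2D U+6587, written as escapes to stay independent of the source encoding.
        String original = "\u4e2d\u6587";
        String garbled = garble(original);
        System.out.println(garbled.length());                  // one char per UTF-8 byte
        System.out.println(repair(garbled).equals(original));  // round trip restores the text
    }
}
```

Running repair on a string that was never garbled loses data, because Chinese characters cannot be encoded in ISO-8859-1. That is why toChinese calls isMessyCode first and returns the input unchanged otherwise.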