Java: Looking Up the Pinyin of Chinese Characters with sourceforge.pinyin4j
2014-12-15 14:07
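In our systems we often need to sort items by their initial letter (for example, the alphabetical brand list on Taobao Mall). To do that we need a way to look up the pinyin of a Chinese character and then take the first letter of that pinyin.
We use the open-source sourceforge.pinyin4j package for this.
Usage is simple:
The utility class it provides is PinyinHelper.java, shown below. It contains the entire public API; several of its methods convert to different romanization systems. For background on these systems, see http://wenku.baidu.com/view/28dda445b307e87101f696f9.html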
/**
* This file is part of pinyin4j (http://sourceforge.net/projects/pinyin4j/)
* and distributed under GNU GENERAL PUBLIC LICENSE (GPL).
*
* pinyin4j is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* pinyin4j is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with pinyin4j.
*/
package net.sourceforge.pinyin4j;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;
/**
* A class provides several utility functions to convert Chinese characters
* (both Simplified and Tranditional) into various Chinese Romanization
* representations
*
* @author Li Min (xmlerlimin@gmail.com)
*/
public class PinyinHelper
{
/**
* Get all unformmatted Hanyu Pinyin presentations of a single Chinese
* character (both Simplified and Tranditional)
*
* <p>
* For example, <br/> If the input is '间', the return will be an array with
* two Hanyu Pinyin strings: <br/> "jian1" <br/> "jian4" <br/> <br/> If the
* input is '李', the return will be an array with single Hanyu Pinyin
* string: <br/> "li3"
*
* <p>
* <b>Special Note</b>: If the return is "none0", that means the input
* Chinese character exists in Unicode CJK talbe, however, it has no
* pronounciation in Chinese
*
* @param ch
* the given Chinese character
*
* @return a String array contains all unformmatted Hanyu Pinyin
* presentations with tone numbers; null for non-Chinese character
*
*/
static public String[] toHanyuPinyinStringArray(char ch)
{
return getUnformattedHanyuPinyinStringArray(ch);
}
/**
* Get all Hanyu Pinyin presentations of a single Chinese character (both
* Simplified and Tranditional)
*
* <p>
* For example, <br/> If the input is '间', the return will be an array with
* two Hanyu Pinyin strings: <br/> "jian1" <br/> "jian4" <br/> <br/> If the
* input is '李', the return will be an array with single Hanyu Pinyin
* string: <br/> "li3"
*
* <p>
* <b>Special Note</b>: If the return is "none0", that means the input
* Chinese character is in Unicode CJK talbe, however, it has no
* pronounciation in Chinese
*
* @param ch
* the given Chinese character
* @param outputFormat
* describes the desired format of returned Hanyu Pinyin String
*
* @return a String array contains all Hanyu Pinyin presentations with tone
* numbers; return null for non-Chinese character
*
* @throws BadHanyuPinyinOutputFormatCombination
* if certain combination of output formats happens
*
* @see HanyuPinyinOutputFormat
* @see BadHanyuPinyinOutputFormatCombination
*
*/
static public String[] toHanyuPinyinStringArray(char ch,
HanyuPinyinOutputFormat outputFormat)
throws BadHanyuPinyinOutputFormatCombination
{
return getFormattedHanyuPinyinStringArray(ch, outputFormat);
}
/**
* Return the formatted Hanyu Pinyin representations of the given Chinese
* character (both in Simplified and Tranditional) in array format.
*
* @param ch
* the given Chinese character
* @param outputFormat
* Describes the desired format of returned Hanyu Pinyin string
* @return The formatted Hanyu Pinyin representations of the given codepoint
* in array format; null if no record is found in the hashtable.
*/
static private String[] getFormattedHanyuPinyinStringArray(char ch,
HanyuPinyinOutputFormat outputFormat)
throws BadHanyuPinyinOutputFormatCombination
{
String[] pinyinStrArray = getUnformattedHanyuPinyinStringArray(ch);
if (null != pinyinStrArray)
{
for (int i = 0; i < pinyinStrArray.length; i++)
{
pinyinStrArray[i] = PinyinFormatter.formatHanyuPinyin(pinyinStrArray[i], outputFormat);
}
return pinyinStrArray;
} else
return null;
}
/**
* Delegate function
*
* @param ch
* the given Chinese character
* @return unformatted Hanyu Pinyin strings; null if the record is not found
*/
private static String[] getUnformattedHanyuPinyinStringArray(char ch)
{
return ChineseToPinyinResource.getInstance().getHanyuPinyinStringArray(ch);
}
/**
* Get all unformmatted Tongyong Pinyin presentations of a single Chinese
* character (both Simplified and Tranditional)
*
* @param ch
* the given Chinese character
*
* @return a String array contains all unformmatted Tongyong Pinyin
* presentations with tone numbers; null for non-Chinese character
*
* @see #toHanyuPinyinStringArray(char)
*
*/
static public String[] toTongyongPinyinStringArray(char ch)
{
return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.TONGYONG_PINYIN);
}
/**
* Get all unformmatted Wade-Giles presentations of a single Chinese
* character (both Simplified and Tranditional)
*
* @param ch
* the given Chinese character
*
* @return a String array contains all unformmatted Wade-Giles presentations
* with tone numbers; null for non-Chinese character
*
* @see #toHanyuPinyinStringArray(char)
*
*/
static public String[] toWadeGilesPinyinStringArray(char ch)
{
return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.WADEGILES_PINYIN);
}
/**
* Get all unformmatted MPS2 (Mandarin Phonetic Symbols 2) presentations of
* a single Chinese character (both Simplified and Tranditional)
*
* @param ch
* the given Chinese character
*
* @return a String array contains all unformmatted MPS2 (Mandarin Phonetic
* Symbols 2) presentations with tone numbers; null for non-Chinese
* character
*
* @see #toHanyuPinyinStringArray(char)
*
*/
static public String[] toMPS2PinyinStringArray(char ch)
{
return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.MPS2_PINYIN);
}
/**
* Get all unformmatted Yale Pinyin presentations of a single Chinese
* character (both Simplified and Tranditional)
*
* @param ch
* the given Chinese character
*
* @return a String array contains all unformmatted Yale Pinyin
* presentations with tone numbers; null for non-Chinese character
*
* @see #toHanyuPinyinStringArray(char)
*
*/
static public String[] toYalePinyinStringArray(char ch)
{
return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.YALE_PINYIN);
}
/**
* @param ch
* the given Chinese character
* @param targetPinyinSystem
* indicates target Chinese Romanization system should be
* converted to
* @return string representations of target Chinese Romanization system
* corresponding to the given Chinese character in array format;
* null if error happens
*
* @see PinyinRomanizationType
*/
private static String[] convertToTargetPinyinStringArray(char ch,
PinyinRomanizationType targetPinyinSystem)
{
String[] hanyuPinyinStringArray = getUnformattedHanyuPinyinStringArray(ch);
if (null != hanyuPinyinStringArray)
{
String[] targetPinyinStringArray = new String[hanyuPinyinStringArray.length];
for (int i = 0; i < hanyuPinyinStringArray.length; i++)
{
targetPinyinStringArray[i] = PinyinRomanizationTranslator.convertRomanizationSystem(hanyuPinyinStringArray[i], PinyinRomanizationType.HANYU_PINYIN, targetPinyinSystem);
}
return targetPinyinStringArray;
} else
return null;
}
/**
* Get all unformmatted Gwoyeu Romatzyh presentations of a single Chinese
* character (both Simplified and Tranditional)
*
* @param ch
* the given Chinese character
*
* @return a String array contains all unformmatted Gwoyeu Romatzyh
* presentations with tone numbers; null for non-Chinese character
*
* @see #toHanyuPinyinStringArray(char)
*
*/
static public String[] toGwoyeuRomatzyhStringArray(char ch)
{
return convertToGwoyeuRomatzyhStringArray(ch);
}
/**
* @param ch
* the given Chinese character
*
* @return Gwoyeu Romatzyh string representations corresponding to the given
* Chinese character in array format; null if error happens
*
* @see PinyinRomanizationType
*/
private static String[] convertToGwoyeuRomatzyhStringArray(char ch)
{
String[] hanyuPinyinStringArray = getUnformattedHanyuPinyinStringArray(ch);
if (null != hanyuPinyinStringArray)
{
String[] targetPinyinStringArray = new String[hanyuPinyinStringArray.length];
for (int i = 0; i < hanyuPinyinStringArray.length; i++)
{
targetPinyinStringArray[i] = GwoyeuRomatzyhTranslator.convertHanyuPinyinToGwoyeuRomatzyh(hanyuPinyinStringArray[i]);
}
return targetPinyinStringArray;
} else
return null;
}
/**
* Get a string which all Chinese characters are replaced by corresponding
* main (first) Hanyu Pinyin representation.
*
* <p>
* <b>Special Note</b>: If the return contains "none0", that means that
* Chinese character is in Unicode CJK talbe, however, it has not
* pronounciation in Chinese. <b> This interface will be removed in next
* release. </b>
*
* @param str
* A given string contains Chinese characters
* @param outputFormat
* Describes the desired format of returned Hanyu Pinyin string
* @param seperater
* The string is appended after a Chinese character (excluding
* the last Chinese character at the end of sentence). <b>Note!
* Seperater will not appear after a non-Chinese character</b>
* @return a String identical to the original one but all recognizable
* Chinese characters are converted into main (first) Hanyu Pinyin
* representation
*
* @deprecated DO NOT use it again because the first retrived pinyin string
* may be a wrong pronouciation in a certain sentence context.
* <b> This interface will be removed in next release. </b>
*/
static public String toHanyuPinyinString(String str,
HanyuPinyinOutputFormat outputFormat, String seperater)
throws BadHanyuPinyinOutputFormatCombination
{
StringBuffer resultPinyinStrBuf = new StringBuffer();
for (int i = 0; i < str.length(); i++)
{
String mainPinyinStrOfChar = getFirstHanyuPinyinString(str.charAt(i), outputFormat);
if (null != mainPinyinStrOfChar)
{
resultPinyinStrBuf.append(mainPinyinStrOfChar);
if (i != str.length() - 1)
{ // avoid appending at the end
resultPinyinStrBuf.append(seperater);
}
} else
{
resultPinyinStrBuf.append(str.charAt(i));
}
}
return resultPinyinStrBuf.toString();
}
/**
* Get the first Hanyu Pinyin of a Chinese character <b> This function will
* be removed in next release. </b>
*
* @param ch
* The given Unicode character
* @param outputFormat
* Describes the desired format of returned Hanyu Pinyin string
* @return Return the first Hanyu Pinyin of given Chinese character; return
* null if the input is not a Chinese character
*
* @deprecated DO NOT use it again because the first retrived pinyin string
* may be a wrong pronouciation in a certain sentence context.
* <b> This function will be removed in next release. </b>
*/
static private String getFirstHanyuPinyinString(char ch,
HanyuPinyinOutputFormat outputFormat)
throws BadHanyuPinyinOutputFormatCombination
{
String[] pinyinStrArray = getFormattedHanyuPinyinStringArray(ch, outputFormat);
if ((null != pinyinStrArray) && (pinyinStrArray.length > 0))
{
return pinyinStrArray[0];
} else
{
return null;
}
}
// ! Hidden constructor
private PinyinHelper()
{
}
}
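For the first-letter sorting use case described at the start, the return conventions documented above matter: the array is null for a non-Chinese character, and a reading of "none0" means the character has no pronunciation. The following is a minimal sketch of how a caller might turn such an array into a sorting initial. The class `PinyinInitials`, the method `initialOf`, and the `'#'` fallback bucket are hypothetical names chosen for illustration; they are not part of pinyin4j.

```java
/** Hypothetical helper for illustration; not part of pinyin4j. */
final class PinyinInitials {
    private PinyinInitials() {
    }

    /**
     * Picks a sorting initial from an array shaped like the result of
     * PinyinHelper.toHanyuPinyinStringArray(char).
     * Returns '#' (an assumed fallback bucket) when the array is null or
     * empty, or when the first reading is the special value "none0".
     */
    static char initialOf(String[] readings) {
        if (readings == null || readings.length == 0) {
            return '#'; // non-Chinese character: no record found
        }
        String first = readings[0];
        if (first == null || first.isEmpty() || first.equals("none0")) {
            return '#'; // in the CJK table but without a pronunciation
        }
        return Character.toUpperCase(first.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(initialOf(new String[] {"jian1", "jian4"})); // J
        System.out.println(initialOf(new String[] {"li3"}));            // L
        System.out.println(initialOf(null));                            // #
    }
}
```

Note that this takes the first reading only. As the deprecation notes in the source say, the first reading of a polyphonic character may be the wrong pronunciation in a given context, so the initial is an approximation.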
The supported romanization systems are listed in the following class:
/**
* This file is part of pinyin4j (http://sourceforge.net/projects/pinyin4j/)
* and distributed under GNU GENERAL PUBLIC LICENSE (GPL).
*
* pinyin4j is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* pinyin4j is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with pinyin4j.
*/
/**
*
*/
package net.sourceforge.pinyin4j;
/**
* The class describes variable Chinese Pinyin Romanization System
*
* @author Li Min (xmlerlimin@gmail.com)
*
*/
class PinyinRomanizationType
{
/**
* Hanyu Pinyin system
*/
static final PinyinRomanizationType HANYU_PINYIN = new PinyinRomanizationType("Hanyu");
/**
* Wade-Giles Pinyin system
*/
static final PinyinRomanizationType WADEGILES_PINYIN = new PinyinRomanizationType("Wade");
/**
* Mandarin Phonetic Symbols 2 (MPS2) Pinyin system
*/
static final PinyinRomanizationType MPS2_PINYIN = new PinyinRomanizationType("MPSII");
/**
* Yale Pinyin system
*/
static final PinyinRomanizationType YALE_PINYIN = new PinyinRomanizationType("Yale");
/**
* Tongyong Pinyin system
*/
static final PinyinRomanizationType TONGYONG_PINYIN = new PinyinRomanizationType("Tongyong");
/**
* Gwoyeu Romatzyh system
*/
static final PinyinRomanizationType GWOYEU_ROMATZYH = new PinyinRomanizationType("Gwoyeu");
/**
* Constructor
*/
protected PinyinRomanizationType(String tagName)
{
setTagName(tagName);
}
/**
* @return Returns the tagName.
*/
String getTagName()
{
return tagName;
}
/**
* @param tagName
* The tagName to set.
*/
protected void setTagName(String tagName)
{
this.tagName = tagName;
}
protected String tagName;
}
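The class above uses the pre-Java-5 "type-safe constant" pattern: a fixed set of static instances, each carrying a tag name. Purely as an illustration of the same idea, the set could be written as an enum. The name `RomanizationSystem` is hypothetical and this is not how pinyin4j itself is written; the constant names and tag strings are copied from the class above.

```java
/** Hypothetical enum equivalent of PinyinRomanizationType; illustration only. */
enum RomanizationSystem {
    HANYU_PINYIN("Hanyu"),
    WADEGILES_PINYIN("Wade"),
    MPS2_PINYIN("MPSII"),
    YALE_PINYIN("Yale"),
    TONGYONG_PINYIN("Tongyong"),
    GWOYEU_ROMATZYH("Gwoyeu");

    // Tag name identifying the system, as in the original class.
    private final String tagName;

    RomanizationSystem(String tagName) {
        this.tagName = tagName;
    }

    String getTagName() {
        return tagName;
    }
}

final class RomanizationSystemDemo {
    public static void main(String[] args) {
        // Print each system together with its tag name.
        for (RomanizationSystem system : RomanizationSystem.values()) {
            System.out.println(system + " -> " + system.getTagName());
        }
    }
}
```

An enum makes the tag immutable, whereas the original class exposes a protected setter.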
Our demo of the API is as follows:
package demo;
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;
public class MyPinyinDemo {
/**
* @param args
* @throws BadHanyuPinyinOutputFormatCombination
*/
public static void main(String[] args) throws BadHanyuPinyinOutputFormatCombination {
char chineseCharacter = "绿".charAt(0);
HanyuPinyinOutputFormat outputFormat = new HanyuPinyinOutputFormat();
outputFormat.setToneType(HanyuPinyinToneType.WITH_TONE_NUMBER); // tone as a number: 1 for the first tone, 2 for the second, 3 for the third, 4 for the fourth, e.g. lu:4
// outputFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE); // pinyin without tone, e.g. lu:
// outputFormat.setToneType(HanyuPinyinToneType.WITH_TONE_MARK); // tone mark placed on the letter, e.g. lǜ
outputFormat.setVCharType(HanyuPinyinVCharType.WITH_U_AND_COLON); // how 'ü' is written: output as "u:"
// outputFormat.setVCharType(HanyuPinyinVCharType.WITH_U_UNICODE); // how 'ü' is written: output as "ü" in Unicode form
// outputFormat.setVCharType(HanyuPinyinVCharType.WITH_V); // how 'ü' is written: output as "v"
outputFormat.setCaseType(HanyuPinyinCaseType.UPPERCASE); // output pinyin in upper case
// outputFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE); // output pinyin in lower case
String[] pinyinArray = PinyinHelper.toHanyuPinyinStringArray(chineseCharacter, outputFormat); // Hanyu Pinyin of the character
for(String str: pinyinArray){ // a polyphonic character returns one entry per reading
System.out.println(str);
}
String pinyinstr = PinyinHelper.toHanyuPinyinString("绿色", outputFormat, "|");
System.out.println(pinyinstr);
// output in the other romanization systems
String[] GwoyeuRomatzyhStringArray = PinyinHelper.toGwoyeuRomatzyhStringArray(chineseCharacter);
for(String str: GwoyeuRomatzyhStringArray){ // one entry per reading
System.out.println(str);
}
String[] MPS2PinyinStringArray = PinyinHelper.toMPS2PinyinStringArray(chineseCharacter);
for(String str: MPS2PinyinStringArray){ // one entry per reading
System.out.println(str);
}
String[] TongyongPinyinStringArray = PinyinHelper.toTongyongPinyinStringArray(chineseCharacter);
for(String str: TongyongPinyinStringArray){ // one entry per reading
System.out.println(str);
}
String[] WadeGilesPinyinStringArray = PinyinHelper.toWadeGilesPinyinStringArray(chineseCharacter);
for(String str: WadeGilesPinyinStringArray){ // one entry per reading
System.out.println(str);
}
String[] YalePinyinStringArray = PinyinHelper.toYalePinyinStringArray(chineseCharacter);
for(String str: YalePinyinStringArray){ // one entry per reading
System.out.println(str);
}
}
}
Output:
LU:4
LU4
LU:4|SE4
liuh
luh
liu4
lu4
lyu4
lu4
lu:4
lu4
lyu4
lu4
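The first two output lines, LU:4 and LU4, are the two readings of 绿 under the chosen options: tone as a number, 'ü' written as "u:", upper case. To make the effect of those options concrete, here is a rough sketch that applies the same three choices to a raw reading such as "lu:4". It is an assumption-laden illustration, not pinyin4j's PinyinFormatter: the class `PinyinFormatSketch` and its parameters are hypothetical, and the tone-mark option (WITH_TONE_MARK) is deliberately not covered.

```java
import java.util.Locale;

/** Hypothetical sketch of the output options; not pinyin4j's implementation. */
final class PinyinFormatSketch {
    private PinyinFormatSketch() {
    }

    /**
     * @param raw            reading with a trailing tone number, e.g. "lu:4"
     * @param keepToneNumber false strips the trailing tone digit
     * @param uColonAs       replacement for "u:", e.g. "u:", "v" or "ü"
     * @param upperCase      true for upper case, false for lower case
     */
    static String format(String raw, boolean keepToneNumber, String uColonAs, boolean upperCase) {
        String s = raw;
        if (!keepToneNumber) {
            s = s.replaceAll("[1-5]$", ""); // drop the trailing tone digit
        }
        s = s.replace("u:", uColonAs);       // choose how 'ü' is written
        return upperCase ? s.toUpperCase(Locale.ROOT) : s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(format("lu:4", true, "u:", true));  // LU:4
        System.out.println(format("lu:4", false, "v", false)); // lv
    }
}
```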
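The package also ships with a demo of its own, Pinyin4jAppletDemo.java.
As for the implementation, it is quite simple: everything is driven by dictionary files that map characters to pinyin. unicode_to_hanyu_pinyin.txt maps the Unicode code point of each Chinese character to its pinyin, pinyin_mapping.xml maps Hanyu Pinyin to the other romanization systems, and pinyin_Gwoyeu_mapping.xml maps Hanyu Pinyin to Gwoyeu Romatzyh. Excerpts showing the format of the two XML files follow; once these tables have been compiled, the conversion itself is easy to implement.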
<?xml version="1.0"?>
<pinyin_mapping>
<item>
<Hanyu>a</Hanyu>
<Wade>a</Wade>
<MPSII>a</MPSII>
<Yale>a</Yale>
<Tongyong>a</Tongyong>
</item>
<item>
<Hanyu>ai</Hanyu>
<Wade>ai</Wade>
<MPSII>ai</MPSII>
<Yale>ai</Yale>
<Tongyong>ai</Tongyong>
</item>
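Given the pinyin_mapping.xml excerpt above, converting between systems amounts to finding the item whose source element matches and reading the target element. The sketch below shows that lookup with the JDK's built-in DOM parser. The class `MappingLookupSketch` and its `convert` method are hypothetical, the embedded XML is only the two entries shown above (closed with an end tag so that it parses), and pinyin4j's own translator may work differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical lookup over a pinyin_mapping-style document; illustration only. */
final class MappingLookupSketch {
    // The two entries from the excerpt, plus a closing root tag.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<pinyin_mapping>"
        + "<item><Hanyu>a</Hanyu><Wade>a</Wade><MPSII>a</MPSII>"
        + "<Yale>a</Yale><Tongyong>a</Tongyong></item>"
        + "<item><Hanyu>ai</Hanyu><Wade>ai</Wade><MPSII>ai</MPSII>"
        + "<Yale>ai</Yale><Tongyong>ai</Tongyong></item>"
        + "</pinyin_mapping>";

    private MappingLookupSketch() {
    }

    /** Returns the targetTag spelling of the syllable, or null if no item matches. */
    static String convert(String xml, String sourceTag, String syllable, String targetTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                if (syllable.equals(textOf(item, sourceTag))) {
                    return textOf(item, targetTag);
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not read mapping XML", e);
        }
    }

    // Text of the first child element with the given tag, or null if absent.
    private static String textOf(Element item, String tag) {
        NodeList nodes = item.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(convert(SAMPLE, "Hanyu", "ai", "Yale")); // ai
    }
}
```

The element names are the same strings as the tag names in PinyinRomanizationType ("Hanyu", "Wade", "MPSII", "Yale", "Tongyong"), which suggests the tag name is used to select the element. A real implementation would parse the file once and cache the result rather than re-parse on every call.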
<pinyin_gwoyeu_mapping>
<item>
<Hanyu>a</Hanyu>
<Gwoyeu_I>a</Gwoyeu_I>
<Gwoyeu_II>ar</Gwoyeu_II>
<Gwoyeu_III>aa</Gwoyeu_III>
<Gwoyeu_IV>ah</Gwoyeu_IV>
<Gwoyeu_V>.a</Gwoyeu_V>
</item>
<item>
<Hanyu>ai</Hanyu>
<Gwoyeu_I>ai</Gwoyeu_I>
<Gwoyeu_II>air</Gwoyeu_II>
<Gwoyeu_III>ae</Gwoyeu_III>
<Gwoyeu_IV>ay</Gwoyeu_IV>
<Gwoyeu_V>.ai</Gwoyeu_V>
</item>
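Gwoyeu Romatzyh spells the tone into the syllable, which is why this table has five columns per entry instead of one. Assuming that Gwoyeu_I through Gwoyeu_V correspond to tones 1 through 5 (an inference from the element names and from the demo output, where the fourth-tone reading of 绿 became "liuh", ending in "h" like Gwoyeu_IV "ah"), the lookup can be sketched as below. The class `GwoyeuLookupSketch` is hypothetical and holds only the two entries shown above; it is not pinyin4j's GwoyeuRomatzyhTranslator.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tone-indexed lookup; illustration only. */
final class GwoyeuLookupSketch {
    // Syllable -> spellings for tones 1..5 (Gwoyeu_I..Gwoyeu_V), from the excerpt.
    private static final Map<String, String[]> TABLE = new HashMap<>();

    static {
        TABLE.put("a", new String[] {"a", "ar", "aa", "ah", ".a"});
        TABLE.put("ai", new String[] {"ai", "air", "ae", "ay", ".ai"});
    }

    private GwoyeuLookupSketch() {
    }

    /**
     * Converts a Hanyu Pinyin syllable with a trailing tone number, such as
     * "ai4", to its Gwoyeu Romatzyh spelling. Returns null when the tone is
     * not 1..5 or the syllable is not in the table.
     */
    static String convert(String hanyuWithTone) {
        if (hanyuWithTone == null || hanyuWithTone.length() < 2) {
            return null;
        }
        int last = hanyuWithTone.length() - 1;
        int tone = hanyuWithTone.charAt(last) - '0';
        if (tone < 1 || tone > 5) {
            return null;
        }
        String[] forms = TABLE.get(hanyuWithTone.substring(0, last));
        return forms == null ? null : forms[tone - 1];
    }

    public static void main(String[] args) {
        System.out.println(convert("ai4")); // ay
        System.out.println(convert("a2"));  // ar
    }
}
```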