Fixing errors when extracting text from Word documents older than Word 2003
2010-12-22 15:59
Extracting text from Word documents with the .doc extension throws a series of exceptions, shown below.
Exception 1:
java.lang.IndexOutOfBoundsException: Index: 10, Size: 7
at java.util.ArrayList.RangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.poi.hwpf.model.PlexOfCps.getProperty(PlexOfCps.java:70)
at org.apache.poi.hwpf.usermodel.HeaderStories.getAt(HeaderStories.java:155)
at org.apache.poi.hwpf.usermodel.HeaderStories.getFirstHeader(HeaderStories.java:87)
at org.apache.poi.hwpf.extractor.WordExtractor.getHeaderText(WordExtractor.java:178)
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:254)
at com.index.extractor.impl.WordFileTextExtractor.getText(WordFileTextExtractor.java:23)
at test.TextConvert.convert(TextConvert.java:147)
at test.TextConvert.getEFiles(TextConvert.java:111)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.go(TextConvert.java:47)
at test.TextConvert.main(TextConvert.java:42)
Exception 2:
java.lang.ArrayIndexOutOfBoundsException: 218636
at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:45)
at org.apache.poi.hwpf.model.ListLevel.<init>(ListLevel.java:120)
at org.apache.poi.hwpf.model.ListFormatOverrideLevel.<init>(ListFormatOverrideLevel.java:48)
at org.apache.poi.hwpf.model.ListTables.<init>(ListTables.java:88)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:267)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:157)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:62)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:54)
at com.index.extractor.impl.WordFileTextExtractor.getText(WordFileTextExtractor.java:22)
at test.TextConvert.convert(TextConvert.java:147)
at test.TextConvert.getEFiles(TextConvert.java:111)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.go(TextConvert.java:47)
at test.TextConvert.main(TextConvert.java:42)
Exception 3:
java.lang.NullPointerException
at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.uncompressPAP(ParagraphSprmUncompressor.java:47)
at org.apache.poi.hwpf.model.StyleSheet.createPap(StyleSheet.java:241)
at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:116)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:260)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:157)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:62)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:54)
at com.index.extractor.impl.WordFileTextExtractor.getText(WordFileTextExtractor.java:22)
at test.TextConvert.convert(TextConvert.java:147)
at test.TextConvert.getEFiles(TextConvert.java:111)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.go(TextConvert.java:47)
at test.TextConvert.main(TextConvert.java:42)
The main code the program calls to extract text:
WordExtractor extractor = new WordExtractor(in);
str = extractor.getText();
It really is just two simple lines of code. However, the POI 3.7 source contains a class named Word6Extractor, whose Javadoc comment reads:
/**
 * Class to extract the text from old (Word 6 / Word 95) Word Documents.
 *
 * This should only be used on the older files, for most uses you
 * should call {@link WordExtractor} which deals properly
 * with HWPF.
 */
Change the code to:
Word6Extractor extractor = new Word6Extractor(in);
str = extractor.getText();
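After rerunning, all of the errors were gone. Note that, per the comment above, Word6Extractor is meant only for these older files; WordExtractor remains the right choice for most documents.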
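Since a batch of .doc files may mix old and newer formats, one option is to try the regular extractor first and use the legacy one only when it fails. The following is a minimal sketch of that fallback pattern, not code from the original program: the class name FallbackExtraction and the method extractWithFallback are hypothetical, and the two lambdas in main are stand-ins for the POI calls so that the example runs without POI on the classpath.

```java
import java.util.concurrent.Callable;

public class FallbackExtraction {

    /**
     * Runs the primary extractor; if it throws, runs the legacy one.
     * If both fail, the legacy failure is rethrown with the primary
     * failure attached as a suppressed exception.
     */
    public static String extractWithFallback(Callable<String> primary,
                                             Callable<String> legacy) throws Exception {
        try {
            return primary.call();
        } catch (Exception primaryFailure) {
            // Covers the unchecked exceptions seen in the traces above
            // (IndexOutOfBoundsException, NullPointerException, ...).
            try {
                return legacy.call();
            } catch (Exception legacyFailure) {
                legacyFailure.addSuppressed(primaryFailure);
                throw legacyFailure;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins (assumptions) for the two POI calls. In the real program
        // these would wrap something like
        //   new WordExtractor(...).getText()   and
        //   new Word6Extractor(...).getText()
        Callable<String> primary = () -> {
            throw new IndexOutOfBoundsException("Index: 10, Size: 7");
        };
        Callable<String> legacy = () -> "text from legacy extractor";

        System.out.println(extractWithFallback(primary, legacy));
    }
}
```

When wiring in the real extractors, each attempt should open its own input, because a stream that the first attempt has already consumed cannot be read again by the second.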