Parsing a DOC file with POI and converting it to HTML
2013-01-29 10:09
Environment
poi-3.9
Code
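import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.StringWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;

public class WordToHtml {

    private static final String encoding = "UTF-8";

    // Converts the .doc file at wordPath to HTML; returns "" if the path is empty or not a file.
    public static String convert2Html(String wordPath)
            throws FileNotFoundException, TransformerException, IOException,
            ParserConfigurationException {
        if (wordPath == null || "".equals(wordPath))
            return "";
        File file = new File(wordPath);
        if (file.exists() && file.isFile()) {
            // Close the input stream once the document has been converted.
            try (InputStream is = new FileInputStream(file)) {
                return convert2Html(is);
            }
        }
        return "";
    }

    public static String convert2Html(InputStream is)
            throws TransformerException, IOException, ParserConfigurationException {
        HWPFDocument wordDocument = new HWPFDocument(is);
        WordToHtmlConverter converter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());

        // Add a timestamp prefix to picture names so that pictures with the
        // same name do not overwrite each other.
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
        final String prefix = sdf.format(new Date());
        converter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType,
                    String suggestedName, float widthInches, float heightInches) {
                // The returned value is used as the image URL in the generated HTML.
                return prefix + "_" + suggestedName;
            }
        });
        converter.processDocument(wordDocument);

        // Write the embedded pictures to disk. Note that "/" is the file system
        // root; change it to the directory the HTML is served from.
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                String target = "/" + prefix + "_" + pic.suggestFullFileName();
                try (OutputStream out = new FileOutputStream(target)) {
                    pic.writeImageContent(out);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }

        // Serialize the DOM built by the converter as indented HTML.
        StringWriter writer = new StringWriter();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, encoding);
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(converter.getDocument()), new StreamResult(writer));
        writer.close();
        return writer.toString();
    }
}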
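The last step of the code above, turning a DOM document into an HTML string, can be tried on its own without POI. Below is a minimal sketch using only JDK classes. The class name DomToHtmlSketch and the hand-built sample document are assumptions made for illustration; in the real code the document comes from converter.getDocument().

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of POI or of the original post.
final class DomToHtmlSketch {

    // Builds a tiny stand-in for the DOM that WordToHtmlConverter would produce.
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("hello");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Serializes a DOM document with the same output properties as the post's code.
    public static String serialize(Document doc) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        StringWriter writer = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Setting METHOD to "html" makes the serializer follow HTML output rules instead of XML ones, which is why the result has no XML declaration. The exact whitespace produced by INDENT depends on the JDK's transformer implementation.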