[JAVA] Converting PDF to Images
2012-12-11 22:45
Plenty of open-source libraries can do this job, but none of them gives ideal results. The main problems are:
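1) Very low resolution: the output image is effectively a thumbnail and becomes unreadable when enlarged;
2) No Chinese support: all Chinese text comes out garbled;
3) Rendering glitches: the page is mostly right, but individual elements are scrambled, with the wrong size or position.
All three problems show up, to a greater or lesser degree, in every library I tried.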
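Problem 1) usually comes down to the scale at which a page is rasterized. PDF page sizes are expressed in points (1/72 inch), so rendering at a scale of 1 gives roughly a 72 DPI image, which is why the output looks like a thumbnail. Here is a minimal sketch of the arithmetic. `PageRaster` is a hypothetical helper of my own, not part of any library discussed in this post.

```java
/** Hypothetical helper: converts PDF page sizes in points to pixel sizes. */
final class PageRaster {
    /** Default PDF user space uses 72 points per inch. */
    static final double POINTS_PER_INCH = 72.0;

    private PageRaster() {}

    /** Scale factor a renderer would need to reach the target DPI. */
    static double scaleFor(int dpi) {
        return dpi / POINTS_PER_INCH;
    }

    /** Pixel width and height of a page rendered at the given DPI. */
    static int[] toPixels(double widthPt, double heightPt, int dpi) {
        double scale = scaleFor(dpi);
        return new int[] {
            (int) Math.round(widthPt * scale),
            (int) Math.round(heightPt * scale)
        };
    }

    public static void main(String[] args) {
        // A4 is approximately 595 x 842 points.
        int[] low = toPixels(595, 842, 72);
        int[] high = toPixels(595, 842, 300);
        System.out.println(low[0] + " x " + low[1]);   // prints "595 x 842"
        System.out.println(high[0] + " x " + high[1]); // prints "2479 x 3508"
    }
}
```

So if a library only renders at scale 1, an A4 page ends up at about 595 x 842 pixels, while a print-quality 300 DPI render needs a scale of roughly 4.17.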
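A) In my experiments the best results came from QOPPA's library. With the free version the output was excellent. Its biggest drawback is that it unconditionally adds a watermark and also stamps a URL across every row of the image, which makes the result completely unusable. The paid version costs $400, which I really can't afford, so despite the good quality it is not a workable option.
B) Next, the open-source library ICEPDF. This one was the worst (though it may be that I used it badly).
As you can see, it suffers from both problem 1) and problem 2) above!
C) Next, the open-source library PDFView. Its output looks like this:
Not bad, right? The problems, though, are:
1) low resolution; 2) the background is gone.
There is one more problem that this image doesn't show, so here is another one:
The illustration has been enlarged for no apparent reason and covers the text, which is problem 3).
D) Next, the open-source library JPEDAL. Its normal conversion looks like this:
This one is pretty good: apart from the low resolution, the text and the background are all there. It does share the problem above, though: embedded images come out at the wrong size (not shown here).
Its best feature is that it also offers a high-resolution conversion, shown here:
Big and impressive, isn't it? Very high resolution and crystal clear. The only remaining flaw is the image-size problem.
E) There are also the open-source libraries PDFBox and PDFRender. Both of them produce output whose resolution is not high enough, and PDFBox also has a lot of trouble with Chinese text, which is a real headache.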
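The high-resolution mode in D) is driven by a target page size, and the comments in the JPEDAL example say the aspect ratio is preserved and the page is "best fit" into that size. I have not checked how JPEDAL implements this internally, so the following is only a sketch of what a best-fit calculation generally looks like. `BestFit` is a hypothetical helper, not JPEDAL code.

```java
/** Hypothetical helper: fits a page into a bounding box, keeping its aspect ratio. */
final class BestFit {
    private BestFit() {}

    /**
     * Returns the pixel width and height of the largest image that has the
     * page's aspect ratio and still fits inside boxW x boxH.
     */
    static int[] fit(double pageW, double pageH, int boxW, int boxH) {
        // The tighter of the two axis constraints decides the scale.
        double scale = Math.min(boxW / pageW, boxH / pageH);
        return new int[] {
            (int) Math.round(pageW * scale),
            (int) Math.round(pageH * scale)
        };
    }

    public static void main(String[] args) {
        // Portrait A4 (595 x 842 points) into the 2000 x 1600 box used in the listing.
        int[] portrait = fit(595, 842, 2000, 1600);
        System.out.println(portrait[0] + " x " + portrait[1]);   // prints "1131 x 1600"

        // Landscape A4 into the same box.
        int[] landscape = fit(842, 595, 2000, 1600);
        System.out.println(landscape[0] + " x " + landscape[1]); // prints "2000 x 1413"
    }
}
```

Under this assumption a portrait page is limited by the box height and a landscape page by the box width, so choosing the box to match the page orientation gets the most pixels out of the conversion.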
In the end I decided to go with the high-resolution conversion from D). The test code is shown below:
1) Conversion class:
/**
 * Example to convert PDF to highest possible quality images
 * Also consider org.jpedal.examples.images.ConvertPagesToImages for faster if quality less important
 *
 * Useful blog article at http://www.jpedal.org/PDFblog/2009/07/pdf-to-image-quality/
 *
 * It can run from jar directly using the command
 *
 * java -cp libraries_needed org/jpedal/examples/images/ConvertPagesToHiResImages pdfFilepath inputValues
 *
 * where inputValues is 2 values
 *
 * First Parameter: The full path including the name and extension of the target PDF file.
 * Second Parameter: The output image file type. Choices are tif, jpg and png.
 *
 * See also http://www.jpedal.org/javadoc/org/jpedal/constants/JPedalSettings.html for settings to customise
 */
/**
 * ===========================================
 * Java Pdf Extraction Decoding Access Library
 * ===========================================
 *
 * Project Info: http://www.jpedal.org
 * (C) Copyright 1997-2012, IDRsolutions and Contributors.
 *
 * This file is part of JPedal
 * This source code is copyright IDRSolutions 2012
 *
 * ---------------
 * ConvertPagesToHiResImages.java
 * ---------------
 */
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.*;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;

import com.sun.imageio.plugins.jpeg.JPEGImageWriter;
import org.jpedal.PdfDecoder;
import org.jpedal.color.ColorSpaces;
import org.jpedal.exception.PdfException;
import org.jpedal.fonts.FontMappings;
import org.jpedal.objects.PdfPageData;
import org.jpedal.utils.LogWriter;
import org.jpedal.io.ColorSpaceConvertor;
import org.jpedal.io.JAIHelper;
import org.jpedal.constants.JPedalSettings;
import org.jpedal.constants.PageInfo;
import org.w3c.dom.Element;

public class ConvertPagesToHiResImages {

    private static boolean debug = false;

    /** correct separator for OS */
    final static String separator = System.getProperty("file.separator");

    //only used if between 0 and 1
    private float JPEGcompression = -1f;

    public ConvertPagesToHiResImages() {}

    public static void main(String[] args) throws Exception {

        /*
         * Change these variables
         */
        String fileType, pdfFile;

        if (args != null && args.length > 1) {

            pdfFile = args[0];
            fileType = args[1];

            //open the file
            if (pdfFile.toLowerCase().endsWith(".pdf")
                    && (fileType.equals("jpg") || fileType.equals("jpeg") || fileType.equals("png")
                        || fileType.equals("tiff") || fileType.equals("tif"))) {
                new ConvertPagesToHiResImages(fileType, pdfFile);
            } else {
                //open the directory
                File testDir = new File(pdfFile);

                if (testDir.isDirectory()) {

                    /**
                     * get list of files and check directory
                     */
                    String[] files = null;
                    File inputFiles;

                    /** make sure name ends with a delimiter for correct path later */
                    if (!pdfFile.endsWith(separator))
                        pdfFile = pdfFile + separator;

                    try {
                        inputFiles = new File(pdfFile);
                        files = inputFiles.list();
                    } catch (Exception ee) {
                        LogWriter.writeLog("Exception trying to access file " + ee.getMessage());
                    }

                    /** now work through all pdf files */
                    for (String file : files) {
                        if (file.toLowerCase().endsWith(".pdf"))
                            new ConvertPagesToHiResImages(fileType, pdfFile + file);
                    }
                } else
                    System.out.println("The file to be processed has to be a pdf and the output filetype can only be jpg,png or tiff");
            }
        } else {
            System.out.println("Not enough arguments passed in! Usage: \"C:\\examples\\1.pdf\" \"jpg\"");
        }
    }

    /**
     * main constructor to convert PDF to img
     * @param fileType
     * @param pdfFile
     * @throws Exception
     */
    public ConvertPagesToHiResImages(String fileType, String pdfFile) throws Exception {

        long startTime = System.currentTimeMillis();

        String outputPath = pdfFile.substring(0, pdfFile.toLowerCase().indexOf(".pdf")) + separator;
        File outputPathFile = new File(outputPath);
        if (!outputPathFile.exists() || !outputPathFile.isDirectory()) {
            if (!outputPathFile.mkdirs()) {
                if (debug)
                    System.err.println("Can't create directory " + outputPath);
            }
        }

        //PdfDecoder object provides the conversion
        final PdfDecoder decoder = new PdfDecoder(true);

        //mappings for non-embedded fonts to use
        // FontMappings.setFontReplacements();

        decoder.openPdfFile(pdfFile);

        /**
         * this process is very flexible so we create a Map and pass in values to select what sort
         * of results we want. There is a choice between methods used and image size. Larger images use more
         * memory and are slower but look better
         */
        Map mapValues = new HashMap();

        /** USEFUL OPTIONS */
        //do not scale above this figure
        mapValues.put(JPedalSettings.EXTRACT_AT_BEST_QUALITY_MAXSCALING, 2);

        //alternatively specify a page size (aspect ratio preserved so will do best fit)
        //set a page size (JPedal will put best fit to this)
        mapValues.put(JPedalSettings.EXTRACT_AT_PAGE_SIZE, new String[]{"2000", "1600"});

        //which takes priority (default is false)
        mapValues.put(JPedalSettings.PAGE_SIZE_OVERRIDES_IMAGE, Boolean.TRUE);

        PdfDecoder.modifyJPedalParameters(mapValues);

        if (debug)
            System.out.println("pdf : " + pdfFile);

        try {
            /**
             * allow output to multiple images with different values on each
             *
             * Note we REMOVE shapes as it is a new feature and we do not want to break existing functions
             */
            String separation = System.getProperty("org.jpedal.separation");
            if (separation != null) {

                Object[] sepValues = new Object[]{7, "", Boolean.FALSE}; //default of normal
                if (separation.equals("all")) {
                    sepValues = new Object[]{
                            PdfDecoder.RENDERIMAGES, "image_and_shapes", Boolean.FALSE,
                            PdfDecoder.RENDERIMAGES + PdfDecoder.REMOVE_RENDERSHAPES, "image_without_shapes", Boolean.FALSE,
                            PdfDecoder.RENDERTEXT, "text_and_shapes", Boolean.TRUE,
                            7, "all", Boolean.FALSE,
                            PdfDecoder.RENDERTEXT + PdfDecoder.REMOVE_RENDERSHAPES, "text_without_shapes", Boolean.TRUE
                    };
                }

                int sepCount = sepValues.length;
                for (int seps = 0; seps < sepCount; seps = seps + 3) {

                    decoder.setRenderMode((Integer) sepValues[seps]);

                    //boolean makes last transparent so we can see white text
                    extractPageAsImage(fileType, outputPath, decoder, "_" + sepValues[seps + 1], (Boolean) sepValues[seps + 2]);
                }

            } else //just get the page
                extractPageAsImage(fileType, outputPath, decoder, "", false);

        } finally {
            decoder.closePdfFile();
        }

        System.out.println("time=" + (System.currentTimeMillis() - startTime) / 1000);
    }

    /**
     * convenience method to get a page as a BufferedImage quickly
     * - for bulk conversion, use the other methods
     */
    public static BufferedImage getHiresPage(int pageNo, int scaling, String pdfFile) {

        BufferedImage imageToSave = null;

        final PdfDecoder decoder = new PdfDecoder(true);

        try {
            //mappings for non-embedded fonts to use
            // FontMappings.setFontReplacements();

            decoder.openPdfFile(pdfFile);

            PdfPageData pageData = decoder.getPdfPageData();
            int width = scaling * pageData.getCropBoxWidth(pageNo);
            int height = scaling * pageData.getCropBoxHeight(pageNo);

            Map mapValues = new HashMap();

            /** USEFUL OPTIONS */
            //do not scale above this figure
            mapValues.put(JPedalSettings.EXTRACT_AT_BEST_QUALITY_MAXSCALING, 2);

            //alternatively specify a page size (aspect ratio preserved so will do best fit)
            //set a page size (JPedal will put best fit to this)
            mapValues.put(JPedalSettings.EXTRACT_AT_PAGE_SIZE, new String[]{String.valueOf(width), String.valueOf(height)});

            //which takes priority (default is false)
            mapValues.put(JPedalSettings.PAGE_SIZE_OVERRIDES_IMAGE, Boolean.TRUE);

            PdfDecoder.modifyJPedalParameters(mapValues);

            imageToSave = decoder.getPageAsHiRes(pageNo, null, false);

        } catch (PdfException e) {
            e.printStackTrace();
        } finally {
            decoder.closePdfFile();
        }

        return imageToSave;
    }

    /**
     * actual conversion of a PDF page into an image
     * @param fileType
     * @param outputPath
     * @param decoder
     * @param prefix
     * @param isTransparent
     * @throws PdfException
     * @throws IOException
     */
    private void extractPageAsImage(String fileType, String outputPath, PdfDecoder decoder, String prefix, boolean isTransparent) throws PdfException, IOException {

        //page range
        int start = 1, end = decoder.getPageCount();

        //container if the user is creating a multi-image tiff
        BufferedImage[] multiPages = new BufferedImage[1 + (end - start)];

        /**
         * set of JVM flags which allow user control on process
         */

        //////////////////TIFF OPTIONS/////////////////////////////////////////
        String multiPageFlag = System.getProperty("org.jpedal.multipage_tiff");
        boolean isSingleOutputFile = multiPageFlag != null && multiPageFlag.toLowerCase().equals("true");

        String tiffFlag = System.getProperty("org.jpedal.compress_tiff");
        boolean compressTiffs = tiffFlag != null && tiffFlag.toLowerCase().equals("true");

        //////////////////JPEG OPTIONS/////////////////////////////////////////
        //allow user to specify value
        String rawJPEGComp = System.getProperty("org.jpedal.compression_jpeg");
        if (rawJPEGComp != null) {
            try {
                JPEGcompression = Float.parseFloat(rawJPEGComp);
            } catch (Exception e) {
                e.printStackTrace();
            }
            if (JPEGcompression < 0 || JPEGcompression > 1)
                throw new RuntimeException("Invalid value for JPEG compression - must be between 0 and 1");
        }

        String jpgFlag = System.getProperty("org.jpedal.jpeg_dpi");
        ///////////////////////////////////////////////////////////////////////

        for (int pageNo = start; pageNo < end + 1; pageNo++) {

            if (debug)
                System.out.println("page : " + pageNo);

            /**
             * example1 - ask JPedal to return from decoding if file takes too long (time is in millis)
             * will reset after exit so call for each page
             */
            //decoder.setPageDecodeStatus(DecodeStatus.Timeout,new Integer(20) );

            /**
             * example2 thread which will ask JPedal to time out and return from decoding
             * will reset after exit so call for each page
             */
            /**
            Thread a=new Thread(){
                public void run() {

                    while(true){
                        //simulate 2 second delay
                        try {
                            Thread.sleep(2000);
                        } catch (InterruptedException e) {
                            e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
                        }

                        //tell JPedal to exit asap
                        decoder.setPageDecodeStatus(DecodeStatus.Timeout, Boolean.TRUE);
                    }
                }
            };

            //simulate a second thread
            a.start();

            //see code after line decoder.getPageAsHiRes(pageNo); for tracking whether JPedal timed out and returned
            /**/

            /**
             * If you are using decoder.getPageAsHiRes() after passing additional parameters into JPedal using the static method
             * PdfDecoder.modifyJPedalParameters(), then getPageAsHiRes() won't necessarily be thread safe. If you want to use
             * getPageAsHiRes() and pass in additional parameters, in a thread safe manner, please use the method
             * getPageAsHiRes(int pageIndex, Map params) or getPageAsHiRes(int pageIndex, Map params, boolean isTransparent) and
             * pass the additional parameters in directly to the getPageAsHiRes() method without calling PdfDecoder.modifyJPedalParameters()
             * first.
             *
             * Please see org/jpedal/examples/images/ConvertPagesToImages.java.html for more details on how to use HiRes image conversion
             */
            BufferedImage imageToSave = decoder.getPageAsHiRes(pageNo, null, isTransparent);

            String imageFormat = System.getProperty("org.jpedal.imageType");
            if (imageFormat != null) {
                if (isNumber(imageFormat)) {
                    int iFormat = Integer.parseInt(imageFormat);
                    if (iFormat > -1 && iFormat < 14) {
                        BufferedImage tempImage = new BufferedImage(imageToSave.getWidth(), imageToSave.getHeight(), iFormat);
                        Graphics2D g = tempImage.createGraphics();
                        g.drawImage(imageToSave, null, null);

                        imageToSave = tempImage;
                    } else {
                        System.err.println("Image Type is not valid. Value should be a digit between 0 - 13 based on the BufferedImage TYPE variables.");
                    }
                } else {
                    System.err.println("Image Type provided is not an Integer. Value should be a digit between 0 - 13 based on the BufferedImage TYPE variables.");
                }
            }

            //show status flag
            /**
            if(decoder.getPageDecodeStatus(DecodeStatus.Timeout))
                System.out.println("Timeout on decoding");
            else
                System.out.println("Done");
            /**/

            decoder.flushObjectValues(true);

            //System.out.println("w="+imageToSave.getWidth()+" h="+imageToSave.getHeight());

            //image needs to be sRGB for JPEG
            if (fileType.equals("jpg"))
                imageToSave = ColorSpaceConvertor.convertToRGB(imageToSave);

            String outputFileName;
            if (isSingleOutputFile)
                outputFileName = outputPath + "allPages" + prefix + '.' + fileType;
            else {
                /**
                 * create a name with zeros for if more than 9 pages appears in correct order
                 */
                String pageAsString = String.valueOf(pageNo);
                String maxPageSize = String.valueOf(end);
                int padding = maxPageSize.length() - pageAsString.length();
                for (int ii = 0; ii < padding; ii++)
                    pageAsString = '0' + pageAsString;

                outputFileName = outputPath + "page" + pageAsString + prefix + '.' + fileType;
            }

            //if just gray we can reduce memory usage by converting image to Grayscale

            /**
             * see what Colorspaces used and reduce image if appropriate
             * (only does Gray at present)
             *
             * Can return null value if not sure
             */
            Iterator colorspacesUsed = decoder.getPageInfo(PageInfo.COLORSPACES);

            int nextID;
            boolean isGrayOnly = colorspacesUsed != null; //assume true and disprove

            while (colorspacesUsed != null && colorspacesUsed.hasNext()) {
                nextID = (Integer) (colorspacesUsed.next());

                if (nextID != ColorSpaces.DeviceGray && nextID != ColorSpaces.CalGray)
                    isGrayOnly = false;
            }

            //draw onto GRAY image to reduce colour depth
            //(converts ARGB to gray)
            if (isGrayOnly) {
                BufferedImage image_to_save2 = new BufferedImage(imageToSave.getWidth(), imageToSave.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
                image_to_save2.getGraphics().drawImage(imageToSave, 0, 0, null);
                imageToSave = image_to_save2;
            }

            //put image in array if multi-images (we save on last page in code below)
            if (isSingleOutputFile)
                multiPages[pageNo - start] = imageToSave;

            //we save the image out here
            if (imageToSave != null) {

                /**BufferedImage does not support any dpi concept. A higher dpi can be created
                 * using JAI to convert to a higher dpi image*/

                //shrink the page to 50% with graphics2D transformation
                //- add your own parameters as needed
                //you may want to replace null with a hints object if you
                //want to fine tune quality.

                /** example 1 bilinear scaling
                AffineTransform scale = new AffineTransform();
                scale.scale(.5, .5); //50% as a decimal
                AffineTransformOp scalingOp =new AffineTransformOp(scale, null);
                image_to_save =scalingOp.filter(image_to_save, null);
                */

                if (JAIHelper.isJAIused())
                    JAIHelper.confirmJAIOnClasspath();

                if (JAIHelper.isJAIused() && fileType.startsWith("tif")) {

                    com.sun.media.jai.codec.TIFFEncodeParam params = new com.sun.media.jai.codec.TIFFEncodeParam();

                    if (compressTiffs)
                        params.setCompression(com.sun.media.jai.codec.TIFFEncodeParam.COMPRESSION_PACKBITS);

                    if (!isSingleOutputFile) {
                        BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(outputFileName));

                        javax.media.jai.JAI.create("encode", imageToSave, os, "TIFF", params);
                    } else if (isSingleOutputFile && pageNo == end) {

                        OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFileName));
                        com.sun.media.jai.codec.ImageEncoder encoder = com.sun.media.jai.codec.ImageCodec.createImageEncoder("TIFF", out, params);
                        List vector = new ArrayList();
                        vector.addAll(Arrays.asList(multiPages).subList(1, multiPages.length));

                        params.setExtraImages(vector.iterator());

                        encoder.encode(multiPages[0]);
                        out.close();
                    }
                } else if (isSingleOutputFile) {
                    //non-JAI
                } else if ((jpgFlag != null || rawJPEGComp != null) && fileType.startsWith("jp") && JAIHelper.isJAIused()) {

                    saveAsJPEG(jpgFlag, imageToSave, JPEGcompression, new BufferedOutputStream(new FileOutputStream(outputFileName)));
                } else {

                    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(new File(outputFileName)));
                    ImageIO.write(imageToSave, fileType, bos);
                    bos.flush();
                    bos.close();
                }
                //if you just want to save the image, use something like
                //javax.imageio.ImageIO.write((java.awt.image.RenderedImage)image_to_save,"png",new java.io.FileOutputStream(output_dir + page + image_name+".png"));
            }

            imageToSave.flush();

            if (debug) {
                System.out.println("Created : " + outputFileName);
            }
        }
    }

    /** test to see if string or number */
    private static boolean isNumber(String value) {

        //assume true and see if proved wrong
        boolean isNumber = true;

        int charCount = value.length();
        for (int i = 0; i < charCount; i++) {
            char c = value.charAt(i);
            if ((c < '0') | (c > '9')) {
                isNumber = false;
                i = charCount;
            }
        }

        return isNumber;
    }

    private static void saveAsJPEG(String jpgFlag, BufferedImage image_to_save, float JPEGcompression, BufferedOutputStream fos) throws IOException {

        //useful documentation at http://docs.oracle.com/javase/7/docs/api/javax/imageio/metadata/doc-files/jpeg_metadata.html
        //useful example program at http://johnbokma.com/java/obtaining-image-metadata.html to output JPEG data

        //old jpeg class
        //com.sun.image.codec.jpeg.JPEGImageEncoder jpegEncoder = com.sun.image.codec.jpeg.JPEGCodec.createJPEGEncoder(fos);
        //com.sun.image.codec.jpeg.JPEGEncodeParam jpegEncodeParam = jpegEncoder.getDefaultJPEGEncodeParam(image_to_save);

        // Image writer
        JPEGImageWriter imageWriter = (JPEGImageWriter) ImageIO.getImageWritersBySuffix("jpeg").next();
        ImageOutputStream ios = ImageIO.createImageOutputStream(fos);
        imageWriter.setOutput(ios);

        //and metadata
        IIOMetadata imageMetaData = imageWriter.getDefaultImageMetadata(new ImageTypeSpecifier(image_to_save), null);

        if (jpgFlag != null) {

            int dpi = 96;

            try {
                dpi = Integer.parseInt(jpgFlag);
            } catch (Exception e) {
                e.printStackTrace();
            }

            //old metadata
            //jpegEncodeParam.setDensityUnit(com.sun.image.codec.jpeg.JPEGEncodeParam.DENSITY_UNIT_DOTS_INCH);
            //jpegEncodeParam.setXDensity(dpi);
            //jpegEncodeParam.setYDensity(dpi);

            //new metadata
            Element tree = (Element) imageMetaData.getAsTree("javax_imageio_jpeg_image_1.0");
            Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
            jfif.setAttribute("Xdensity", Integer.toString(dpi));
            jfif.setAttribute("Ydensity", Integer.toString(dpi));
        }

        if (JPEGcompression >= 0 && JPEGcompression <= 1f) {

            //old compression
            //jpegEncodeParam.setQuality(JPEGcompression,false);

            // new Compression
            JPEGImageWriteParam jpegParams = (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
            jpegParams.setCompressionMode(JPEGImageWriteParam.MODE_EXPLICIT);
            jpegParams.setCompressionQuality(JPEGcompression);
        }

        //old write and clean
        //jpegEncoder.encode(image_to_save, jpegEncodeParam);

        //new Write and clean up
        imageWriter.write(imageMetaData, new IIOImage(image_to_save, null, null), null);
        ios.close();
        imageWriter.dispose();
    }
}
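The JPEG-saving part of the listing relies on `com.sun.imageio.plugins.jpeg.JPEGImageWriter`, which is a JDK-internal class and may not be accessible on newer Java versions. Below is a rough sketch of the same idea using only the public `javax.imageio` API. The class name `JpegSaver` and its method names are my own (hypothetical), and this is not JPEDAL code. Note that `ImageWriter.write` takes the stream metadata as its first argument; per-image metadata travels inside the `IIOImage`, and the compression settings go in the `ImageWriteParam`, so the sketch passes both explicitly.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;

/** Hypothetical helper: writes a JPEG with explicit quality and DPI metadata. */
final class JpegSaver {
    private static final String JPEG_NATIVE_FORMAT = "javax_imageio_jpeg_image_1.0";

    private JpegSaver() {}

    /** The image should have no alpha channel (for example TYPE_INT_RGB). */
    static void write(BufferedImage image, float quality, int dpi, OutputStream out) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);

            // Compression quality between 0 (smallest) and 1 (best).
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);

            // Edit a copy of the metadata tree, then push it back with setFromTree.
            IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), param);
            Element tree = (Element) metadata.getAsTree(JPEG_NATIVE_FORMAT);
            Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
            if (jfif != null) {
                jfif.setAttribute("resUnits", "1"); // 1 = density given in dots per inch
                jfif.setAttribute("Xdensity", Integer.toString(dpi));
                jfif.setAttribute("Ydensity", Integer.toString(dpi));
                metadata.setFromTree(JPEG_NATIVE_FORMAT, tree);
            }

            // No stream metadata; image metadata and write params passed explicitly.
            writer.write(null, new IIOImage(image, null, metadata), param);
        } finally {
            writer.dispose();
        }
    }

    /** Convenience wrapper that returns the encoded bytes. */
    static byte[] toBytes(BufferedImage image, float quality, int dpi) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            write(image, quality, dpi, bos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

With something like this, a page image returned by the converter could be saved with, say, `JpegSaver.write(image, 0.9f, 300, new FileOutputStream("page1.jpg"))`. The DPI value only changes the metadata that viewers and printers read; the pixel dimensions stay whatever the renderer produced.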
2) Test class:
public class Test {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        try {
            ConvertPagesToHiResImages test = new ConvertPagesToHiResImages("jpg", "/Users/mac/Desktop/pdftest/pdftest.pdf");
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
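I won't paste the rest of the code. It is available online if you need it, so just search for it. The biggest problem I ran into was downloading the jar files, which were very hard to find.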