
How to Convert PDF Files to HTML Files

2013-09-26 09:02
pdftohtml is a program that translates PDF files into HTML or XML,
combined with PNG images, and it supports encrypted PDF files. In Ubuntu
Gutsy, pdftohtml is bundled with the poppler-utils package, so that
package needs to be installed.
Install poppler-utils in Ubuntu

sudo aptitude install poppler-utils

This completes the installation.
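To confirm that the installation worked, you can check whether pdftohtml is now on your PATH and print its version with the -v option listed below. This is a minimal POSIX sh sketch; the helper name pdftohtml_status is my own and not part of poppler-utils.

```shell
#!/bin/sh
# Report whether the pdftohtml binary (shipped in poppler-utils) is on the PATH.
pdftohtml_status() {
    if command -v pdftohtml >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

status=$(pdftohtml_status)
echo "pdftohtml: $status"

if [ "$status" = "installed" ]; then
    # -v prints copyright and version info; its exit status is ignored
    # here because it may vary between versions.
    pdftohtml -v || true
fi
```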

Using pdftohtml

pdftohtml Syntax

pdftohtml [options] [pdf file] [html file]

Available options

A summary of options is included below.

-h, -help - Show summary of options.

-f - first page to print

-l - last page to print

-q - don’t print any messages or errors

-v - print copyright and version info

-p - exchange .pdf links with .html

-c - generate complex output

-i - ignore images

-noframes - generate no frames. Not supported in complex output mode.

-stdout - use standard output

-zoom - zoom the pdf document (default 1.5)

-xml - output for XML post-processing

-enc - output text encoding name

-opw - owner password (for encrypted files)

-upw - user password (for encrypted files)

-hidden - force hidden text extraction

-dev - output device name for Ghostscript (png16m, jpeg, etc.)

-nomerge - do not merge paragraphs

-nodrm - override document DRM settings
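As a rough illustration of combining these options, the sketch below converts only a page range (-f and -l) of a possibly encrypted PDF (-upw) into one frameless HTML file (-noframes), skipping images (-i) and suppressing messages (-q). The function name convert_pages and the PDF_USER_PASSWORD environment variable are hypothetical names chosen for this example, and the page-number checks are my own additions rather than something pdftohtml requires.

```shell
#!/bin/sh
# Sketch: convert a page range of a (possibly encrypted) PDF into a single
# frameless HTML file, with images skipped and messages suppressed.
# convert_pages and PDF_USER_PASSWORD are hypothetical names for illustration.
convert_pages() {
    src=$1 first=$2 last=$3 dst=$4
    # Page numbers must be positive integers with first <= last.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*) echo "bad page number: '$n'" >&2; return 2 ;;
        esac
    done
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "bad page range: $first-$last" >&2
        return 2
    fi
    [ -r "$src" ] || { echo "cannot read: $src" >&2; return 1; }

    if [ -n "${PDF_USER_PASSWORD:-}" ]; then
        # -upw supplies the user password for encrypted files.
        pdftohtml -q -i -noframes -f "$first" -l "$last" \
            -upw "$PDF_USER_PASSWORD" "$src" "$dst"
    else
        pdftohtml -q -i -noframes -f "$first" -l "$last" "$src" "$dst"
    fi
}
```

For example, with PDF_USER_PASSWORD set, convert_pages report.pdf 2 5 report-p2-5.html would convert pages 2 to 5 of a hypothetical report.pdf. Keep in mind that a password passed with -upw appears on the command line, where other users on the same machine may be able to see it with ps.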

pdftohtml Examples

pdftohtml test.pdf test.html

This command gives you a simple HTML file suitable for reading or
copying the textual content of the PDF file. You can grab the text
from your browser and paste it into other applications. It doesn’t
produce any PNG files, so you won’t be able to see any embedded
graphics. It’s a great utility if you just want to extract the text
from a PDF file.
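If you have a whole folder of PDFs to read this way, a small shell loop can run the same plain conversion on each one. This is a sketch assuming a POSIX shell; convert_dir is a hypothetical helper name, and it only picks up files ending in lowercase .pdf.

```shell
#!/bin/sh
# Sketch: convert every PDF in a directory (default: the current one) to a
# plain HTML file beside the original. Prints how many PDFs were attempted.
convert_dir() {
    dir=${1:-.}
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    count=0
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing: no PDFs here
        # Same plain mode as "pdftohtml test.pdf test.html", but quiet (-q).
        pdftohtml -q "$pdf" "${pdf%.pdf}.html" || echo "failed: $pdf" >&2
        count=$((count + 1))
    done
    echo "$count"
}
```

Calling convert_dir ~/Documents would write each HTML file next to its PDF and print how many PDFs it tried to convert.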

If you want to see graphics, you’ll need to use the -c (as in “complex”) option:

pdftohtml -c test.pdf test.html

This option produces an individual HTML file for each page of the
PDF file, with references to the PNG images mixed in. The graphics in
the original PDF file show up in a browser, and the text can still be
cut and pasted. The total size of the HTML and PNG files generated with
the -c option tends to be roughly equivalent to that of the original PDF.
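Because -c writes many files per document, it can help to give each PDF its own output directory. The sketch below assumes a POSIX shell and uses a hypothetical helper named convert_complex; it runs pdftohtml from inside the target directory so the generated HTML and PNG files land there together.

```shell
#!/bin/sh
# Sketch: run complex-mode (-c) conversion inside a dedicated output directory,
# since -c writes one HTML file per page plus PNG images.
# convert_complex is a hypothetical helper name.
convert_complex() {
    pdf=$1
    outdir=$2
    [ -r "$pdf" ] || { echo "cannot read: $pdf" >&2; return 1; }
    mkdir -p "$outdir" || return 1
    name=$(basename "$pdf" .pdf)
    # Make the input path absolute before changing into the output directory.
    case $pdf in
        /*) abs=$pdf ;;
        *)  abs=$(pwd)/$pdf ;;
    esac
    # The subshell keeps the caller's working directory unchanged.
    (cd "$outdir" && pdftohtml -c -q "$abs" "$name.html")
}
```

For example, convert_complex test.pdf test_html would collect the output for test.pdf under test_html/.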
