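My experience choosing a compression algorithm (by quqi99)
2007-08-03 14:11
Author: Zhang Hua  Published: 2007-08-03 ( http://blog.csdn.net/quqi99 )
Copyright notice: you may repost this freely, but when reposting please indicate the original source of the article, the author, and this copyright notice with a hyperlink.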
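Recently the files downloaded by our spider needed to be compressed, so I took the opportunity to study a range of compression algorithms. There were three criteria: first, compression ratio; second, speed; third, how easy it is to append, delete, and query entries (that is, to find out what an archive contains without decompressing it, and to extract metadata conveniently).
At first I mainly compared the compression performance of zlib, bzip2, gzip, rar, and zip.
For the rar format, compression and decompression are done by calling an external command (see Appendix 1 for the code). That command runs outside the Java virtual machine, and if it goes wrong it may take the whole virtual machine down with it, so I ruled this approach out.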
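According to what I read online, bzip2 gives the best compression for text, and the org.apache.tools.bzip2 package in ant.jar provides an API for it. I never got it to work smoothly, though, so I did not spend more time on it. The code is in Appendix 2 (not tested successfully).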
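So I started studying zlib. Its Java version is called jzlib; the code that compresses and decompresses with jzlib is in Appendix 3. It looked good and I was ready to use it. Compressing a single file works fine, but compressing a directory the way the java.util.zip package does is really inconvenient. Only then did it dawn on me: these compression algorithms generally care only about the algorithm itself. Conveniences such as compressing entry by entry are the business of the application layer around them.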
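The point that the algorithm only sees a stream of bytes can be sketched with the JDK's own zlib binding. This is a minimal illustration, not the jzlib code from Appendix 3: it uses java.util.zip.Deflater and java.util.zip.Inflater so that it runs without extra jars, and the class name RawDeflateSketch is made up for the example. Note that nothing here knows about file names or directories.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example class: compresses and restores one byte array with zlib.
final class RawDeflateSketch {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish(); // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib memory
        return out.toByteArray();
    }

    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // avoid spinning forever on truncated data
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "spider page spider page spider page".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println(original.length + " -> " + packed.length
                + " bytes, round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Everything that makes an archive usable (entry names, directory layout, per-entry checksums) has to be layered on top of a byte-level API like this one.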
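So I began thinking about how to borrow the ideas in java.util.zip and wrap the zlib algorithm so that it could compress by directory. In the end I discovered that the compression algorithm underneath java.util.zip is zlib itself. Sun merely added, on top of the core algorithm, things such as checksums (CRC32, Adler32), per-entry archive structure (ZipEntry), and convenient input and output streams (ZipInputStream, ZipOutputStream).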
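As a rough illustration of what that extra layer buys, the sketch below packs several named entries into one in-memory zip and reads them back. It uses only java.util.zip; the class name ZipEntrySketch and the sample entry names are invented for the example, and real spider output would of course be streamed from and to files rather than held in maps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical example class: entry-by-entry compression on top of zlib.
final class ZipEntrySketch {

    /** Packs "path -> text content" pairs into a zip held in memory. */
    static byte[] pack(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey())); // "dir/file" names give the directory layout
                zos.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry(); // the entry's size and CRC-32 are recorded here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every entry back, in archive order. */
    static Map<String, String> unpack(byte[] zip) {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            byte[] buf = new byte[4096];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zis.read(buf)) != -1) { // -1 marks the end of the current entry
                    content.write(buf, 0, n);
                }
                files.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("site/index.html", "<html>hello</html>");
        files.put("site/img/readme.txt", "no images yet");
        System.out.println(unpack(pack(files)).keySet());
    }
}
```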
Since java.util.zip already uses zlib, we no longer need to work out how to compress by directory. Even so, things did not go entirely smoothly.
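First, compression and decompression code written directly against the java.util.zip API does not support Chinese file names. Java handles text as Unicode, and these classes always treat entry names as UTF-8, so when ZipInputStream and ZipOutputStream are used for compressing and decompressing and they meet a Chinese file name or path written in another encoding, they cannot handle it. Looking closely at ZipInputStream, I found that the problem lies in this statement in the java.util.zip.ZipInputStream class:
    ZipEntry e = createZipEntry(getUTF8String(b, 0, len));
It should be changed to:
    ZipEntry e = null;
    try {
        if (this.encoding.toUpperCase().equals("UTF-8"))
            e = createZipEntry(getUTF8String(b, 0, len));
        else
            e = createZipEntry(new String(b, 0, len, this.encoding));
    } catch (Exception byteE) {
        e = createZipEntry(getUTF8String(b, 0, len));
    }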
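The effect of that change can be tried in isolation. The sketch below is only an illustration: the class EntryNameDecodingSketch and its decodeName method are hypothetical stand-ins for the patched statement, and it assumes the GBK charset is available in the running JDK. It decodes the same raw name bytes once with the right encoding and once as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: why the entry-name encoding has to be configurable.
final class EntryNameDecodingSketch {

    /** Decodes raw entry-name bytes, falling back to UTF-8 for unknown encodings. */
    static String decodeName(byte[] raw, int len, String encoding) {
        try {
            return new String(raw, 0, len, encoding);
        } catch (UnsupportedEncodingException e) {
            return new String(raw, 0, len, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        String name = "\u8bf4\u660e.txt"; // the Chinese file name used in Appendix 1
        // Bytes as a GBK-based archiver would store them in the zip header (assumption).
        byte[] raw = name.getBytes(Charset.forName("GBK"));

        String asGbk = decodeName(raw, raw.length, "GBK");
        String asUtf8 = decodeName(raw, raw.length, "UTF-8");
        System.out.println("decoded as GBK matches:   " + name.equals(asGbk));
        System.out.println("decoded as UTF-8 matches: " + name.equals(asUtf8));
    }
}
```

Unlike the strict getUTF8String in the JDK source, new String(...) does not throw on malformed input; it substitutes replacement characters, which is how garbled names end up on disk.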
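Fortunately, a quick search online showed that we do not have to make this change ourselves, because the org.apache.tools.zip package in ant has already done it for us. The compression and decompression code written with this package is in Appendix 4.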
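Then another problem turned up. There are two ways to decompress a file: one uses ZipFile, the other uses ZipInputStream. As far as I could tell, ZipFile reads the zip file into memory in one go, which does not work for large zips; for those you have to use ZipInputStream. Unfortunately, the org.apache.tools.zip package happens not to provide a modified ZipInputStream class; it only provides the rewritten ZipFile. And if you fall back to java.util.zip.ZipInputStream, files with Chinese names again cannot be handled.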
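For comparison, here is a small sketch of the random-access style: java.util.zip.ZipFile lists entry names and sizes from the archive's index without decompressing any entry data, which is the kind of metadata query mentioned at the start of this article. The class ZipListingSketch, the helper writeSample, and the sample entries are all made up for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical example class: querying an archive without extracting it.
final class ZipListingSketch {

    /** Returns "name size bytes" for every entry, read from the zip's index only. */
    static List<String> listEntries(Path zip) {
        List<String> lines = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                lines.add(e.getName() + " " + e.getSize() + " bytes");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Writes a two-entry sample archive to a temporary file. */
    static Path writeSample() {
        try {
            Path zip = Files.createTempFile("listing-sketch", ".zip");
            zip.toFile().deleteOnExit();
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
                zos.putNextEntry(new ZipEntry("a.txt"));
                zos.write("hello".getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
                zos.putNextEntry(new ZipEntry("docs/b.txt"));
                zos.write("hello world".getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : listEntries(writeSample())) {
            System.out.println(line);
        }
    }
}
```

The streaming style, by contrast, walks the archive front to back with getNextEntry() and needs no random access, which is why it suits very large archives or data arriving over a socket.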
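At this point I found an article online, "Making ZipOutputStream and ZipInputStream support Chinese" (it can be found with Google). Its approach is to modify the JDK source code directly. I felt that patching the JDK's JAR would make deployment awkward later, so I started looking for another solution.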
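To avoid touching java.util.zip.ZipInputStream, I simply rewrote the class myself. First, by copy and paste, create a class jcss.search.base.zip.CZipInputStream with exactly the same content. Then, in that class, change ZipEntry e = createZipEntry(getUTF8String(b, 0, len)) to the code shown above. This class is in Appendix 5.
In addition, copy java.util.zip.ZipConstants into a class jcss.search.base.zip.ZipConstants with exactly the same content.
Finally, implement a jcss.search.base.zip.ZipEntry class; the code is in Appendix 6.
With that, everything works.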
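A note for readers on newer JDKs: since Java 7, java.util.zip.ZipInputStream and ZipOutputStream have constructors that take a Charset for entry names, so a copied class may no longer be needed there. The sketch below is not part of the original solution; the class name ZipCharsetSketch is invented, and it assumes the GBK charset is present in the running JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical example class: entry names in a non-UTF-8 encoding, JDK 7+ only.
final class ZipCharsetSketch {

    /** Writes a single entry whose name is stored in the given charset. */
    static byte[] pack(String entryName, byte[] content, Charset nameCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Streams through the archive and collects entry names decoded with the given charset. */
    static List<String> entryNames(byte[] zip, Charset nameCharset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] zip = pack("\u8bf4\u660e.txt", new byte[] {1, 2, 3}, gbk);
        System.out.println(entryNames(zip, gbk));
    }
}
```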
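If you want to push the compression ratio further, you can use 7zip; there is also a dedicated 7zip SDK (which implements the LZMA compression algorithm). For easier access, some helpful people have added two stream classes on top of it (net.contrapunctus.lzma.LzmaInputStream and net.contrapunctus.lzma.LzmaOutputStream), but there are no entry classes that wrap compression by directory.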
Appendix 1:
package jcss.search.base;

import java.io.BufferedReader;
import java.io.InputStreamReader;

/**
 * @author Zhang Hua
 * @time 2007-8-1
 * @description Compresses and extracts rar archives by calling the WinRAR command-line tools.
 */
public class RarUtil {
    /**
     * Extract.
     *
     * @param compress
     *            rar archive
     * @param decompression
     *            destination path
     */
    public void unZip(String compress, String decompression) throws Exception {
        // The command is passed as an array because the install path contains a
        // space, which Runtime.exec(String) would split into separate arguments.
        run(new String[] { "C:/Program Files/WinRAR/UNRAR.EXE", "x", "-o+", "-p-",
                compress, decompression });
    }

    /**
     * Compress.
     *
     * @param outputRar
     *            rar archive to create
     * @param compression
     *            file or directory to compress
     * @throws Exception
     */
    public void zip(String outputRar, String compression) throws Exception {
        // e.g. rar.exe a -t -o+ -p- E:/2.rar E:/
        // ("a" adds files to an archive; the original used "x", which extracts)
        run(new String[] { "C:/Program Files/WinRAR/rar.exe", "a", "-t", "-o+", "-p-",
                outputRar, compression });
    }

    /** Runs the command and prints its console output, decoded as GBK. */
    private void run(String[] command) throws Exception {
        Process p = Runtime.getRuntime().exec(command);
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), "GBK"));
        StringBuffer sb = new StringBuffer();
        int value;
        while ((value = reader.read()) != -1) {
            sb.append((char) value);
        }
        reader.close();
        System.out.println(sb.toString());
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        RarUtil test = new RarUtil();
        String compress = "f:/增加转码过滤器.rar"; // rar archive
        String decompression = "f:/test/"; // destination path
        try {
            test.zip("f:/test.rar", "说明.txt");
            //test.unZip(compress, decompression);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Appendix 2:
package jcss.search.base;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.tools.bzip2.CBZip2InputStream;
import org.apache.tools.bzip2.CBZip2OutputStream;

/**
 * @author Zhang Hua
 * @time 2007-7-26
 * @description BZip2 compression and decompression
 */
public class BZip2Util {
    public static void Bzip2Compress(String in, String to) {
        try {
            File source = new File(in);
            File destination = new File(to);
            FileOutputStream fos = new FileOutputStream(destination);
            // Ant's CBZip2OutputStream writes the stream without the two "BZ"
            // file header characters, so the caller has to write them first.
            // Their absence may be why the original version did not work.
            fos.write('B');
            fos.write('Z');
            CBZip2OutputStream output = new CBZip2OutputStream(fos);
            final FileInputStream input = new FileInputStream(source);
            copy(input, output);
            input.close();
            output.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void Bzip2Uncompress(String in, String to) {
        try {
            File source = new File(in);
            File destination = new File(to);
            FileOutputStream output = new FileOutputStream(destination);
            FileInputStream fis = new FileInputStream(source);
            // Likewise, CBZip2InputStream expects the "BZ" header to have been
            // consumed already.
            if (fis.read() != 'B' || fis.read() != 'Z') {
                throw new IOException("not a bzip2 file: " + in);
            }
            CBZip2InputStream input = new CBZip2InputStream(fis);
            copy(input, output);
            input.close();
            output.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void copy(final InputStream input, final OutputStream output)
            throws IOException {
        final byte[] buffer = new byte[8024];
        int n = 0;
        while (-1 != (n = input.read(buffer))) {
            output.write(buffer, 0, n);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        String in = "f://~HlIndex.htm";
        String to = "f://a.bz2";
        String out2 = "b.htm";
        //BZip2Util.Bzip2Compress(in, to);
        //BZip2Util.Bzip2Uncompress(to, out2);
    }
}
Appendix 3:
package example;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import com.jcraft.jzlib.*;

/**
 * Drawback: cannot compress by directory.
 *
 * @author Zhang Hua
 * @time 2007-7-30
 * @description reference http://tianxiagod.spaces.live.com/
 *              http://blog.csdn.net/kong555/archive/2006/03/28/641855.aspx
 */
public class TestJZlib {
    // Length of the uncompressed file. It is needed for both compression and
    // decompression, so it matters: make sure compressfile() and
    // uncompressfile() are given the same value.
    static int resLen = 0;

    /**
     * Compress.
     *
     * @param data
     * @param type
     *            compression level as an integer: -1 is the default level, 9 the
     *            best compression, 0 no compression, 1 the fastest compression
     * @param len
     *            length of the uncompressed data
     * @return the compressed bytes
     */
    public static byte[] compressfile(byte[] data, int type, int len) {
        int err;
        // deflate can make incompressible input slightly larger, so leave a margin
        int comprLen = len + len / 100 + 64;
        byte[] compr = new byte[comprLen];
        ZStream c_stream = new ZStream();
        err = c_stream.deflateInit(type);
        CHECK_ERR(c_stream, err, "deflateInit");
        c_stream.next_in = data;
        c_stream.next_in_index = 0;
        c_stream.next_out = compr;
        c_stream.next_out_index = 0;
        while (c_stream.total_in != data.length
                && c_stream.total_out < comprLen) {
            c_stream.avail_in = c_stream.avail_out = 1; // feed one byte at a time
            err = c_stream.deflate(JZlib.Z_NO_FLUSH);
            CHECK_ERR(c_stream, err, "deflate");
        }
        System.out.println("Before compression: " + c_stream.total_in + " bytes");
        while (true) {
            c_stream.avail_out = 1;
            err = c_stream.deflate(JZlib.Z_FINISH);
            if (err == JZlib.Z_STREAM_END) {
                break;
            }
            CHECK_ERR(c_stream, err, "deflate");
        }
        System.out.println("After compression: " + c_stream.total_out + " bytes");
        err = c_stream.deflateEnd();
        CHECK_ERR(c_stream, err, "deflateEnd");
        byte[] zipfile = new byte[(int) c_stream.total_out];
        System.arraycopy(compr, 0, zipfile, 0, zipfile.length);
        return zipfile;
    }

    /**
     * Decompress.
     *
     * @param data
     *            the compressed bytes
     * @param len
     *            length of the original, uncompressed data
     */
    public static byte[] uncompressfile(byte[] data, int len) {
        int err;
        int uncomprLen = len;
        byte[] uncompr = new byte[uncomprLen];
        ZStream d_stream = new ZStream();
        err = d_stream.inflateInit();
        CHECK_ERR(d_stream, err, "inflateInit");
        d_stream.next_in = data;
        d_stream.next_in_index = 0;
        d_stream.next_out = uncompr;
        d_stream.next_out_index = 0;
        while (d_stream.total_out < uncomprLen
                && d_stream.total_in < data.length) {
            d_stream.avail_in = d_stream.avail_out = 1;
            err = d_stream.inflate(JZlib.Z_NO_FLUSH);
            if (err == JZlib.Z_STREAM_END) {
                break;
            }
            CHECK_ERR(d_stream, err, "inflate");
        }
        System.out.println("Before decompression: " + d_stream.total_in + " bytes");
        System.out.println("After decompression: " + d_stream.total_out + " bytes");
        err = d_stream.inflateEnd();
        CHECK_ERR(d_stream, err, "inflateEnd");
        byte[] unzipfile = new byte[(int) d_stream.total_out];
        System.arraycopy(uncompr, 0, unzipfile, 0, unzipfile.length);
        return unzipfile;
    }

    static void CHECK_ERR(ZStream z, int err, String msg) {
        if (err != JZlib.Z_OK) {
            if (z.msg != null) {
                System.out.print(z.msg + " ");
            }
            System.out.println(msg + " error: " + err);
            System.exit(1);
        }
    }

    /** Reads the whole file into memory (fine for a test, not for huge files). */
    static byte[] readAll(File file) throws Exception {
        byte[] buff = new byte[(int) file.length()];
        FileInputStream in = new FileInputStream(file);
        int off = 0;
        int n;
        while (off < buff.length && (n = in.read(buff, off, buff.length - off)) != -1) {
            off += n;
        }
        in.close();
        return buff;
    }

    static void zip(File input, File output, int compressFactor) {
        if (!input.exists())
            return;
        if (!output.getParentFile().exists())
            output.getParentFile().mkdir();
        try {
            byte[] buff = readAll(input);
            resLen = buff.length;
            byte[] suBuf = compressfile(buff, compressFactor, resLen);
            FileOutputStream out = new FileOutputStream(output);
            out.write(suBuf, 0, suBuf.length); // write the compressed file
            out.close();
            System.out.println("Compression finished: " + input.getAbsolutePath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void unZip(File input, File output) {
        if (!input.exists())
            return;
        if (!output.getParentFile().exists())
            output.getParentFile().mkdir();
        try {
            byte[] buff = readAll(input);
            // resLen must still hold the original length set by zip()
            byte[] suBuff = uncompressfile(buff, resLen);
            FileOutputStream out = new FileOutputStream(output);
            out.write(suBuff, 0, suBuff.length); // write the decompressed file
            out.close();
            System.out.println("Decompression finished: " + input.getAbsolutePath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        // compress
        File input = new File("f://搜索引擎原理系统与设计.pdf");
        File output = new File("f://test.bz2");
        TestJZlib.zip(input, output, 9);
        // decompress
        File output2 = new File("f://test.jpg");
        TestJZlib.unZip(output, output2);
    }
}
Appendix 4:
package jcss.search.base;

/*
 * Uses org.apache.tools.zip for compression. java.util.zip can also be used,
 * but with Chinese names the file names come out garbled on extraction. The
 * reason is that the encoding used by the archiving tool differs from the
 * charset of java.util.zip.ZipInputStream, which is fixed to UTF-8.
 * The commented-out parts are decompression code.
 */
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Date;

import jcss.search.base.zip.CZipInputStream;
import org.apache.tools.zip.ZipOutputStream;

/*
 * @author: Zhang Hua
 * @date: 2006-5-14
 * @description:
 */
public class ZipUtil {
    int count = 0;

    static final int BUFFER = 2048;

    public void zip(String zipFileName, String inputFile) throws Exception {
        zip(zipFileName, new File(inputFile));
    }

    public void zip(String zipFileName, File inputFile) throws Exception {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFileName));
        System.out.println("zip start");
        // a single file is stored under its own name; a directory is the archive root
        zip(out, inputFile, inputFile.isDirectory() ? "" : inputFile.getName());
        System.out.println("zip done");
        out.close();
    }

    public void zip(ZipOutputStream out, File f, String base) throws Exception {
        System.out.println("Zipping " + f.getName());
        Date beginDate = new Date();
        if (f.isDirectory()) {
            File[] fl = f.listFiles();
            if (base.length() > 0) {
                // the root directory itself gets no entry
                out.putNextEntry(new org.apache.tools.zip.ZipEntry(base + "/"));
            }
            base = base.length() == 0 ? "" : base + "/";
            for (int i = 0; i < fl.length; i++) {
                zip(out, fl[i], base + fl[i].getName());
                System.out.println(fl[i].getName());
            }
        } else {
            out.putNextEntry(new org.apache.tools.zip.ZipEntry(base));
            System.out.println(base);
            FileInputStream in = new FileInputStream(f);
            byte[] data = new byte[BUFFER];
            int n;
            while ((n = in.read(data)) != -1)
                out.write(data, 0, n);
            in.close();
        }
        Date endDate = new Date();
        long temp = endDate.getTime() - beginDate.getTime();
        System.out.println("Time taken (ms): " + temp);
    }

    private void createDirectory(String directory, String subDirectory) {
        File fl = new File(directory);
        try {
            if (subDirectory.length() == 0) {
                if (!fl.exists())
                    fl.mkdir();
            } else {
                String[] dir = subDirectory.replace('\\', '/').split("/");
                for (int i = 0; i < dir.length; i++) {
                    File subFile = new File(directory + File.separator + dir[i]);
                    if (!subFile.exists())
                        subFile.mkdir();
                    directory += File.separator + dir[i];
                }
            }
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }

    /**
     * Decompress a small ZIP with ZipFile.
     *
     * ZipInputStream reads the entries of a ZIP file in sequence (simply put, it
     * reads out which files the ZIP contains), whereas ZipFile uses built-in
     * random file access to read the contents, so the entries need not be read
     * in order.
     * Another basic difference between ZipInputStream and ZipFile is caching.
     * When a file is read with ZipInputStream and FileInputStream, ZIP entries
     * are not cached. If the file is opened with ZipFile(fileName), however, a
     * built-in cache is used, so if ZipFile(fileName) is called repeatedly the
     * file is opened only once and the cached value is used the second time.
     * On UNIX this makes no difference, because all ZIP files opened with
     * ZipFile are memory-mapped, so ZipFile performs better than ZipInputStream.
     * However, if the contents of the same ZIP file change or are reloaded
     * often while the program runs, ZipInputStream is the better choice.
     *
     * @param zipFileName
     * @param outputDirectory
     * @throws Exception
     */
    public void unSmallZip(String zipFileName, String outputDirectory)
            throws Exception {
        try {
            Date beginDate = new Date();
            org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
            java.util.Enumeration e = zipFile.getEntries();
            org.apache.tools.zip.ZipEntry zipEntry = null;
            createDirectory(outputDirectory, "");
            while (e.hasMoreElements()) {
                zipEntry = (org.apache.tools.zip.ZipEntry) e.nextElement();
                if (zipEntry.isDirectory()) {
                    String name = zipEntry.getName();
                    name = name.substring(0, name.length() - 1);
                    File f = new File(outputDirectory + File.separator + name);
                    f.mkdir();
                    System.out.println("Created directory: " + outputDirectory
                            + File.separator + name);
                } else {
                    String fileName = zipEntry.getName().replace('\\', '/');
                    count++;
                    System.out.println("Extracting file " + count + ": "
                            + zipEntry.getName());
                    if (fileName.indexOf("/") != -1) {
                        createDirectory(outputDirectory, fileName.substring(0,
                                fileName.lastIndexOf("/")));
                    }
                    File f = new File(outputDirectory + File.separator
                            + zipEntry.getName());
                    f.createNewFile();
                    InputStream in = zipFile.getInputStream(zipEntry);
                    FileOutputStream out = new FileOutputStream(f);
                    byte[] by = new byte[1024];
                    int c;
                    while ((c = in.read(by)) != -1) {
                        out.write(by, 0, c);
                    }
                    out.close();
                    in.close();
                }
            }
            zipFile.close();
            // The archive cannot be deleted here because it is still in use;
            // delete it at the place where it was uploaded.
            // After extraction, delete the archive:
            // File zipFileToDel = new File(zipFileName);
            // zipFileToDel.delete();
            // System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());
            // Remove the extra directory level created by extraction:
            // delALayerDir(zipFileName, outputDirectory);
            Date endDate = new Date();
            long temp = endDate.getTime() - beginDate.getTime();
            System.out.println("Extraction time (ms): " + temp);
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }

    /**
     * Decompress a large ZIP with ZipInputStream (using the modified
     * ZipInputStream class that supports Chinese file names).
     *
     * See unSmallZip for the differences between ZipInputStream and ZipFile.
     *
     * @param zipFileName
     * @param outputDirectory
     * @throws Exception
     */
    public void unBigZip(String zipFileName, String outputDirectory)
            throws Exception {
        try {
            Date beginDate = new Date();
            FileInputStream fis = new FileInputStream(zipFileName);
            BufferedOutputStream dest = null;
            //CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis));
            CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis), "gb2312");
            jcss.search.base.zip.ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    String name = entry.getName();
                    name = name.substring(0, name.length() - 1);
                    File f = new File(outputDirectory + File.separator + name);
                    f.mkdir();
                    System.out.println("Created directory: " + outputDirectory
                            + File.separator + name);
                } else {
                    String fileName = entry.getName().replace('\\', '/');
                    count++;
                    System.out.println("Extracting file " + count + ": " + entry.getName());
                    if (fileName.indexOf("/") != -1) {
                        createDirectory(outputDirectory,
                                fileName.substring(0, fileName.lastIndexOf("/")));
                    }
                    File f = new File(outputDirectory + File.separator + entry.getName());
                    f.createNewFile();
                    int cnt;
                    byte data[] = new byte[BUFFER];
                    FileOutputStream fos = new FileOutputStream(f);
                    dest = new BufferedOutputStream(fos, BUFFER);
                    while ((cnt = zin.read(data, 0, BUFFER)) != -1) {
                        dest.write(data, 0, cnt);
                    }
                    dest.flush();
                    dest.close();
                }
            }
            zin.close();
            // The archive cannot be deleted here because it is still in use;
            // delete it at the place where it was uploaded.
            // After extraction, delete the archive:
            // File zipFileToDel = new File(zipFileName);
            // zipFileToDel.delete();
            // System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());
            // Remove the extra directory level created by extraction:
            // delALayerDir(zipFileName, outputDirectory);
            Date endDate = new Date();
            long temp = endDate.getTime() - beginDate.getTime();
            System.out.println("Extraction time (ms): " + temp);
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }

    /**
     * Remove one directory level.
     *
     * @param zipFileName
     * @param outputDirectory
     */
    public void delALayerDir(String zipFileName, String outputDirectory) {
        String[] dir = zipFileName.replace('\\', '/').split("/");
        String fileFullName = dir[dir.length - 1]; // gives aa.zip
        int pos = fileFullName.indexOf(".");
        String fileName = fileFullName.substring(0, pos); // gives aa
        String sourceDir = outputDirectory + File.separator + fileName;
        try {
            copyFile(new File(outputDirectory), new File(sourceDir), new File(
                    sourceDir));
            deleteSourceBaseDir(new File(sourceDir));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Copy all files under sourceDir into destDir.
     */
    public void copyFile(File destDir, File sourceBaseDir, File sourceDir)
            throws Exception {
        File[] lists = sourceDir.listFiles();
        if (lists == null)
            return;
        String sourceBasePath = sourceBaseDir.getCanonicalPath();
        String destPath = destDir.getCanonicalPath();
        for (int i = 0; i < lists.length; i++) {
            File f = lists[i];
            // path of f relative to the source base directory
            String relative = f.getCanonicalPath().substring(sourceBasePath.length());
            if (f.isFile()) {
                // copy raw bytes; going through String would corrupt binary files
                FileInputStream fis = new FileInputStream(f);
                FileOutputStream fos = new FileOutputStream(destPath + relative);
                byte[] b = new byte[2048];
                int n;
                while ((n = fis.read(b)) != -1) {
                    fos.write(b, 0, n);
                }
                fis.close();
                fos.flush();
                fos.close();
            } else {
                // create the directory first
                new File(destPath + relative).mkdir();
                copyFile(destDir, sourceBaseDir, f); // recursive call
            }
        }
    }

    /**
     * Recursively delete the files under curFile, removing each directory once
     * it has become empty.
     */
    public void deleteSourceBaseDir(File curFile) throws Exception {
        File[] lists = curFile.listFiles();
        File parentFile = null;
        for (int i = 0; i < lists.length; i++) {
            File f = lists[i];
            if (f.isFile()) {
                f.delete();
                // if the parent directory has no files left, everything has been
                // deleted and the parent directory should be removed too
                parentFile = f.getParentFile();
                if (parentFile.list().length == 0)
                    parentFile.delete();
            } else {
                deleteSourceBaseDir(f); // recursive call
            }
        }
    }

    public static void main(String[] args) {
        try {
            ZipUtil t = new ZipUtil();
            // t.zip("e://test.zip", "E://news.sina.com.cn//news.sina.com.cn");
            Date beginDate = new Date();
            //t.unSmallZip("e://test.zip", "E://news.sina.com.cn");
            t.unBigZip("e://test.zip", "E://news.sina.com.cn");
            Date endDate = new Date();
            long temp = endDate.getTime() - beginDate.getTime();
            System.out.println("Time taken (ms): " + temp);
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}
Appendix 5:
/*
 * @(#)ZipInputStream.java	1.37 04/06/11
 *
 * Copyright 2004 Sun Microsystems, Inc. All rights reserved.
 * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
 */
package jcss.search.base.zip;

import java.io.InputStream;
import java.io.IOException;
import java.io.EOFException;
import java.io.PushbackInputStream;
import java.util.zip.CRC32;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

/**
 * @author David Connelly
 * @version 1.37, 06/11/04
 */
public class CZipInputStream extends InflaterInputStream implements ZipConstants {
    private String encoding = "UTF-8";
    private ZipEntry entry;
    private CRC32 crc = new CRC32();
    private long remaining;
    private byte[] tmpbuf = new byte[512];

    private static final int STORED = ZipEntry.STORED;
    private static final int DEFLATED = ZipEntry.DEFLATED;

    private boolean closed = false;
    // this flag is set to true after EOF has reached for
    // one entry
    private boolean entryEOF = false;

    /**
     * Check to make sure that this stream has not been closed
     */
    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
    }

    boolean usesDefaultInflater = false;

    /**
     * Creates a new ZIP input stream.
     * @param in the actual input stream
     */
    public CZipInputStream(InputStream in) {
        super(new PushbackInputStream(in, 512), new Inflater(true), 512);
        usesDefaultInflater = true;
        if (in == null) {
            throw new NullPointerException("in is null");
        }
    }

    public CZipInputStream(InputStream in, String encoding) {
        super(new PushbackInputStream(in, 512), new Inflater(true), 512);
        usesDefaultInflater = true;
        if (in == null) {
            throw new NullPointerException("in is null");
        }
        this.encoding = encoding;
    }

    /**
     * Reads the next ZIP file entry and positions the stream at the
     * beginning of the entry data.
     * @return the next ZIP file entry, or null if there are no more entries
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     */
    public ZipEntry getNextEntry() throws IOException {
        ensureOpen();
        if (entry != null) {
            closeEntry();
        }
        crc.reset();
        inf.reset();
        if ((entry = readLOC()) == null) {
            return null;
        }
        if (entry.method == STORED) {
            remaining = entry.size;
        }
        entryEOF = false;
        return entry;
    }

    /**
     * Closes the current ZIP entry and positions the stream for reading the
     * next entry.
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     */
    public void closeEntry() throws IOException {
        ensureOpen();
        while (read(tmpbuf, 0, tmpbuf.length) != -1) ;
        entryEOF = true;
    }

    /**
     * Returns 0 after EOF has reached for the current entry data,
     * otherwise always return 1.
     * <p>
     * Programs should not count on this method to return the actual number
     * of bytes that could be read without blocking.
     *
     * @return 1 before EOF and 0 after EOF has reached for current entry.
     * @exception IOException if an I/O error occurs.
     *
     */
    public int available() throws IOException {
        ensureOpen();
        if (entryEOF) {
            return 0;
        } else {
            return 1;
        }
    }

    /**
     * Reads from the current ZIP entry into an array of bytes. Blocks until
     * some input is available.
     * @param b the buffer into which the data is read
     * @param off the start offset of the data
     * @param len the maximum number of bytes read
     * @return the actual number of bytes read, or -1 if the end of the
     *         entry is reached
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     */
    public int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        if (off < 0 || len < 0 || off > b.length - len) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }

        if (entry == null) {
            return -1;
        }
        switch (entry.method) {
        case DEFLATED:
            len = super.read(b, off, len);
            if (len == -1) {
                readEnd(entry);
                entryEOF = true;
                entry = null;
            } else {
                crc.update(b, off, len);
            }
            return len;
        case STORED:
            if (remaining <= 0) {
                entryEOF = true;
                entry = null;
                return -1;
            }
            if (len > remaining) {
                len = (int) remaining;
            }
            len = in.read(b, off, len);
            if (len == -1) {
                throw new ZipException("unexpected EOF");
            }
            crc.update(b, off, len);
            remaining -= len;
            return len;
        default:
            throw new InternalError("invalid compression method");
        }
    }

    /**
     * Skips specified number of bytes in the current ZIP entry.
     * @param n the number of bytes to skip
     * @return the actual number of bytes skipped
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     * @exception IllegalArgumentException if n < 0
     */
    public long skip(long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("negative skip length");
        }
        ensureOpen();
        int max = (int) Math.min(n, Integer.MAX_VALUE);
        int total = 0;
        while (total < max) {
            int len = max - total;
            if (len > tmpbuf.length) {
                len = tmpbuf.length;
            }
            len = read(tmpbuf, 0, len);
            if (len == -1) {
                entryEOF = true;
                break;
            }
            total += len;
        }
        return total;
    }

    /**
     * Closes this input stream and releases any system resources associated
     * with the stream.
     * @exception IOException if an I/O error has occurred
     */
    public void close() throws IOException {
        if (!closed) {
            super.close();
            closed = true;
        }
    }

    private byte[] b = new byte[256];

    /*
     * Reads local file (LOC) header for next entry.
     */
    private ZipEntry readLOC() throws IOException {
        try {
            readFully(tmpbuf, 0, LOCHDR);
        } catch (EOFException e) {
            return null;
        }
        if (get32(tmpbuf, 0) != LOCSIG) {
            return null;
        }
        // get the entry name and create the ZipEntry first
        int len = get16(tmpbuf, LOCNAM);
        if (len == 0) {
            throw new ZipException("missing entry name");
        }
        int blen = b.length;
        if (len > blen) {
            do
                blen = blen * 2;
            while (len > blen);
            b = new byte[blen];
        }
        readFully(b, 0, len);
        //ZipEntry e = createZipEntry(getUTF8String(b, 0, len));
        ZipEntry e = null;
        try {
            if (this.encoding.toUpperCase().equals("UTF-8"))
                e = createZipEntry(getUTF8String(b, 0, len));
            else
                e = createZipEntry(new String(b, 0, len, this.encoding));
        } catch (Exception byteE) {
            e = createZipEntry(getUTF8String(b, 0, len));
        }
        // now get the remaining fields for the entry
        e.version = get16(tmpbuf, LOCVER);
        e.flag = get16(tmpbuf, LOCFLG);
        if ((e.flag & 1) == 1) {
            throw new ZipException("encrypted ZIP entry not supported");
        }
        e.method = get16(tmpbuf, LOCHOW);
        e.time = get32(tmpbuf, LOCTIM);
        if ((e.flag & 8) == 8) {
            /* EXT descriptor present */
            if (e.method != DEFLATED) {
                throw new ZipException(
                        "only DEFLATED entries can have EXT descriptor");
            }
        } else {
            e.crc = get32(tmpbuf, LOCCRC);
            e.csize = get32(tmpbuf, LOCSIZ);
            e.size = get32(tmpbuf, LOCLEN);
        }
        len = get16(tmpbuf, LOCEXT);
        if (len > 0) {
            byte[] bb = new byte[len];
            readFully(bb, 0, len);
            e.extra = bb;
        }
        return e;
    }

    /*
     * Fetches a UTF8-encoded String from the specified byte array.
     */
    private static String getUTF8String(byte[] b, int off, int len) {
        // First, count the number of characters in the sequence
        int count = 0;
        int max = off + len;
        int i = off;
        while (i < max) {
            int c = b[i++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                count++;
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                if ((int) (b[i++] & 0xc0) != 0x80) {
                    throw new IllegalArgumentException();
                }
                count++;
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                if (((int) (b[i++] & 0xc0) != 0x80) ||
                    ((int) (b[i++] & 0xc0) != 0x80)) {
                    throw new IllegalArgumentException();
                }
                count++;
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        if (i != max) {
            throw new IllegalArgumentException();
        }
        // Now decode the characters...
        char[] cs = new char[count];
        i = 0;
        while (off < max) {
            int c = b[off++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                cs[i++] = (char) c;
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                cs[i++] = (char) (((c & 0x1f) << 6) | (b[off++] & 0x3f));
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                int t = (b[off++] & 0x3f) << 6;
                cs[i++] = (char) (((c & 0x0f) << 12) | t | (b[off++] & 0x3f));
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        return new String(cs, 0, count);
    }

    /**
     * Creates a new <code>ZipEntry</code> object for the specified
     * entry name.
     *
     * @param name the ZIP file entry name
     * @return the ZipEntry just created
     */
    protected ZipEntry createZipEntry(String name) {
        return new ZipEntry(name);
    }

    /*
     * Reads end of deflated entry as well as EXT descriptor if present.
     */
    private void readEnd(ZipEntry e) throws IOException {
        int n = inf.getRemaining();
        if (n > 0) {
            ((PushbackInputStream) in).unread(buf, len - n, n);
        }
        if ((e.flag & 8) == 8) {
            /* EXT descriptor present */
            readFully(tmpbuf, 0, EXTHDR);
            long sig = get32(tmpbuf, 0);
            if (sig != EXTSIG) { // no EXTSIG present
                e.crc = sig;
                e.csize = get32(tmpbuf, EXTSIZ - EXTCRC);
                e.size = get32(tmpbuf, EXTLEN - EXTCRC);
                ((PushbackInputStream) in).unread(
                        tmpbuf, EXTHDR - EXTCRC - 1, EXTCRC);
            } else {
                e.crc = get32(tmpbuf, EXTCRC);
                e.csize = get32(tmpbuf, EXTSIZ);
                e.size = get32(tmpbuf, EXTLEN);
            }
        }
        if (e.size != inf.getBytesWritten()) {
            throw new ZipException(
                    "invalid entry size (expected " + e.size +
                    " but got " + inf.getBytesWritten() + " bytes)");
        }
        if (e.csize != inf.getBytesRead()) {
            throw new ZipException(
                    "invalid entry compressed size (expected " + e.csize +
                    " but got " + inf.getBytesRead() + " bytes)");
        }
        if (e.crc != crc.getValue()) {
            throw new ZipException(
                    "invalid entry CRC (expected 0x" + Long.toHexString(e.crc) +
                    " but got 0x" + Long.toHexString(crc.getValue()) + ")");
        }
    }

    /*
     * Reads bytes, blocking until all bytes are read.
     */
    private void readFully(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(b, off, len);
            if (n == -1) {
                throw new EOFException();
            }
            off += n;
            len -= n;
        }
    }

    /*
     * Fetches unsigned 16-bit value from byte array at specified offset.
     * The bytes are assumed to be in Intel (little-endian) byte order.
     */
    private static final int get16(byte b[], int off) {
        return (b[off] & 0xff) | ((b[off+1] & 0xff) << 8);
    }

    /*
     * Fetches unsigned 32-bit value from byte array at specified offset.
     * The bytes are assumed to be in Intel (little-endian) byte order.
     */
    private static final long get32(byte b[], int off) {
        return get16(b, off) | ((long) get16(b, off+2) << 16);
    }
}
Appendix 6:
package jcss.search.base.zip;

/**
 * @author Zhang Hua
 * @time 2007-8-3
 * @description
 **/
public class ZipEntry extends org.apache.tools.zip.ZipEntry {
    String name;        // entry name
    long time = -1;     // modification time (in DOS time)
    long crc = -1;      // crc-32 of entry data
    long size = -1;     // uncompressed size of entry data
    long csize = -1;    // compressed size of entry data
    int method = -1;    // compression method
    byte[] extra;       // optional extra field data for entry
    String comment;     // optional comment string for entry
    // The following flags are used only by Zip{Input,Output}Stream
    int flag;           // bit flags
    int version;        // version needed to extract
    long offset;        // offset of loc header

    /**
     * Compression method for uncompressed entries.
     */
    public static final int STORED = 0;

    /**
     * Compression method for compressed (deflated) entries.
     */
    public static final int DEFLATED = 8;

    // The following block must be commented out
    // static {
    //     /* Zip library is loaded from System.initializeSystemClass */
    //     initIDs();
    // }
    // private static native void initIDs();

    public ZipEntry(String name) {
        super(name);
    }
}