
My Experience Choosing a Compression Algorithm (by quqi99)





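Author: Zhang Hua. Published: 2007-08-03 ( http://blog.csdn.net/quqi99 )

Copyright notice: this article may be reposted freely, provided the repost links back to the original source and includes the author information and this copyright notice.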





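Recently, because the files downloaded by our spider need to be compressed, I took the opportunity to study a range of compression algorithms. I had three criteria: first, the compression ratio; second, speed; third, how convenient it is to append, delete, and query entries (that is, to find out what an archive contains without decompressing it, and to extract metadata easily).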

At first I mainly compared the compression performance of zlib, bzip2, gzip, rar, and zip.

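For the rar format, compressing and decompressing mainly means invoking an external command (see Appendix 1 for the code). The command runs outside the Java virtual machine, and if it goes wrong the whole JVM may die with it, so I ruled this approach out.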

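According to what I read online, bzip2 is the most efficient of these at compressing text, and the org.apache.tools.bzip2 package in ant.jar provides an API for it. I could never get it to work smoothly, though, so I did not spend more time on it. See Appendix 2 for the code (never tested successfully).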

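So I started learning zlib. Its Java version is called jzlib; the code that compresses and decompresses with jzlib is in Appendix 3. It looked quite good, so I planned to use it. Compressing a single file works fine, but compressing a directory the way the java.util.zip package does is rather inconvenient. Only then did it dawn on me: these compression libraries generally care only about the algorithm itself, and conveniences such as per-entry compression are the business of the surrounding application.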
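To make that point concrete, here is a minimal sketch of what a bare DEFLATE codec offers: bytes in, bytes out, with no notion of file names or entries. It uses the JDK's java.util.zip.Deflater and Inflater instead of jzlib, purely so that it runs without extra jars; the class name RawDeflateSketch and its method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RawDeflateSketch {

    /** Compresses one buffer in zlib format; no file name or entry is involved. */
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes; rejects truncated or corrupt input. */
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Anything directory-shaped (entry names, per-file sizes, lookup by name) has to be layered on top of calls like these, which is exactly the job a container format such as ZIP takes on.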

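So I began thinking about how to borrow the ideas of the java.util.zip package and wrap the zlib algorithm so that it could compress directories. In the end I discovered that the compression algorithm underneath java.util.zip is zlib itself. Sun merely added to the core algorithm some checksums (CRC32, Adler32), per-entry archive support (ZipEntry), and convenient input and output streams (ZipInputStream and ZipOutputStream).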

Since java.util.zip already uses zlib, there was no longer any need to work out how to compress by directory. Even so, things did not go entirely smoothly.
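As a rough illustration of the layering described above, the sketch below writes several named entries with ZipOutputStream and then walks the archive with ZipInputStream to list the entry names. It works on in-memory byte arrays to stay self-contained; the class name ZipContainerSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipContainerSketch {

    /** Packs each map entry (path -> content) as one ZIP entry. */
    public static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                out.putNextEntry(new ZipEntry(f.getKey())); // "dir/name" keeps the directory layout
                out.write(f.getValue());
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray(); // complete only after the stream has been closed
    }

    /** Lists entry names in archive order without writing any entry data to disk. */
    public static List<String> listNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Note that ZipInputStream still has to read through each entry's data to reach the next header; it simply does not hand that data to the caller.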

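First, decompression code written directly against the java.util.zip API cannot handle Chinese file names. Java strings are Unicode, but ZipInputStream and ZipOutputStream always treat the entry names stored in an archive as UTF-8, whereas the archiving software may have written them in a different encoding. As a result, when these classes meet a Chinese file name or path they fail to process it. Reading the ZipInputStream source carefully, I found that the problem lies in this line of the java.util.zip.ZipInputStream class:

    ZipEntry e = createZipEntry(getUTF8String(b, 0, len));

It should be changed to:

    ZipEntry e = null;
    try {
        if (this.encoding.toUpperCase().equals("UTF-8"))
            e = createZipEntry(getUTF8String(b, 0, len));
        else
            e = createZipEntry(new String(b, 0, len, this.encoding));
    } catch (Exception byteE) {
        e = createZipEntry(getUTF8String(b, 0, len));
    }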
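The failure mode can be reproduced without any archive at all. The sketch below decodes the raw bytes of an entry name strictly in a given charset and returns null when the bytes are not valid in it. The class name EntryNameDecodingSketch and the helper tryDecode are illustrative, and the sketch assumes the JVM ships the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class EntryNameDecodingSketch {

    /**
     * Strictly decodes raw entry-name bytes in the given charset.
     * Returns null when the bytes are not valid in that charset.
     */
    public static String tryDecode(byte[] rawName, String charsetName) {
        try {
            return Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(rawName))
                    .toString();
        } catch (CharacterCodingException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The bytes a GBK-based archiver would store for the name "\u4e2d\u6587.txt"
        byte[] stored = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4, '.', 't', 'x', 't'};
        System.out.println("as UTF-8: " + tryDecode(stored, "UTF-8"));
        System.out.println("as GBK:   " + tryDecode(stored, "GBK"));
    }
}
```

The same bytes are malformed as UTF-8 but decode cleanly as GBK, which is why the patched line lets the caller choose the encoding instead of hard-wiring UTF-8.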



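Fortunately, a quick web search showed that we do not have to make this change ourselves: the org.apache.tools.zip package in Ant has already done it for us. See Appendix 4 for the compression and decompression code written with this package.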

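Then another problem turned up. There are two ways to decompress a file: with ZipFile or with ZipInputStream. As far as I could tell, ZipFile reads the whole zip file into memory at once, which does not work for a large zip; in that case ZipInputStream has to be used. But the org.apache.tools.zip package happens to leave ZipInputStream unmodified and only provides a rewritten ZipFile. If you use java.util.zip.ZipInputStream, files with Chinese names still cannot be decompressed.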
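For comparison with the streaming approach, here is a small sketch of the ZipFile style of access using the JDK's own java.util.zip.ZipFile: it needs a real file it can seek in, and it fetches one entry by name without walking the entries before it. The class ZipFileLookupSketch and the fixture method writeSampleZip are hypothetical names used only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipFileLookupSketch {

    /** Writes a throwaway archive with three small entries (fixture for the example). */
    public static File writeSampleZip() {
        try {
            File zip = File.createTempFile("sketch", ".zip");
            zip.deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
                for (String name : new String[] {"a.txt", "dir/b.txt", "dir/c.txt"}) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return zip;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Reads a single entry by name; returns null when the archive has no such entry. */
    public static byte[] readEntry(File zip, String name) {
        try (ZipFile zf = new ZipFile(zip)) {
            ZipEntry entry = zf.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

ZipInputStream, by contrast, accepts any InputStream (a socket, a pipe, a buffer) but can only visit entries in the order they were written.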

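At that point I found an article online about making ZipOutputStream and ZipInputStream support Chinese (it can be found through Google). Its approach is to modify the JDK source code directly. But I felt that patching the JDK's JAR would make the software troublesome to deploy, so I started looking for another solution.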



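To avoid touching java.util.zip.ZipInputStream, I simply rewrote the class myself. First I copied it verbatim into a new class, jcss.search.base.zip.CZipInputStream. Then, in this class, I replaced
    ZipEntry e = createZipEntry(getUTF8String(b, 0, len))
with the code shown above. See Appendix 5 for this class.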

I also copied java.util.zip.ZipConstants verbatim into a class named jcss.search.base.zip.ZipConstants, and implemented a jcss.search.base.zip.ZipEntry class; see Appendix 6 for its code.



With that, everything works.
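For readers on a newer JDK, a shorter route exists, though it postdates this write-up: since Java 7, ZipInputStream and ZipOutputStream have constructors that take a Charset for entry names, so no copied class is needed. The sketch below assumes Java 7 or later and a JVM that includes the GBK charset; the class name CharsetZipSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CharsetZipSketch {

    /** Writes a one-entry archive whose entry name is encoded with nameCharset. */
    public static byte[] zipOne(String entryName, byte[] data, Charset nameCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(data);
            out.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads the first entry name, decoding it with nameCharset; null for an empty archive. */
    public static String firstEntryName(byte[] zipBytes, Charset nameCharset) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes), nameCharset)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Passing Charset.forName("GBK") to both methods plays the same role as the "gb2312" argument given to CZipInputStream in Appendix 4.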








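To push the compression ratio further, 7zip is an option, and a dedicated 7zip SDK (implementing the LZMA compression algorithm) is now available. In addition, helpful people have added two input and output stream classes on top of it for convenient access (net.contrapunctus.lzma.LzmaInputStream and net.contrapunctus.lzma.LzmaOutputStream), but nothing yet wraps the entry classes needed to compress by directory.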







Appendix 1:

package jcss.search.base;

/**
 * @author Zhang Hua
 * @time 2007-8-1
 * @description
 */
public class RarUtil {

    /**
     * Decompress.
     *
     * @param compress
     *            the rar archive
     * @param decompression
     *            the extraction path
     */
    public void unZip(String compress, String decompression) throws Exception {
        // Arguments are passed as an array so that the space in
        // "Program Files" (and any space in a file name) survives.
        Process p = Runtime.getRuntime().exec(new String[] {
                "C:\\Program Files\\WinRAR\\UNRAR.EXE", "x", "-o+", "-p-",
                compress, decompression });
        System.out.println(readOutput(p));
    }

    /**
     * Compress.
     *
     * @param outputRar
     *            the output rar archive
     * @param compression
     *            the file or directory to compress
     * @throws Exception
     */
    public void zip(String outputRar, String compression) throws Exception {
        // "a" adds files to an archive; the command as originally posted
        // used "x" (extract) here, which cannot create an archive.
        Process p = Runtime.getRuntime().exec(new String[] {
                "C:\\Program Files\\WinRAR\\rar.exe", "a", "-t", "-o+", "-p-",
                outputRar, compression });
        System.out.println(readOutput(p));
    }

    /** Reads the command's standard output and decodes it as GBK. */
    private static String readOutput(Process p) throws Exception {
        StringBuffer sb = new StringBuffer();
        java.io.InputStream fis = p.getInputStream();
        int value = 0;
        while ((value = fis.read()) != -1) {
            sb.append((char) value);
        }
        fis.close();
        return new String(sb.toString().getBytes("ISO-8859-1"), "GBK");
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        RarUtil test = new RarUtil();

        String compress = "f:/增加转码过滤器.rar"; // the rar archive
        String decompression = "f:/test/"; // the extraction path

        try {
            test.zip("f:/test.rar", "说明.txt");
            //test.unZip(compress, decompression);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}





Appendix 2:

package jcss.search.base;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.tools.bzip2.CBZip2InputStream;
import org.apache.tools.bzip2.CBZip2OutputStream;

/**
 * @author Zhang Hua
 * @time 2007-7-26
 * @description BZip2 compression and decompression
 */
public class BZip2Util {

    public static void Bzip2Compress(String in, String to) {
        try {
            File source = new File(in);
            File destination = new File(to);
            FileOutputStream fileOut = new FileOutputStream(destination);
            // Ant's CBZip2OutputStream leaves the two magic bytes "BZ" to the
            // caller. The code as originally posted did not write them, which
            // may be why it never worked.
            fileOut.write('B');
            fileOut.write('Z');
            CBZip2OutputStream output = new CBZip2OutputStream(fileOut);
            final FileInputStream input = new FileInputStream(source);
            copy(input, output);
            input.close();
            output.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void Bzip2Uncompress(String in, String to) {
        try {
            File source = new File(in);
            File destination = new File(to);
            FileOutputStream output = new FileOutputStream(destination);
            FileInputStream fileIn = new FileInputStream(source);
            // Likewise, Ant's CBZip2InputStream expects the caller to have
            // consumed the leading "BZ" bytes already.
            fileIn.read();
            fileIn.read();
            CBZip2InputStream input = new CBZip2InputStream(fileIn);
            copy(input, output);
            input.close();
            output.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void copy(final InputStream input, final OutputStream output)
            throws IOException {
        final byte[] buffer = new byte[8024];
        int n = 0;
        while (-1 != (n = input.read(buffer))) {
            output.write(buffer, 0, n);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        BZip2Util test = new BZip2Util();
        String in = "f:\\~HlIndex.htm";
        String to = "f:\\a.bz2";
        String out2 = "b.htm";
        //test.Bzip2Compress(in, to);
        //test.Bzip2Uncompress(to, out2);
    }
}



Appendix 3:

package example;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import com.jcraft.jzlib.*;

/**
 * Drawback: cannot compress by directory.
 *
 * @author Zhang Hua
 * @time 2007-7-30
 * @description reference http://tianxiagod.spaces.live.com/
 *              http://blog.csdn.net/kong555/archive/2006/03/28/641855.aspx
 */
public class TestJZlib {

    // Length of the file being compressed. It is used for both compression
    // and decompression, so it is critical.
    // Make sure compressfile() and uncompressfile() receive the same value.
    static int resLen = 0;




    /**
     * Compress.
     *
     * @param data
     * @param type
     *            compression level as an integer: -1 is the default level,
     *            9 is the highest compression, 0 is no compression, 1 is the
     *            fastest
     * @param len
     *            size of the output buffer
     * @return
     */
    public static byte[] compressfile(byte[] data, int type, int len) {
        int err;
        int comprLen = len;
        byte[] compr = new byte[comprLen];
        ZStream c_stream = new ZStream();
        err = c_stream.deflateInit(type);
        CHECK_ERR(c_stream, err, "deflateInit");

        c_stream.next_in = data;
        c_stream.next_in_index = 0;

        c_stream.next_out = compr;
        c_stream.next_out_index = 0;

        while (c_stream.total_in != data.length
                && c_stream.total_out < comprLen) {
            c_stream.avail_in = c_stream.avail_out = 1; // set initial values
            err = c_stream.deflate(JZlib.Z_NO_FLUSH);
            CHECK_ERR(c_stream, err, "deflate");
        }
        System.out.println("Before compression--" + c_stream.total_in + " bytes");

        while (true) {
            c_stream.avail_out = 1;
            err = c_stream.deflate(JZlib.Z_FINISH);
            if (err == JZlib.Z_STREAM_END) {
                break;
            }
            CHECK_ERR(c_stream, err, "deflate");
        }
        System.out.println("After compression--" + c_stream.total_out + " bytes");

        err = c_stream.deflateEnd();
        CHECK_ERR(c_stream, err, "deflateEnd");

        byte[] zipfile = new byte[(int) c_stream.total_out];
        System.arraycopy(compr, 0, zipfile, 0, zipfile.length);
        return zipfile;
    }

    public static byte[] uncompressfile(byte[] data, int len) {
        int err;
        int uncomprLen = len;
        byte[] uncompr = new byte[uncomprLen];
        ZStream d_stream = new ZStream();

        err = d_stream.inflateInit();
        CHECK_ERR(d_stream, err, "inflateInit");

        d_stream.next_in = data;
        d_stream.next_in_index = 0;
        d_stream.next_out = uncompr;
        d_stream.next_out_index = 0;

        while (d_stream.total_out < uncomprLen
                && d_stream.total_in < uncomprLen) {
            d_stream.avail_in = d_stream.avail_out = 1;
            err = d_stream.inflate(JZlib.Z_NO_FLUSH);
            if (err == JZlib.Z_STREAM_END) {
                break;
            }
            CHECK_ERR(d_stream, err, "inflate");
        }
        System.out.println("Before decompression--" + d_stream.total_in + " bytes");
        System.out.println("After decompression--" + d_stream.total_out + " bytes");

        err = d_stream.inflateEnd();
        CHECK_ERR(d_stream, err, "inflateEnd");

        byte[] unzipfile = new byte[(int) d_stream.total_out];
        System.arraycopy(uncompr, 0, unzipfile, 0, unzipfile.length);
        return unzipfile;
    }

    static void CHECK_ERR(ZStream z, int err, String msg) {
        if (err != JZlib.Z_OK) {
            if (z.msg != null) {
                System.out.print(z.msg + " ");
            }
            System.out.println(msg + " error: " + err);
            System.exit(1);
        }
    }














    static void zip(File input, File output, int compressFactor) {
        if (!input.exists())
            return;
        if (!output.getParentFile().exists())
            output.getParentFile().mkdir();

        try {
            FileInputStream in = new FileInputStream(input);
            FileOutputStream out = new FileOutputStream(output);
            resLen = in.available();
            byte[] buff = new byte[resLen];
            in.read(buff);

            byte[] suBuf = compressfile(buff, compressFactor, resLen);
            out.write(suBuf, 0, suBuf.length); // write the compressed file

            in.close();
            out.close();
            System.out.println("Compression finished! " + input.getAbsolutePath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void unZip(File input, File output) {
        if (!input.exists())
            return;
        if (!output.getParentFile().exists())
            output.getParentFile().mkdir();

        try {
            FileInputStream in = new FileInputStream(input);
            FileOutputStream out = new FileOutputStream(output);

            byte[] buff = new byte[resLen];
            in.read(buff);

            byte[] suBuff = uncompressfile(buff, resLen);
            out.write(suBuff, 0, suBuff.length); // write the decompressed file

            in.close();
            out.close();
            System.out.println("Decompression finished! " + input.getAbsolutePath());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        TestJZlib test = new TestJZlib();

        // Compress
        File input = new File("f:\\搜索引擎原理系统与设计.pdf");
        File output = new File("f:\\test.bz2");

        test.zip(input, output, 9);

        // Decompress
        File output2 = new File("f:\\test.jpg");
        test.unZip(output, output2);
    }
}



Appendix 4:

package jcss.search.base;

/*
 * Uses org.apache.tools.zip to compress.
 * java.util.zip can also be used, but with Chinese file names the names come
 * out garbled on decompression. The reason is that the encoding used by the
 * archiving software differs from the character set of
 * java.util.zip.ZipInputStream, whose character set is fixed to UTF-8.
 *
 * The commented-out parts are decompression code.
 */
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Date;
import java.util.zip.ZipInputStream;

import jcss.search.base.zip.CZipInputStream;

import org.apache.tools.zip.ZipOutputStream;

/*
 * @author Zhang Hua @date 2006-5-14 @description
 */
public class ZipUtil {

    int count = 0;
    static final int BUFFER = 2048;

    public void zip(String zipFileName, String inputFile) throws Exception {
        zip(zipFileName, new File(inputFile));
    }

    public void zip(String zipFileName, File inputFile) throws Exception {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(
                new String(zipFileName.getBytes("gb2312"))));
        System.out.println("zip start");
        zip(out, inputFile, "");
        System.out.println("zip done");
        out.close();
    }




    public void zip(ZipOutputStream out, File f, String base) throws Exception {
        System.out.println("Zipping " + f.getName());
        Date beginDate = new Date();
        if (f.isDirectory()) {
            File[] fl = f.listFiles();
            // out.putNextEntry(new ZipEntry(base + "/"));
            out.putNextEntry(new org.apache.tools.zip.ZipEntry(base + "/"));
            base = base.length() == 0 ? "" : base + "/";
            for (int i = 0; i < fl.length; i++) {
                zip(out, fl[i], base + fl[i].getName());
                System.out.println(fl[i].getName());
                // System.out.println(new
                // String(fl[i].getName().getBytes("gb2312")));
            }
        } else {
            // out.putNextEntry(new ZipEntry(base));
            out.putNextEntry(new org.apache.tools.zip.ZipEntry(base));
            System.out.println(base);
            FileInputStream in = new FileInputStream(f);
            int b;

            while ((b = in.read()) != -1)
                out.write(b);
            in.close();
        }

        Date endDate = new Date();
        // end minus begin, so that the elapsed time is not negative
        long temp = endDate.getTime() - beginDate.getTime();
        System.out.println("Total time: " + temp);
    }

    private void createDirectory(String directory, String subDirectory) {
        String dir[];
        File fl = new File(directory);
        try {
            // compare string contents with equals(), not references with ==
            if (subDirectory.equals("") && fl.exists() != true)
                fl.mkdir();
            else if (!subDirectory.equals("")) {
                dir = subDirectory.replace('\\', '/').split("/");
                for (int i = 0; i < dir.length; i++) {
                    File subFile = new File(directory + File.separator + dir[i]);
                    if (subFile.exists() == false)
                        subFile.mkdir();
                    directory += File.separator + dir[i];
                }
            }
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }




    /**
     * Decompresses a small ZIP with ZipFile.
     *
     * ZipInputStream reads the ZIP's entries in sequence (put simply, it
     * reads out which files the ZIP contains), whereas ZipFile uses built-in
     * random file access to read entry contents, so entries need not be read
     * in order.
     * Another basic difference between ZipInputStream and ZipFile lies in
     * caching. When a file is read with ZipInputStream and FileInputStream,
     * ZIP entries are not cached. If the file is opened with
     * ZipFile(fileName), however, a built-in cache is used, so if
     * ZipFile(fileName) is called repeatedly the file is opened only once and
     * the cached value is used on the second open. If you work on UNIX this
     * makes no difference, because every ZIP file opened with ZipFile is
     * mapped into memory, so ZipFile performs better than ZipInputStream.
     * However, if the contents of the same ZIP file change often or are
     * reloaded while the program runs, ZipInputStream becomes the first
     * choice.
     *
     * @param zipFileName
     * @param outputDirectory
     * @throws Exception
     */
    public void unSmallZip(String zipFileName, String outputDirectory)
            throws Exception {
        try {
            Date beginDate = new Date();
            org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
            java.util.Enumeration e = zipFile.getEntries();
            org.apache.tools.zip.ZipEntry zipEntry = null;
            createDirectory(outputDirectory, "");
            while (e.hasMoreElements()) {
                zipEntry = (org.apache.tools.zip.ZipEntry) e.nextElement();
                String name = null;
                if (zipEntry.isDirectory()) {
                    name = zipEntry.getName();
                    name = name.substring(0, name.length() - 1);
                    File f = new File(outputDirectory + File.separator + name);
                    f.mkdir();
                    System.out.println("Created directory: " + outputDirectory
                            + File.separator + name);
                } else {
                    String fileName = zipEntry.getName();
                    fileName = fileName.replace('\\', '/');
                    count++;

                    System.out.println("Extracting file " + count + ": "
                            + zipEntry.getName());
                    if (fileName.indexOf("/") != -1) {
                        createDirectory(outputDirectory, fileName.substring(0,
                                fileName.lastIndexOf("/")));
                        fileName = fileName.substring(
                                fileName.lastIndexOf("/") + 1, fileName
                                        .length());
                    }

                    File f = new File(outputDirectory + File.separator
                            + zipEntry.getName());

                    f.createNewFile();
                    InputStream in = zipFile.getInputStream(zipEntry);
                    FileOutputStream out = new FileOutputStream(f);

                    byte[] by = new byte[1024];
                    int c;
                    while ((c = in.read(by)) != -1) {
                        out.write(by, 0, c);
                    }
                    out.close();
                    in.close();
                }
            }

            // The archive cannot be deleted here because it is still in use;
            // delete it where the upload is handled.
            // Delete the archive after extraction:
            // File zipFileToDel = new File(zipFileName);
            // zipFileToDel.delete();
            // System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());

            // // Remove the extra directory layer left after extraction
            // delALayerDir(zipFileName, outputDirectory);

            Date endDate = new Date();
            // end minus begin, so that the elapsed time is not negative
            long temp = endDate.getTime() - beginDate.getTime();
            System.out.println("Total extraction time: " + temp);

        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }




    /**
     * Decompresses a large ZIP with ZipInputStream (using a modified
     * ZipInputStream class that supports Chinese file names).
     *
     * See the comment on unSmallZip for the differences between
     * ZipInputStream and ZipFile.
     *
     * @param zipFileName
     * @param outputDirectory
     * @throws Exception
     */
    public void unBigZip(String zipFileName, String outputDirectory)
            throws Exception {
        try {
            Date beginDate = new Date();
            //org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
            FileInputStream fis = new FileInputStream(zipFileName);
            BufferedOutputStream dest = null;
            //CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis));
            CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis), "gb2312");

            //org.apache.tools.zip.ZipEntry entry;
            //java.util.zip.ZipEntry entry;
            jcss.search.base.zip.ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = null;
                if (entry.isDirectory()) {
                    name = entry.getName();
                    name = name.substring(0, name.length() - 1);
                    File f = new File(outputDirectory + File.separator + name);
                    f.mkdir();
                    System.out.println("Created directory: " + outputDirectory + File.separator + name);
                } else {
                    String fileName = entry.getName();
                    fileName = fileName.replace('\\', '/');
                    count++;
                    System.out.println("Extracting file " + count + ": " + entry.getName());
                    if (fileName.indexOf("/") != -1) {
                        createDirectory(outputDirectory, fileName.substring(0, fileName.lastIndexOf("/")));
                        fileName = fileName.substring(fileName.lastIndexOf("/") + 1, fileName.length());
                    }
                    File f = new File(outputDirectory + File.separator + entry.getName());
                    f.createNewFile();

                    // InputStream in = zipFile.getInputStream(zipEntry);
                    // FileOutputStream out = new FileOutputStream(f);
                    // byte[] by = new byte[1024];
                    // int c;
                    // while ((c = in.read(by)) != -1) {
                    //     out.write(by, 0, c);
                    // }
                    // out.close();
                    // in.close();

                    int cnt;
                    byte data[] = new byte[BUFFER];
                    FileOutputStream fos = new FileOutputStream(f);
                    dest = new BufferedOutputStream(fos, BUFFER);
                    while ((cnt = zin.read(data, 0, BUFFER)) != -1) {
                        dest.write(data, 0, cnt);
                    }
                    dest.flush();
                    dest.close();
                }
            }
            zin.close();

            // The archive cannot be deleted here because it is still in use;
            // delete it where the upload is handled.
            // Delete the archive after extraction:
            // File zipFileToDel = new File(zipFileName);
            // zipFileToDel.delete();
            // System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());

            // // Remove the extra directory layer left after extraction
            // delALayerDir(zipFileName, outputDirectory);

            Date endDate = new Date();
            long temp = endDate.getTime() - beginDate.getTime();
            System.out.println("Total extraction time: " + temp);

        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }




    /**
     * Removes one directory layer.
     *
     * @param zipFileName
     * @param outputDirectory
     */
    public void delALayerDir(String zipFileName, String outputDirectory) {
        String[] dir = zipFileName.replace('\\', '/').split("/");
        String fileFullName = dir[dir.length - 1]; // gives aa.zip
        int pos = -1;
        pos = fileFullName.indexOf(".");
        String fileName = fileFullName.substring(0, pos); // gives aa
        String sourceDir = outputDirectory + File.separator + fileName;
        try {
            copyFile(new File(outputDirectory), new File(sourceDir), new File(
                    sourceDir));

            deleteSourceBaseDir(new File(sourceDir));

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Copies every file under sourceDir into destDir.
     */
    public void copyFile(File destDir, File sourceBaseDir, File sourceDir)
            throws Exception {
        File[] lists = sourceDir.listFiles();
        if (lists == null)
            return;
        for (int i = 0; i < lists.length; i++) {
            File f = lists[i];
            if (f.isFile()) {
                FileInputStream fis = new FileInputStream(f);

                String sourceBasePath = sourceBaseDir.getCanonicalPath();
                String destPath = destDir.getCanonicalPath();
                String fPath = f.getCanonicalPath();
                String drPath = destDir
                        + fPath.substring(fPath.indexOf(sourceBasePath)
                                + sourceBasePath.length());
                FileOutputStream fos = new FileOutputStream(drPath);

                // Copy raw bytes. The version as originally posted built a
                // String from each 2048-byte buffer and trimmed it, which
                // corrupts binary files.
                byte[] b = new byte[2048];
                int n;
                while ((n = fis.read(b)) != -1) {
                    fos.write(b, 0, n);
                }
                fis.close();

                fos.flush();
                fos.close();

            } else {
                // create the directory first
                new File(destDir + File.separator + f.getName()).mkdir();

                copyFile(destDir, sourceBaseDir, f); // recursive call
            }
        }
    }

    /**
     * Deletes every file under curFile, removing each directory once it is
     * empty.
     */
    public void deleteSourceBaseDir(File curFile) throws Exception {
        File[] lists = curFile.listFiles();
        File parentFile = null;
        for (int i = 0; i < lists.length; i++) {
            File f = lists[i];
            if (f.isFile()) {
                f.delete();
                // If the parent directory has no files left, everything in it
                // has been deleted, so delete the parent directory as well.
                parentFile = f.getParentFile();
                if (parentFile.list().length == 0)
                    parentFile.delete();
            } else {
                deleteSourceBaseDir(f); // recursive call
            }
        }
    }

    public static void main(String[] args) {
        try {
            ZipUtil t = new ZipUtil();
            // t.zip("e:\\test.zip", "E:\\news.sina.com.cn\\news.sina.com.cn");
            Date beginDate = new Date();
            //t.unZip("e:\\test.zip", "E:\\news.sina.com.cn");
            t.unBigZip("e:\\test.zip", "E:\\news.sina.com.cn");

            Date endDate = new Date();
            long temp = endDate.getTime() - beginDate.getTime();
            System.out.println("Total time: " + temp);

        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}



Appendix 5:

/*
 * @(#)ZipInputStream.java	1.37 04/06/11
 *
 * Copyright 2004 Sun Microsystems, Inc. All rights reserved.
 * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
 */

package jcss.search.base.zip;

import java.io.InputStream;
import java.io.IOException;
import java.io.EOFException;
import java.io.PushbackInputStream;
import java.util.zip.CRC32;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

/**
 * @author David Connelly
 * @version 1.37, 06/11/04
 */
public class CZipInputStream extends InflaterInputStream implements ZipConstants {

    private String encoding = "UTF-8";

    private ZipEntry entry;
    private CRC32 crc = new CRC32();
    private long remaining;
    private byte[] tmpbuf = new byte[512];

    private static final int STORED = ZipEntry.STORED;
    private static final int DEFLATED = ZipEntry.DEFLATED;

    private boolean closed = false;
    // this flag is set to true after EOF has reached for
    // one entry
    private boolean entryEOF = false;





    /**
     * Check to make sure that this stream has not been closed
     */
    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
    }

    boolean usesDefaultInflater = false;

    /**
     * Creates a new ZIP input stream.
     * @param in the actual input stream
     */
    public CZipInputStream(InputStream in) {
        super(new PushbackInputStream(in, 512), new Inflater(true), 512);
        usesDefaultInflater = true;
        if (in == null) {
            throw new NullPointerException("in is null");
        }
    }

    public CZipInputStream(InputStream in, String encoding) {
        super(new PushbackInputStream(in, 512), new Inflater(true), 512);
        usesDefaultInflater = true;
        if (in == null) {
            throw new NullPointerException("in is null");
        }
        this.encoding = encoding;
    }







    /**
     * Reads the next ZIP file entry and positions the stream at the
     * beginning of the entry data.
     * @return the next ZIP file entry, or null if there are no more entries
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     */
    public ZipEntry getNextEntry() throws IOException {
        ensureOpen();
        if (entry != null) {
            closeEntry();
        }
        crc.reset();
        inf.reset();
        if ((entry = readLOC()) == null) {
            return null;
        }
        if (entry.method == STORED) {
            remaining = entry.size;
        }
        entryEOF = false;
        return entry;
    }

    /**
     * Closes the current ZIP entry and positions the stream for reading the
     * next entry.
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     */
    public void closeEntry() throws IOException {
        ensureOpen();
        while (read(tmpbuf, 0, tmpbuf.length) != -1) ;
        entryEOF = true;
    }

    /**
     * Returns 0 after EOF has reached for the current entry data,
     * otherwise always return 1.
     * <p>
     * Programs should not count on this method to return the actual number
     * of bytes that could be read without blocking.
     *
     * @return 1 before EOF and 0 after EOF has reached for current entry.
     * @exception IOException if an I/O error occurs.
     *
     */
    public int available() throws IOException {
        ensureOpen();
        if (entryEOF) {
            return 0;
        } else {
            return 1;
        }
    }





    /**
     * Reads from the current ZIP entry into an array of bytes. Blocks until
     * some input is available.
     * @param b the buffer into which the data is read
     * @param off the start offset of the data
     * @param len the maximum number of bytes read
     * @return the actual number of bytes read, or -1 if the end of the
     *         entry is reached
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     */
    public int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        if (off < 0 || len < 0 || off > b.length - len) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }

        if (entry == null) {
            return -1;
        }
        switch (entry.method) {
        case DEFLATED:
            len = super.read(b, off, len);
            if (len == -1) {
                readEnd(entry);
                entryEOF = true;
                entry = null;
            } else {
                crc.update(b, off, len);
            }
            return len;
        case STORED:
            if (remaining <= 0) {
                entryEOF = true;
                entry = null;
                return -1;
            }
            if (len > remaining) {
                len = (int)remaining;
            }
            len = in.read(b, off, len);
            if (len == -1) {
                throw new ZipException("unexpected EOF");
            }
            crc.update(b, off, len);
            remaining -= len;
            return len;
        default:
            throw new InternalError("invalid compression method");
        }
    }

    /**
     * Skips specified number of bytes in the current ZIP entry.
     * @param n the number of bytes to skip
     * @return the actual number of bytes skipped
     * @exception ZipException if a ZIP file error has occurred
     * @exception IOException if an I/O error has occurred
     * @exception IllegalArgumentException if n < 0
     */
    public long skip(long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("negative skip length");
        }
        ensureOpen();
        int max = (int)Math.min(n, Integer.MAX_VALUE);
        int total = 0;
        while (total < max) {
            int len = max - total;
            if (len > tmpbuf.length) {
                len = tmpbuf.length;
            }
            len = read(tmpbuf, 0, len);
            if (len == -1) {
                entryEOF = true;
                break;
            }
            total += len;
        }
        return total;
    }

    /**
     * Closes this input stream and releases any system resources associated
     * with the stream.
     * @exception IOException if an I/O error has occurred
     */
    public void close() throws IOException {
        if (!closed) {
            super.close();
            closed = true;
        }
    }





    private byte[] b = new byte[256];

    /*
     * Reads local file (LOC) header for next entry.
     */
    private ZipEntry readLOC() throws IOException {
        try {
            readFully(tmpbuf, 0, LOCHDR);
        } catch (EOFException e) {
            return null;
        }
        if (get32(tmpbuf, 0) != LOCSIG) {
            return null;
        }
        // get the entry name and create the ZipEntry first
        int len = get16(tmpbuf, LOCNAM);
        if (len == 0) {
            throw new ZipException("missing entry name");
        }
        int blen = b.length;
        if (len > blen) {
            do
                blen = blen * 2;
            while (len > blen);
            b = new byte[blen];
        }
        readFully(b, 0, len);

        //ZipEntry e = createZipEntry(getUTF8String(b, 0, len));
        ZipEntry e = null;
        try {
            if (this.encoding.toUpperCase().equals("UTF-8"))
                e = createZipEntry(getUTF8String(b, 0, len));
            else
                e = createZipEntry(new String(b, 0, len, this.encoding));
        } catch (Exception byteE) {
            e = createZipEntry(getUTF8String(b, 0, len));
        }

        // now get the remaining fields for the entry
        e.version = get16(tmpbuf, LOCVER);
        e.flag = get16(tmpbuf, LOCFLG);
        if ((e.flag & 1) == 1) {
            throw new ZipException("encrypted ZIP entry not supported");
        }
        e.method = get16(tmpbuf, LOCHOW);
        e.time = get32(tmpbuf, LOCTIM);
        if ((e.flag & 8) == 8) {
            /* EXT descriptor present */
            if (e.method != DEFLATED) {
                throw new ZipException(
                        "only DEFLATED entries can have EXT descriptor");
            }
        } else {
            e.crc = get32(tmpbuf, LOCCRC);
            e.csize = get32(tmpbuf, LOCSIZ);
            e.size = get32(tmpbuf, LOCLEN);
        }
        len = get16(tmpbuf, LOCEXT);
        if (len > 0) {
            byte[] bb = new byte[len];
            readFully(bb, 0, len);
            e.extra = bb;
        }
        return e;
    }





    /*
     * Fetches a UTF8-encoded String from the specified byte array.
     */
    private static String getUTF8String(byte[] b, int off, int len) {
        // First, count the number of characters in the sequence
        int count = 0;
        int max = off + len;
        int i = off;
        while (i < max) {
            int c = b[i++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                count++;
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                if ((int)(b[i++] & 0xc0) != 0x80) {
                    throw new IllegalArgumentException();
                }
                count++;
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                if (((int)(b[i++] & 0xc0) != 0x80) ||
                    ((int)(b[i++] & 0xc0) != 0x80)) {
                    throw new IllegalArgumentException();
                }
                count++;
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        if (i != max) {
            throw new IllegalArgumentException();
        }
        // Now decode the characters...
        char[] cs = new char[count];
        i = 0;
        while (off < max) {
            int c = b[off++] & 0xff;
            switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                cs[i++] = (char)c;
                break;
            case 12: case 13:
                // 110xxxxx 10xxxxxx
                cs[i++] = (char)(((c & 0x1f) << 6) | (b[off++] & 0x3f));
                break;
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx
                int t = (b[off++] & 0x3f) << 6;
                cs[i++] = (char)(((c & 0x0f) << 12) | t | (b[off++] & 0x3f));
                break;
            default:
                // 10xxxxxx, 1111xxxx
                throw new IllegalArgumentException();
            }
        }
        return new String(cs, 0, count);
    }

    /**
     * Creates a new <code>ZipEntry</code> object for the specified
     * entry name.
     *
     * @param name the ZIP file entry name
     * @return the ZipEntry just created
     */
    protected ZipEntry createZipEntry(String name) {
        return new ZipEntry(name);
    }





    /*
     * Reads end of deflated entry as well as EXT descriptor if present.
     */
    private void readEnd(ZipEntry e) throws IOException {
        int n = inf.getRemaining();
        if (n > 0) {
            ((PushbackInputStream)in).unread(buf, len - n, n);
        }
        if ((e.flag & 8) == 8) {
            /* EXT descriptor present */
            readFully(tmpbuf, 0, EXTHDR);
            long sig = get32(tmpbuf, 0);
            if (sig != EXTSIG) { // no EXTSIG present
                e.crc = sig;
                e.csize = get32(tmpbuf, EXTSIZ - EXTCRC);
                e.size = get32(tmpbuf, EXTLEN - EXTCRC);
                ((PushbackInputStream)in).unread(
                        tmpbuf, EXTHDR - EXTCRC - 1, EXTCRC);
            } else {
                e.crc = get32(tmpbuf, EXTCRC);
                e.csize = get32(tmpbuf, EXTSIZ);
                e.size = get32(tmpbuf, EXTLEN);
            }
        }
        if (e.size != inf.getBytesWritten()) {
            throw new ZipException(
                    "invalid entry size (expected " + e.size +
                    " but got " + inf.getBytesWritten() + " bytes)");
        }
        if (e.csize != inf.getBytesRead()) {
            throw new ZipException(
                    "invalid entry compressed size (expected " + e.csize +
                    " but got " + inf.getBytesRead() + " bytes)");
        }
        if (e.crc != crc.getValue()) {
            throw new ZipException(
                    "invalid entry CRC (expected 0x" + Long.toHexString(e.crc) +
                    " but got 0x" + Long.toHexString(crc.getValue()) + ")");
        }
    }

    /*
     * Reads bytes, blocking until all bytes are read.
     */
    private void readFully(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(b, off, len);
            if (n == -1) {
                throw new EOFException();
            }
            off += n;
            len -= n;
        }
    }

    /*
     * Fetches unsigned 16-bit value from byte array at specified offset.
     * The bytes are assumed to be in Intel (little-endian) byte order.
     */
    private static final int get16(byte b[], int off) {
        return (b[off] & 0xff) | ((b[off+1] & 0xff) << 8);
    }

    /*
     * Fetches unsigned 32-bit value from byte array at specified offset.
     * The bytes are assumed to be in Intel (little-endian) byte order.
     */
    private static final long get32(byte b[], int off) {
        return get16(b, off) | ((long)get16(b, off+2) << 16);
    }
}



Appendix 6:

package jcss.search.base.zip;

/**
 * @author Zhang Hua
 * @time 2007-8-3
 * @description
 **/
public class ZipEntry extends org.apache.tools.zip.ZipEntry {

    String name;        // entry name
    long time = -1;     // modification time (in DOS time)
    long crc = -1;      // crc-32 of entry data
    long size = -1;     // uncompressed size of entry data
    long csize = -1;    // compressed size of entry data
    int method = -1;    // compression method
    byte[] extra;       // optional extra field data for entry
    String comment;     // optional comment string for entry
    // The following flags are used only by Zip{Input,Output}Stream
    int flag;           // bit flags
    int version;        // version needed to extract
    long offset;        // offset of loc header

    /**
     * Compression method for uncompressed entries.
     */
    public static final int STORED = 0;

    /**
     * Compression method for compressed (deflated) entries.
     */
    public static final int DEFLATED = 8;

    // The following block must be commented out
    // static {
    //     /* Zip library is loaded from System.initializeSystemClass */
    //     initIDs();
    // }
    // private static native void initIDs();

    public ZipEntry(String name) {
        super(name);
    }
}