
A summary of common Java classes for managing HDFS files

2014-07-24 00:04

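I have recently been writing some Hadoop operations tools. Since Hadoop itself is written in Java and its API can be used directly, I decided to write the tools in Java as well.
Today I tried out several of the fs-related classes.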
The main ones are FileStatus, FileSystem, DistributedFileSystem, DatanodeInfo and BlockLocation.
FileStatus holds the attributes of a file, such as its name, size and owner.
Its main getters are getPath, getLen, getModificationTime, getAccessTime, getReplication, getOwner, getGroup and getPermission. Their names make their purpose clear.
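getLen returns a size in bytes and getModificationTime returns milliseconds since the Unix epoch, so an operations tool will usually format both before printing them. Below is a minimal sketch in plain Java with no Hadoop dependency. The names StatusFormat, humanSize and formatTime are made up for illustration, and the literal values stand in for what a FileStatus would return.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class StatusFormat {
    private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB", "PB"};
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    /** Formats a byte count (such as a getLen value) with binary units, 1 KB = 1024 B. */
    public static String humanSize(long bytes) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return unit == 0
                ? bytes + " B"
                : String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    /** Formats epoch milliseconds (such as a getModificationTime value) in UTC. */
    public static String formatTime(long epochMillis) {
        return FMT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(humanSize(134217728L));       // prints 128.0 MB
        System.out.println(formatTime(1406160240000L));  // prints 2014-07-24 00:04:00
    }
}
```

UTC is used here only to keep the output independent of the machine's time zone; a real tool might prefer the local zone.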
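FileSystem is an abstract class. Its common concrete implementations are LocalFileSystem and DistributedFileSystem.
Commonly used methods:
listStatus: lists the attributes of the files under a directory; returns a FileStatus array
getFileStatus: returns the attributes of a single file as a FileStatus
copyToLocalFile: copies from HDFS to the local file system
copyFromLocalFile: copies from the local file system to HDFS
exists: checks whether a file exists; takes a Path
getLocal: returns the local file system as a LocalFileSystem
getFileBlockLocations: returns a BlockLocation array; takes a FileStatus plus a start offset and a length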
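DatanodeInfo describes a datanode: its capacity, number of active connections, host name, IP address and so on. For example, the storage report for the whole HDFS cluster (dfsadmin -report) is produced by calling the getDatanodeReport method.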
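A report of this kind is mostly per-node capacity and usage figures plus cluster totals. The aggregation can be sketched in plain Java. NodeUsage and ClusterSummary below are hypothetical stand-ins for values one would read from each DatanodeInfo (capacity and DFS used, in bytes); they are not Hadoop classes, and the numbers are invented.

```java
import java.util.Arrays;
import java.util.List;

public class ClusterSummary {
    /** Hypothetical holder for the figures read from one datanode. */
    public static final class NodeUsage {
        public final String host;
        public final long capacity; // bytes
        public final long dfsUsed;  // bytes

        public NodeUsage(String host, long capacity, long dfsUsed) {
            this.host = host;
            this.capacity = capacity;
            this.dfsUsed = dfsUsed;
        }
    }

    /** Used space as a percentage of capacity; 0 when capacity is not positive. */
    public static double usedPercent(long used, long capacity) {
        return capacity <= 0 ? 0.0 : used * 100.0 / capacity;
    }

    /** Sums capacity and usage over all nodes, then computes the overall percentage. */
    public static double clusterUsedPercent(List<NodeUsage> nodes) {
        long capacity = 0;
        long used = 0;
        for (NodeUsage n : nodes) {
            capacity += n.capacity;
            used += n.dfsUsed;
        }
        return usedPercent(used, capacity);
    }

    public static void main(String[] args) {
        List<NodeUsage> nodes = Arrays.asList(
                new NodeUsage("dn1", 100L, 50L),
                new NodeUsage("dn2", 300L, 50L));
        for (NodeUsage n : nodes) {
            System.out.println(n.host + " used% = " + usedPercent(n.dfsUsed, n.capacity));
        }
        System.out.println("cluster used% = " + clusterUsedPercent(nodes));
    }
}
```

Note that the cluster figure is computed from the summed totals rather than by averaging the per-node percentages, which would give the wrong answer when nodes differ in capacity.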
BlockLocation describes where a block is stored. It can be used to inspect the blocks of a file: each block's size, the hosts that hold it, whether it is corrupt, and so on.
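Each BlockLocation reports the byte offset and length of one block (getOffset and getLength). The arithmetic behind those numbers can be sketched without a cluster: a file is cut into fixed-size blocks, and only the last block may be shorter. This is plain Java; BlockLayout and split are hypothetical names, and the 128 MB block size is an assumed value for illustration (the real one comes from the file's own metadata).

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLayout {
    /** Returns one {offset, length} pair per block for a file of the given length. */
    public static List<long[]> split(long fileLen, long blockSize) {
        if (fileLen < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLen must be >= 0 and blockSize > 0");
        }
        List<long[]> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLen; offset += blockSize) {
            // Every block is full-sized except possibly the last one.
            long length = Math.min(blockSize, fileLen - offset);
            blocks.add(new long[] {offset, length});
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with an assumed 128 MB block size: 128 MB + 128 MB + 44 MB.
        for (long[] b : split(300 * mb, 128 * mb)) {
            System.out.println("offset=" + b[0] + " length=" + b[1]);
        }
    }
}
```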

In short: use FileStatus to read file attributes, a concrete FileSystem implementation to operate on files, DatanodeInfo to get datanode information, and BlockLocation to see how a file maps to blocks.

Here are two examples.
Example 1:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import static java.lang.System.out;

// Lists the entries under the path given as the first command-line argument.
public class FileStatusTest {
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: FileStatusTest <path>");
            return;
        }

        Configuration config = new Configuration();
        // Both resources are looked up on the classpath.
        config.addResource("hdfs-site.xml");
        config.addResource("core-site.xml");
        out.println(config.get("dfs.namenode.name.dir"));
        out.println(config.get("fs.defaultFS"));

        // Returns the implementation that matches fs.defaultFS.
        FileSystem hdfs = FileSystem.get(config);
        out.println(hdfs.getClass().getName());

        Path path = new Path(args[0]);
        if (!hdfs.exists(path)) {
            out.println(path + " does not exist");
            return;
        }

        FileStatus[] status = hdfs.listStatus(path);
        out.println("file num is " + status.length);
        for (FileStatus f : status) {
            out.println("file name is " + f.getPath());
        }
    }
}

Example 2:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import static java.lang.System.out;

// Prints a report for every datanode, then the block locations of each file
// under the path given as the first command-line argument.
public class FileStatusTest {
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: FileStatusTest <path>");
            return;
        }

        Configuration config = new Configuration();
        // Both resources are looked up on the classpath.
        config.addResource("hdfs-site.xml");
        config.addResource("core-site.xml");
        out.println(config.get("dfs.namenode.name.dir"));
        out.println(config.get("fs.defaultFS"));

        FileSystem hdfs = FileSystem.get(config);
        out.println(hdfs.getClass().getName());

        // Datanode statistics exist only on HDFS, so check the type before casting.
        if (hdfs instanceof DistributedFileSystem) {
            DatanodeInfo[] nodeStats = ((DistributedFileSystem) hdfs).getDataNodeStats();
            for (DatanodeInfo node : nodeStats) {
                out.println("hostname is " + node.getHostName());
                out.println("dfs used is " + node.getDfsUsed());
                out.println(node.getDatanodeReport());
            }
        }

        Path path = new Path(args[0]);
        if (!hdfs.exists(path)) {
            out.println(path + " does not exist");
            return;
        }

        FileStatus[] status = hdfs.listStatus(path);
        out.println("file num is " + status.length);
        for (FileStatus f : status) {
            out.println("file name is " + f.getPath());
            if (f.isDirectory()) {
                continue; // a directory has no blocks of its own
            }
            // Ask for the blocks that cover the whole file.
            BlockLocation[] blks = hdfs.getFileBlockLocations(f, 0, f.getLen());
            for (BlockLocation blk : blks) {
                out.println("blk name is " + blk);
                out.println("length is " + blk.getLength());
            }
        }
    }
}


Tags: java, file system, hdfs
