Debugging Hadoop locally in IDEA
2017-01-01 11:01
### Hadoop dependency JARs
```xml
<!-- hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
```
hadoop-core is the core package; core and common are generally required, while hadoop-hdfs is needed for reading and writing HDFS data. Note that hadoop-core is the legacy Hadoop 1.x artifact; mixing its 1.2.1 version with the 2.6.0 artifacts above can lead to class conflicts, so keep an eye on versions.
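For Hadoop 2.x it is often enough to depend on hadoop-client alone, since it transitively pulls in hadoop-common and hadoop-hdfs. A minimal sketch of that alternative (reusing the 2.6.0 version from the snippet above):

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
```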
### Code to debug
Taking the classic weather-data (maximum temperature) analysis as an example, the Hadoop program consists of three parts: job setup, map, and reduce. The code for each part follows.

#### MaxTemperature.java (main class)

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class); // note 1
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

#### Map class: MaxTemperatureMapper.java

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
```

#### Reduce class: MaxTemperatureReducer.java

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
```
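Before wiring the mapper into a job, the fixed-width parsing can be sanity-checked in plain Java against a synthetic record. The column offsets below are the ones used in the mapper; the sample values ("1950", "+0022", quality "1") are made up for illustration:

```java
public class ParseCheck {
    public static void main(String[] args) {
        // Build a synthetic 93-character record with the fields the mapper reads:
        // year at columns 15-18, signed temperature at 87-91, quality code at 92.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) sb.append('0');
        sb.replace(15, 19, "1950");
        sb.replace(87, 92, "+0022");
        sb.replace(92, 93, "1");
        String line = sb.toString();

        // Same extraction logic as MaxTemperatureMapper.map().
        String year = line.substring(15, 19);
        int airTemperature = (line.charAt(87) == '+')
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);

        System.out.println(year + " " + airTemperature + " " + quality.matches("[01459]"));
        // prints "1950 22 true"
    }
}
```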
### Running
(1) Open Run → Edit Configurations, select MaxTemperature as the main class, and fill in `input/ output/` as the program arguments. Note that the output directory must not already exist; Hadoop creates it itself, and the job fails with an error if it is already there.
(2) Prepare the data and run.
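Since the job refuses to start when the output directory already exists, a common convenience when rerunning locally is to wipe it first. A minimal sketch using only the JDK (the `output` path matches the program argument above; for HDFS you would use `FileSystem.delete` instead):

```java
import java.io.File;

public class CleanOutput {
    // Recursively delete a local directory so FileOutputFormat
    // does not complain that the output path already exists.
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        f.delete();
    }

    public static void main(String[] args) {
        deleteRecursively(new File("output"));
    }
}
```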