Computing the maximum temperature in weather data with Hadoop
2011-10-09 18:20
The Files
You need three files to run the maxTemperature example: a C++ file containing the map and reduce functions,
a data file containing some temperature data such as that found at the National Climatic Data Center (NCDC), and
a Makefile to compile the C++ file.
max_temperature.cpp
#include <algorithm>
#include <limits>
#include <string>

#include "stdint.h"  // <-- this is missing from the book

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class MaxTemperatureMapper : public HadoopPipes::Mapper {
public:
  MaxTemperatureMapper(HadoopPipes::TaskContext& context) {
  }
  void map(HadoopPipes::MapContext& context) {
    string line = context.getInputValue();
    string year = line.substr(15, 4);
    string airTemperature = line.substr(87, 5);
    string q = line.substr(92, 1);
    if (airTemperature != "+9999" &&
        (q == "0" || q == "1" || q == "4" || q == "5" || q == "9")) {
      context.emit(year, airTemperature);
    }
  }
};

class MapTemperatureReducer : public HadoopPipes::Reducer {
public:
  MapTemperatureReducer(HadoopPipes::TaskContext& context) {
  }
  void reduce(HadoopPipes::ReduceContext& context) {
    int maxValue = -10000;
    while (context.nextValue()) {
      maxValue = max(maxValue, HadoopUtils::toInt(context.getInputValue()));
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(maxValue));
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<MaxTemperatureMapper,
                              MapTemperatureReducer>());
}
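The mapper relies on fixed character offsets in each NCDC record, which is easy to get wrong. The following is a minimal sketch for checking those offsets locally without Hadoop. The Reading struct and the parseRecord function are hypothetical names introduced here for illustration; they are not part of Hadoop Pipes. The length check is also an addition that the mapper above does not perform.

```cpp
#include <string>

// Hypothetical container for the two fields the mapper emits.
struct Reading {
  std::string year;
  std::string airTemperature;
};

// Hypothetical helper: applies the same offsets and filter as
// MaxTemperatureMapper::map to a single record.
// Returns true and fills `out` when the record would be emitted.
bool parseRecord(const std::string& line, Reading& out) {
  // Assumption: a record shorter than 93 characters cannot hold the
  // quality flag at offset 92, so it is skipped instead of throwing.
  if (line.size() < 93) return false;

  std::string year = line.substr(15, 4);
  std::string airTemperature = line.substr(87, 5);
  std::string q = line.substr(92, 1);

  bool goodQuality =
      q == "0" || q == "1" || q == "4" || q == "5" || q == "9";
  if (airTemperature == "+9999" || !goodQuality) return false;

  out.year = year;
  out.airTemperature = airTemperature;
  return true;
}
```

Calling parseRecord on a line from sample.txt should fill in the year and the signed five-character temperature field that the mapper would pass to context.emit.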
Makefile
Create a Makefile with the entries listed at the end of this section. First, find out whether your machine runs a 32-bit or a 64-bit system so that you can pick the matching library. Run the following command:
uname -a
To which the OS responds:
Linux hadoop6 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux
The i686 indicates a 32-bit machine, for which you need the Linux-i386-32 library. An architecture string containing 64 (for example x86_64) indicates a 64-bit machine, for which you use the Linux-amd64-64 library; in that case also change -m32 to -m64 in CPPFLAGS.
CC = g++
HADOOP_INSTALL = /home/hadoop/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

max_temperature: max_temperature.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -g -O2 -o $@
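The choice of PLATFORM from the uname output can also be written down as code. This is only a sketch: pipesPlatform and detectPipesPlatform are hypothetical helper names, and the mapping covers just the two library directories mentioned above. Any other architecture returns an empty string rather than a guess.

```cpp
#include <string>
#include <sys/utsname.h>  // POSIX uname()

// Hypothetical helper: maps the machine string that uname reports
// (the "i686" field in the output above) to a library directory name.
std::string pipesPlatform(const std::string& machine) {
  if (machine == "x86_64") return "Linux-amd64-64";

  // i386, i486, i586 and i686 are all 32-bit x86.
  bool isI86 = machine.size() == 4 && machine[0] == 'i' &&
               machine[2] == '8' && machine[3] == '6';
  if (isI86) return "Linux-i386-32";

  return "";  // unknown architecture: no matching directory assumed
}

// Queries the running system and applies the mapping above.
std::string detectPipesPlatform() {
  struct utsname info;
  if (uname(&info) < 0) return "";
  return pipesPlatform(info.machine);
}
```

The returned string is what you would assign to PLATFORM in the Makefile.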
Data File
Create a file called sample.txt which will contain sample temperature data from the NCDC:
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
Put the data file in HDFS:
hadoop dfs -mkdir ncdc
hadoop dfs -put sample.txt ncdc
Compiling and Running
You need a C++ compiler. GNU g++ is probably the best choice. Check that it is installed by typing g++ --version at the prompt. If it is not installed yet, install it:
sudo apt-get install g++
Compile the code:
make max_temperature
and fix any errors you're getting.
Copy the executable file (max_temperature) to a bin directory in HDFS:
hadoop dfs -mkdir bin
hadoop dfs -put max_temperature bin/max_temperature
Run the program!
hadoop pipes -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -input ncdc/sample.txt -output ncdc-out \
  -program bin/max_temperature
Verify that you have gotten the right output:
hadoop dfs -text ncdc-out/part-00000
1949 111
1950 22
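To see why these are the expected values, the grouping and reduce step can be reproduced in plain C++. This is a rough sketch, not the Pipes runtime: maxByYear is a hypothetical function, a std::map stands in for the shuffle that groups values by key, and std::stoi is used in place of HadoopUtils::toInt on the assumption that both parse a signed decimal string such as "+0022".

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shuffle plus MapTemperatureReducer::reduce:
// groups the mapper's (year, temperature) pairs by year and keeps the maximum.
std::map<std::string, int> maxByYear(
    const std::vector<std::pair<std::string, std::string> >& emitted) {
  std::map<std::string, int> result;
  for (const auto& kv : emitted) {
    // Assumption: std::stoi replaces HadoopUtils::toInt here.
    int value = std::stoi(kv.second);
    auto it = result.find(kv.first);
    if (it == result.end()) {
      result[kv.first] = value;  // first value seen for this year
    } else {
      it->second = std::max(it->second, value);
    }
  }
  return result;
}
```

For sample.txt the mapper emits +0000, +0022 and -0011 for 1950, and +0111 and +0078 for 1949, so the maxima are 22 and 111. NCDC records air temperature in tenths of a degree Celsius, so these correspond to 2.2°C and 11.1°C.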