
Computing the Maximum Temperature in Weather Data with Hadoop

2011-10-09 18:20


The Files

You need three files to run the max_temperature example:

a C++ file containing the map and reduce functions,

a data file containing some temperature data such as can be found at the National Climatic Data Center (NCDC), and

a Makefile to compile the C++ file.


max_temperature.cpp

#include <algorithm>
#include <limits>
#include <string>

#include "stdint.h"  // <-- this is missing from the book

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class MaxTemperatureMapper : public HadoopPipes::Mapper {
public:
  MaxTemperatureMapper(HadoopPipes::TaskContext& context) {
  }
  void map(HadoopPipes::MapContext& context) {
    string line = context.getInputValue();
    string year = line.substr(15, 4);            // year field of the fixed-width NCDC record
    string airTemperature = line.substr(87, 5);  // signed temperature, in tenths of a degree Celsius
    string q = line.substr(92, 1);               // quality code
    // Skip missing readings (+9999) and readings without a trusted quality code.
    if (airTemperature != "+9999" &&
        (q == "0" || q == "1" || q == "4" || q == "5" || q == "9")) {
      context.emit(year, airTemperature);
    }
  }
};

class MaxTemperatureReducer : public HadoopPipes::Reducer {
public:
  MaxTemperatureReducer(HadoopPipes::TaskContext& context) {
  }
  void reduce(HadoopPipes::ReduceContext& context) {
    int maxValue = numeric_limits<int>::min();  // from <limits>
    while (context.nextValue()) {
      maxValue = max(maxValue, HadoopUtils::toInt(context.getInputValue()));
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(maxValue));
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<MaxTemperatureMapper,
                              MaxTemperatureReducer>());
}



Makefile

Create a Makefile with the following entries. Note that you need to figure out whether your machine has a 32-bit or a 64-bit processor, and pick the matching library. To find out,
run the following command:
uname -a

To which the OS responds:
Linux hadoop6 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux

The i686 indicates a 32-bit machine, for which you need the Linux-i386-32 library. Anything containing 64 (for example x86_64) indicates a 64-bit machine, for which you use the Linux-amd64-64 library instead (and replace -m32 with -m64 in the Makefile below).

CC = g++
HADOOP_INSTALL = /home/hadoop/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

# Note: the command lines under the target must be indented with a tab, not spaces.
max_temperature: max_temperature.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -g -O2 -o $@


Data File

Create a file called sample.txt containing the following sample temperature records from the NCDC.


0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
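
These are fixed-width records, which is why the mapper can pull the year, temperature and quality code out of offsets 15, 87 and 92. If you want to confirm that those offsets line up with the data above, the following standalone sketch (not part of the Pipes job; the file name offset_check.cpp is arbitrary) applies the same substr calls to the first record using nothing but standard C++:

// offset_check.cpp -- standalone check of the field offsets used by the mapper.
// Not part of the Pipes job; compile with plain g++ and run locally.
#include <iostream>
#include <string>

int main() {
    // First record from sample.txt, split only to keep the source line short.
    std::string line =
        "0067011990999991950051507004+68750+023550FM-12+038299999V"
        "0203301N00671220001CN9999999N9+00001+99999999999";

    std::cout << "year    = " << line.substr(15, 4) << std::endl   // expect 1950
              << "temp    = " << line.substr(87, 5) << std::endl   // expect +0000
              << "quality = " << line.substr(92, 1) << std::endl;  // expect 1
    return 0;
}

It should print year = 1950, temp = +0000 and quality = 1, which is exactly the (key, value) pair the map function emits for that line.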


Put the data file in HDFS:

hadoop dfs -mkdir ncdc
hadoop dfs -put sample.txt ncdc


Compiling and Running

You need a C++ compiler. GNU g++ is probably the best choice. Check that it is installed (by typing g++ at the prompt). If it is not installed yet, install it!

sudo apt-get install g++


Compile the code:

make max_temperature


and fix any compilation errors before moving on.

Copy the executable file (max_temperature) to a bin directory in HDFS:

hadoop dfs -mkdir bin
hadoop dfs -put max_temperature bin/max_temperature


Run the program. The two -D options tell Pipes to use the standard Java record reader and writer on either side of the C++ map and reduce functions:

hadoop pipes -D hadoop.pipes.java.recordreader=true  \
-D hadoop.pipes.java.recordwriter=true \
-input ncdc/sample.txt  -output ncdc-out  \
-program bin/max_temperature


Verify that you got the right output:

hadoop dfs -text ncdc-out/part-00000

1949	111
1950	22
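
The values are air temperatures in tenths of a degree Celsius, so the 1949 maximum of 111 means 11.1°C. As an extra sanity check you can reproduce the result outside Hadoop: the sketch below (again not part of the example; the file name check_output.cpp is arbitrary) reads a local copy of sample.txt and mirrors the mapper's filter and the reducer's max, so it should print the same two lines as ncdc-out/part-00000.

// check_output.cpp -- standalone sanity check, not part of the Pipes job.
// Reads a local copy of sample.txt and mirrors the mapper's filter and the
// reducer's max, so its output should match ncdc-out/part-00000.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main() {
    std::ifstream in("sample.txt");
    std::map<std::string, int> maxByYear;
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 93) continue;              // ignore short or blank lines
        std::string year = line.substr(15, 4);
        std::string temp = line.substr(87, 5);
        std::string q = line.substr(92, 1);
        if (temp != "+9999" &&
            (q == "0" || q == "1" || q == "4" || q == "5" || q == "9")) {
            int t = std::atoi(temp.c_str());         // "+0022" -> 22, "-0011" -> -11
            if (maxByYear.find(year) == maxByYear.end() || t > maxByYear[year])
                maxByYear[year] = t;
        }
    }
    for (std::map<std::string, int>::const_iterator it = maxByYear.begin();
         it != maxByYear.end(); ++it)
        std::cout << it->first << "\t" << it->second << std::endl;
    return 0;
}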