The Files

You need 3 files to run the maxTemperature example:

a C++ file containing the map and reduce functions,

a data file containing some temperature data, such as that found at the National Climatic Data Center (NCDC), and

a Makefile to compile the C++ file.

C++ File

Save the following code as max_temperature.cpp, the file name the Makefile expects:


#include <algorithm>
#include <limits>
#include <string>

#include  "stdint.h"  // <-- this is missing from the book

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class MaxTemperatureMapper : public HadoopPipes::Mapper {
public:
  MaxTemperatureMapper(HadoopPipes::TaskContext& context) {
  }
  // Extract the year and air temperature from one fixed-width NCDC record.
  void map(HadoopPipes::MapContext& context) {
    string line = context.getInputValue();
    string year = line.substr(15, 4);
    string airTemperature = line.substr(87, 5);
    string q = line.substr(92, 1);
    // Skip missing readings (+9999) and readings with a bad quality code.
    if (airTemperature != "+9999" &&
        (q == "0" || q == "1" || q == "4" || q == "5" || q == "9")) {
      context.emit(year, airTemperature);
    }
  }
};

class MapTemperatureReducer : public HadoopPipes::Reducer {
public:
  MapTemperatureReducer(HadoopPipes::TaskContext& context) {
  }
  // Emit the maximum temperature seen for each year.
  void reduce(HadoopPipes::ReduceContext& context) {
    int maxValue = -10000;
    while (context.nextValue()) {
      maxValue = max(maxValue, HadoopUtils::toInt(context.getInputValue()));
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(maxValue));
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<MaxTemperatureMapper,
                              MapTemperatureReducer>());
}
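
To check the map and reduce logic before involving a cluster, the same steps can be sketched without any Hadoop headers. The version below is only an illustration: makeRecord, parseRecord and maxTemperatureByYear are hypothetical helper names, std::stoi stands in for HadoopUtils::toInt, and the length check is an extra guard that the mapper above does not have. It assumes a C++11 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds a padded line with the fields at the offsets
// the mapper reads (year at 15, temperature at 87, quality code at 92).
std::string makeRecord(const std::string& year, const std::string& temp,
                       char quality) {
  assert(year.size() == 4 && temp.size() == 5);
  std::string line(93, '0');
  line.replace(15, 4, year);
  line.replace(87, 5, temp);
  line[92] = quality;
  return line;
}

// Same filter as MaxTemperatureMapper::map; returns true if the record is kept.
bool parseRecord(const std::string& line, std::string& year, int& temperature) {
  if (line.size() < 93) return false;  // extra guard: too short for all fields
  std::string airTemperature = line.substr(87, 5);
  std::string q = line.substr(92, 1);
  if (airTemperature == "+9999" ||
      std::string("01459").find(q) == std::string::npos) {
    return false;
  }
  year = line.substr(15, 4);
  temperature = std::stoi(airTemperature);  // stands in for HadoopUtils::toInt
  return true;
}

// Map and reduce in one pass: keep the maximum temperature seen per year.
std::map<std::string, int> maxTemperatureByYear(
    const std::vector<std::string>& lines) {
  std::map<std::string, int> result;
  for (const std::string& line : lines) {
    std::string year;
    int temperature = 0;
    if (!parseRecord(line, year, temperature)) continue;
    auto it = result.find(year);
    if (it == result.end()) {
      result[year] = temperature;
    } else {
      it->second = std::max(it->second, temperature);
    }
  }
  return result;
}
```

Calling maxTemperatureByYear on a handful of lines built with makeRecord gives one entry per year, which is the shape of output the Hadoop job should produce for the same input.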


Makefile

Create a Makefile with the following entries. You first need to find out whether your computer runs a 32-bit or a 64-bit system, so that you can pick the matching library. To find out, run the following command:
uname -a

The OS responds with something like:
Linux hadoop6 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux

The i686 indicates a 32-bit machine, for which you need the Linux-i386-32 library. Anything with 64 in it (for example x86_64) indicates a 64-bit machine, for which you need the Linux-amd64-64 library.
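
The rule above can also be written down as a small C++ sketch. It is only an illustration: platformFor and currentMachine are hypothetical names, and the code assumes a POSIX system where uname() from sys/utsname.h is available.

```cpp
#include <string>
#include <sys/utsname.h>

// Maps a machine string (the i686 / x86_64 field printed by uname) to the
// library directory name, using the rule from the text: anything containing
// "64" is treated as 64-bit.
std::string platformFor(const std::string& machine) {
  return machine.find("64") != std::string::npos ? "Linux-amd64-64"
                                                 : "Linux-i386-32";
}

// Asks the OS for its machine string via POSIX uname(); empty on failure.
std::string currentMachine() {
  struct utsname info;
  if (uname(&info) != 0) return "";
  return info.machine;
}
```

Printing platformFor(currentMachine()) from a small main() gives the value to put in the PLATFORM line of the Makefile.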

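CC = g++
HADOOP_INSTALL = /home/hadoop/hadoop
PLATFORM = Linux-i386-32
# Header search path; assumes the Pipes headers live under c++/$(PLATFORM)/include
CPPFLAGS = -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

# Recipe lines must start with a tab character
max_temperature: max_temperature.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -g -O2 -o $@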

Data File

Create a file called sample.txt containing sample temperature data from the NCDC.


Put the data file in HDFS:

hadoop dfs -mkdir ncdc
hadoop dfs -put sample.txt ncdc

Compiling and Running

You need a C++ compiler; GNU g++ is probably the best choice. Check that it is installed by typing g++ at the prompt. If it is not installed yet, install it:

sudo apt-get install g++

Compile the code:

make max_temperature

and fix any errors the compiler reports.

Copy the executable file (max_temperature) to a bin directory in HDFS:

hadoop dfs -mkdir bin
hadoop dfs -put max_temperature bin/max_temperature

Run the program!

hadoop pipes -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -input ncdc/sample.txt -output ncdc-out \
  -program bin/max_temperature

Verify that you have gotten the right output:

hadoop dfs -text ncdc-out/part-00000

1949	111
1950	22
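
Each output line holds a year, a tab, and the maximum temperature for that year. NCDC records store air temperature in tenths of a degree Celsius, so 111 means 11.1 degrees. A minimal sketch for reading such a line back, assuming that format (parseOutputLine is a hypothetical helper name, and C++11 is assumed for std::stoi):

```cpp
#include <string>

// Parses one "year<TAB>value" output line and converts the value from
// tenths of a degree Celsius to degrees. Returns false if there is no tab.
bool parseOutputLine(const std::string& line, std::string& year,
                     double& celsius) {
  std::string::size_type tab = line.find('\t');
  if (tab == std::string::npos) return false;
  year = line.substr(0, tab);
  celsius = std::stoi(line.substr(tab + 1)) / 10.0;
  return true;
}
```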
