
Study notes on parallel_for and parallel_for_ in OpenCV

2014-10-17 09:46
Original post: http://blog.csdn.net/chouclee/article/details/8682561

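Starting with version 2.4.3, OpenCV offers the loop functions parallel_for and parallel_for_. (More precisely, parallel_for already existed in the tbb module, but the OpenCV website lists it under the New Features of 2.4.3, so it was presumably rewritten.)
The calcOpticalFlowPyrLK function shipped with 2.4.3 was also rewritten with parallel_for. I had always assumed that parallel_for performs parallel computation, and I had written several algorithms of my own with it. Only today, reading someone else's question on the OpenCV Q&A site, did I find out that parallel_for is actually a serial loop and that parallel_for_ is the parallel loop (see the answer linked under Reference).
To compare a plain for loop, parallel_for, and parallel_for_, I wrote a simple test that cubes every element of a Mat, processing it one column at a time.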
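The difference between a serial loop and a parallel loop can be sketched without OpenCV. The code below is only an illustration using std::thread, not OpenCV's implementation: `IndexRange`, `serialFor`, `threadedFor`, and `cubeRange` are hypothetical names, and the even split into stripes is an assumption made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::Range: a half-open index interval [start, end).
struct IndexRange {
    int start;
    int end;
};

using LoopBody = std::function<void(const IndexRange&)>;

// Serial dispatch: the body receives the whole range in one call,
// on the calling thread.
inline void serialFor(const IndexRange& range, const LoopBody& body) {
    body(range);
}

// Threaded dispatch (assumed splitting scheme): the range is cut into
// contiguous stripes and each stripe runs on its own std::thread.
inline void threadedFor(const IndexRange& range, const LoopBody& body,
                        int numThreads) {
    assert(numThreads > 0);
    const int total = range.end - range.start;
    if (total <= 0) return;
    const int stripes = std::min(numThreads, total);
    const int chunk = (total + stripes - 1) / stripes;  // ceiling division
    std::vector<std::thread> workers;
    for (int s = 0; s < stripes; ++s) {
        const int begin = range.start + s * chunk;
        const int end = std::min(begin + chunk, range.end);
        if (begin >= end) break;  // nothing left to hand out
        workers.emplace_back(
            [&body, begin, end] { body(IndexRange{begin, end}); });
    }
    for (std::thread& w : workers) w.join();  // wait for every stripe
}

// Example body: cube each element whose index lies in the given range.
inline void cubeRange(std::vector<float>& data, const IndexRange& r) {
    for (int i = r.start; i < r.end; ++i)
        data[i] = data[i] * data[i] * data[i];
}
```

Because every stripe writes to a disjoint set of indices, the body needs no locking. A call such as `threadedFor(IndexRange{0, n}, [&v](const IndexRange& r) { cubeRange(v, r); }, 4)` should leave `v` in the same state as the serial version.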

Code
test.hpp

/* Test parallel_for and parallel_for_
 * Author: chouclee
 * 03/17/2013 */
#include <cmath>  // std::pow
#include <opencv2/core/internal.hpp>

namespace cv
{
namespace test
{
// Loop body for parallel_for_, modeled on the answer given on the OpenCV Q&A site.
class parallelTestBody : public ParallelLoopBody
{
public:
    parallelTestBody(Mat& _src)  // class constructor
    {
        src = &_src;
    }
    void operator()(const Range& range) const  // called with a sub-range of columns
    {
        Mat& srcMat = *src;
        // Number of elements per row (cols * channels for a continuous Mat; same as step1()).
        int stepSrc = (int)(srcMat.step / srcMat.elemSize1());
        for (int colIdx = range.start; colIdx < range.end; ++colIdx)
        {
            float* pData = (float*)srcMat.col(colIdx).data;
            for (int i = 0; i < srcMat.rows; ++i)
                pData[i * stepSrc] = std::pow(pData[i * stepSrc], 3);
        }
    }

private:
    Mat* src;
};

// Loop functor for parallel_for.
struct parallelTestInvoker
{
    parallelTestInvoker(Mat& _src)  // struct constructor
    {
        src = &_src;
    }
    // BlockedRange requires opencv2/core/internal.hpp.
    void operator()(const BlockedRange& range) const
    {
        Mat& srcMat = *src;
        int stepSrc = (int)(srcMat.step / srcMat.elemSize1());
        for (int colIdx = range.begin(); colIdx < range.end(); ++colIdx)
        {
            float* pData = (float*)srcMat.col(colIdx).data;
            for (int i = 0; i < srcMat.rows; ++i)
                pData[i * stepSrc] = std::pow(pData[i * stepSrc], 3);
        }
    }
    Mat* src;
};
}  // namespace test

void parallelTestWithFor(InputArray _src)  // plain 'for' loop
{
    CV_Assert(_src.kind() == _InputArray::MAT);
    Mat src = _src.getMat();
    CV_Assert(src.isContinuous());
    int stepSrc = (int)(src.step / src.elemSize1());
    for (int x = 0; x < src.cols; ++x)
    {
        float* pData = (float*)src.col(x).data;
        for (int y = 0; y < src.rows; ++y)
            pData[y * stepSrc] = std::pow(pData[y * stepSrc], 3);
    }
}

void parallelTestWithParallel_for(InputArray _src)  // 'parallel_for' loop
{
    CV_Assert(_src.kind() == _InputArray::MAT);
    Mat src = _src.getMat();
    int totalCols = src.cols;
    typedef test::parallelTestInvoker parallelTestInvoker;
    parallel_for(BlockedRange(0, totalCols), parallelTestInvoker(src));
}

void parallelTestWithParallel_for_(InputArray _src)  // 'parallel_for_' loop
{
    CV_Assert(_src.kind() == _InputArray::MAT);
    Mat src = _src.getMat();
    int totalCols = src.cols;
    typedef test::parallelTestBody parallelTestBody;
    parallel_for_(Range(0, totalCols), parallelTestBody(src));
}
}  // namespace cv

 
 
main.cpp

/* Test parallel_for and parallel_for_
 * Author: chouclee
 * 03/17/2013 */
#include <opencv2/opencv.hpp>

#include <cstdlib>   // system
#include <ctime>     // clock, CLOCKS_PER_SEC
#include <iostream>  // cout

#include "test.hpp"
using namespace cv;
using namespace std;

int main(int argc, char* argv[])
{
    Mat testInput = Mat::ones(40, 400000, CV_32F);
    clock_t start, stop;

    start = clock();
    parallelTestWithFor(testInput);
    stop = clock();
    cout << "Running time using 'for':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

    start = clock();
    parallelTestWithParallel_for(testInput);
    stop = clock();
    cout << "Running time using 'parallel_for':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

    start = clock();
    parallelTestWithParallel_for_(testInput);
    stop = clock();
    cout << "Running time using 'parallel_for_':" << (double)(stop - start) / CLOCKS_PER_SEC * 1000 << "ms" << endl;

    system("pause");  // Windows-only: keeps the console window open
    return 0;
}
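A note on the timing method: the C standard defines clock() as processor time. On some platforms (glibc on Linux, for example) that is CPU time summed over all threads, which can make a multithreaded loop look no faster than a serial one, whereas the Microsoft runtime reports elapsed wall-clock time. If the test is ported, a wall-clock timer is the safer choice. A minimal sketch, assuming C++11 is available; `elapsedMs` is a hypothetical helper and not part of the original test:

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs fn once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double elapsedMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In main.cpp this could be used as `elapsedMs([&] { parallelTestWithParallel_for_(testInput); })` in place of each start/stop pair.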

 

Result
With a 400000*40 input, the results are:

Debug mode
Running time using 'for': 1376ms
Running time using 'parallel_for': 1316ms
Running time using 'parallel_for_': 553ms

Release mode
Running time using 'for': 463ms
Running time using 'parallel_for': 475ms
Running time using 'parallel_for_': 301ms

With the input changed to 40*400000:

Debug mode
Running time using 'for': 1005ms
Running time using 'parallel_for': 1013ms
Running time using 'parallel_for_': 526ms

Release mode
Running time using 'for': 105ms
Running time using 'parallel_for': 106ms
Running time using 'parallel_for_': 81ms

With the input changed to 4000*4000:

Debug mode
Running time using 'for': 1138ms
Running time using 'parallel_for': 1136ms
Running time using 'parallel_for_': 411ms

Release mode
Running time using 'for': 234ms
Running time using 'parallel_for': 239ms
Running time using 'parallel_for_': 130ms

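In most cases parallel_for is a tiny bit slower than the plain for loop, and occasionally slightly faster; overall the two are about the same. parallel_for_ was the fastest in every run. Keep in mind that the code above is only a test, which is why it deliberately works column by column. For an operation this simple, a plain for loop over the Mat with an incrementing pointer needs only a few tens of milliseconds. For complex algorithms such as optical flow, however, parallel_for (not parallel, but concise, easy to maintain, and about as fast as a for loop) or parallel_for_ is a good choice.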
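The gap between the forced per-column access and a pointer-increment loop comes from memory order. The sketch below shows both traversals on a plain row-major float buffer rather than a cv::Mat; `cubeByColumns` and `cubeFlat` are hypothetical names. With a real Mat, the flat version would only be valid when isContinuous() returns true.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-by-column traversal of a row-major rows x cols buffer:
// consecutive accesses are `cols` floats apart, as in the forced
// per-column test.
inline void cubeByColumns(std::vector<float>& data, std::size_t rows,
                          std::size_t cols) {
    assert(data.size() == rows * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            float& v = data[r * cols + c];
            v = v * v * v;
        }
    }
}

// Flat traversal: a single pointer walks the buffer in storage order.
inline void cubeFlat(std::vector<float>& data) {
    float* p = data.data();
    float* const end = p + data.size();
    for (; p != end; ++p) *p = (*p) * (*p) * (*p);
}
```

Both functions produce the same values; only the order in which memory is touched differs, and the flat order is the one that is friendly to the cache.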

Reference:
http://answers.opencv.org/question/3730/how-to-use-parallel_for/