First Steps with OpenCL (1): Matrix Multiplication
2016-08-11 10:19
Task: multiply two 2000×2000 matrices.
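The kernel is launched over a two-dimensional global range of 200×200. The test environment is CentOS with OpenCL 2.0 AMD-APP (1800.5) (AMD Accelerated Parallel Processing), and the device reports MaxItemSize = 256, 256, 256, so a global item size of 200×200 was chosen. Strictly speaking, this per-dimension limit (CL_DEVICE_MAX_WORK_ITEM_SIZES) bounds the work-group (local) size, not the global size, so a 2000×2000 global range would also be valid.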
size_t globalSize[2] = {200, 200};
// The local size argument is NULL, so the OpenCL runtime chooses the work-group size.
err = clEnqueueNDRangeKernel(commands, kernel, 2, NULL, globalSize, NULL, 0, NULL, NULL);
Kernel:
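__kernel void mmult(__global const int* a, __global const int* b, __global int* output)
{
    const int rank = 2000;           // matrices are rank x rank, row-major
    const int block = 10;            // 200 work-items x 10 = 2000 per dimension
    int width  = get_global_id(1);   // selects a block of 10 rows
    int height = get_global_id(0);   // selects a block of 10 columns

    // Each work-item computes a 10 x 10 block of the output, so the
    // 200 x 200 global range covers every row and every column.
    for (int r = 0; r < block; r++)
    {
        int row = width * block + r;
        for (int total = 1; total <= block; total++)
        {
            int col = height * block + total - 1;
            int running = 0;
            for (int num = 0; num < rank; num++)
            {
                int aIndex = row * rank + num;   // a[row][num]
                int bIndex = num * rank + col;   // b[num][col]
                running += a[aIndex] * b[bIndex];
            }
            output[row * rank + col] = running;
        }
    }
}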