Compiling CUDA template kernels together with another compiler

2016-12-11 20:58

Introduction

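When optimizing a kernel, it helps if certain variables are compile-time constants, for example the variables that determine a loop's trip count. If the loop bound is a constant, the compiler can unroll the loop. An unrolled loop skips some of the loop-condition checks, which saves some computation time.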
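To make the point about constant loop bounds concrete, here is a minimal host-side sketch in plain C++. The names sum_first and sum_first_runtime are hypothetical and do not come from the code in this post. Whether the compiler actually unrolls the templated loop depends on the compiler and its optimization settings.

```cpp
// Hypothetical helper (not from the original post): sums the first N elements.
// N is a template parameter, so the loop bound is known at compile time and
// the compiler is free to unroll the loop.
template <int N>
int sum_first(const int* data)
{
    int total = 0;
    for (int i = 0; i < N; ++i) {
        total += data[i];
    }
    return total;
}

// Runtime-bound counterpart for comparison: n is only known when the call
// runs, so the loop condition generally has to be checked on every iteration.
int sum_first_runtime(const int* data, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same result for the same bound. The difference lies only in what the compiler knows when it generates code for the loop.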

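C++ templates can take constants as parameters. However, I do not want all of my source code to be compiled by nvcc (even though nvcc hands the host-side C++ code to the host compiler anyway), so I wrote this post to describe one way of doing it.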

Code example

The example consists of three files:

main.cpp is compiled by the host compiler.

cuda_interfaces.cu is compiled by nvcc.

cuda_interfaces.h is the interface header for cuda_interfaces.cu.

// main.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include "cuda_interfaces.h"
int main(int argc, char** argv){
    // Each call links against an explicit instantiation in cuda_interfaces.cu
    su::gpu_func<0>();
    su::gpu_func<1>();
    su::gpu_func<2>();
    su::gpu_func<3>();
    su::gpu_func<4>();
    su::gpu_func<5>();
    return EXIT_SUCCESS;
}

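// cuda_interfaces.h
// Identifiers containing a double underscore are reserved, so the guard uses a plain name
#ifndef CUDA_INTERFACES_H
#define CUDA_INTERFACES_H
namespace su{
    // Declaration only; the definition and its explicit instantiations are in cuda_interfaces.cu
    template<int _s> void gpu_func();
}
#endif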

// cuda_interfaces.cu
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

#include <iostream>

#include "cuda_interfaces.h"

namespace su{

// Thread x writes _s if x < _s and 0 otherwise; _s is a compile-time constant
template <int _s>
__global__ void kernel_func(int *data)
{
    int x = threadIdx.x + blockIdx.x*blockDim.x;
    if (x < _s){
        data[x] = _s;
    }
    else{
        data[x] = 0;
    }
}

template<int _s> void gpu_func()
{
    const int n_threads = 32;
    int *h_data = new int[n_threads];
    int *d_data = NULL;
    cudaError_t err = cudaMalloc(&d_data, n_threads*sizeof(int));

    if (err == cudaSuccess){
        // One block of n_threads threads, so every index x falls inside data
        kernel_func<_s><<<1, n_threads>>>(d_data);
        // cudaMemcpy blocks until the kernel has finished
        err = cudaMemcpy(h_data, d_data, n_threads*sizeof(int), cudaMemcpyDeviceToHost);
    }

    if (err == cudaSuccess){
        for (int i = 0; i < n_threads; i++){
            std::cout << h_data[i] << " ";
        }
        std::cout << std::endl;
    }
    else{
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
    }

    // release memory
    delete[] h_data; h_data = NULL;
    cudaFree(d_data); d_data = NULL;
}

// Explicit instantiations: _s supports only 0, 1, 2, 3, 4, 5
template void gpu_func<0>();
template void gpu_func<1>();
template void gpu_func<2>();
template void gpu_func<3>();
template void gpu_func<4>();
template void gpu_func<5>();
}

Running the code above prints:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
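The rows above can be checked against a host-side reference. The following is a minimal sketch in plain C++ that assumes the same rule as the kernel, namely that index x holds _s when x < _s and 0 otherwise. The function expected_row is hypothetical and not part of the original code.

```cpp
#include <string>

// Hypothetical host-side reference (not from the original post).
// Builds the row that gpu_func<_s>() is expected to print for n_threads
// threads: index x holds s when x < s and 0 otherwise. Each value is followed
// by one space, matching the print loop in gpu_func.
std::string expected_row(int s, int n_threads)
{
    std::string row;
    for (int x = 0; x < n_threads; ++x) {
        row += std::to_string(x < s ? s : 0);
        row += ' ';
    }
    return row;
}
```

Calling expected_row(s, 32) for s from 0 to 5 should reproduce the six rows of the listing, assuming the kernel ran without errors.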

Key point

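The key to using constants in the template is the set of explicit instantiations in the .cu file, which tell nvcc which constant values to compile the function for. main.cpp only sees the declaration in the header, so the linker has to find every gpu_func<_s> it calls in the object file produced by nvcc. Calling the function with a value that is not in the list fails at link time with an undefined reference.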
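The declaration, definition and explicit instantiation pattern can be sketched in a single plain C++ file, without CUDA. In the layout used in this post, the declaration would sit in the header and the rest in the .cu file. The names fill_value and fill_value_runtime are hypothetical. The runtime dispatcher is an optional extra that this post does not use. It is shown only as one possible way to reach the fixed set of instantiations from a value known at run time.

```cpp
namespace su {

// Declaration, as it would appear in the interface header.
template <int _s> int fill_value(int x);

// Definition, as it would appear in the separately compiled source file.
// Mirrors the kernel body: index x gets _s if x < _s, and 0 otherwise.
template <int _s> int fill_value(int x)
{
    return (x < _s) ? _s : 0;
}

// Explicit instantiations: only these values of _s are compiled into the
// object file, so only these can be linked against from other files.
template int fill_value<0>(int);
template int fill_value<1>(int);
template int fill_value<2>(int);

// Hypothetical runtime dispatcher (an assumption, not from the original post):
// maps a runtime value onto one of the instantiations listed above.
int fill_value_runtime(int s, int x)
{
    switch (s) {
    case 0: return fill_value<0>(x);
    case 1: return fill_value<1>(x);
    case 2: return fill_value<2>(x);
    default: return -1;  // s has no explicit instantiation
    }
}

} // namespace su
```

Returning -1 for an unsupported value is just one choice for this sketch. Throwing an exception or asserting would work equally well.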

Enjoy!
