
CUDA: Using a Texture Backed by a CUDA Array

2012-11-07 22:22


A Simple Example

Next, let's look at a concrete example: using a texture backed by a CUDA Array to perform a transpose.
First, main() looks like this:

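#include <cstdio>

// Defined further down in the article
void Transpose_CPU( unsigned char* sImg, unsigned char* tImg, int w, int h );
extern "C" void Transpose_GPU( unsigned char* sImg, unsigned char* tImg, int w, int h );

int main( int argc, char** argv )
{
    int w = 1920,
        h = 1200;

    // Set up test data
    unsigned char *aSrc = new unsigned char[ w * h ],
                  *aRS1 = new unsigned char[ w * h ],
                  *aRS2 = new unsigned char[ w * h ];
    for( int i = 0; i < w * h; ++i )
        aSrc[i] = i % 256;

    // CPU code
    Transpose_CPU( aSrc, aRS1, w, h );

    // GPU code
    Transpose_GPU( aSrc, aRS2, w, h );

    // Check that both versions produced the same result
    for( int i = 0; i < w * h; ++i )
        if( aRS1[i] != aRS2[i] )
        {
            printf( "Error!!!!\n" );
            break;
        }

    // Release the host buffers
    delete[] aSrc;
    delete[] aRS1;
    delete[] aRS2;
    return 0;
}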

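This part is straightforward: it declares the source data aSrc and the two buffers for the transposed results, aRS1 and aRS2, then fills aSrc with some values. (In this example aSrc is logically 1920 * 1200, while aRS1 and aRS2 are logically 1200 * 1920; since all three are allocated as one-dimensional arrays of the same length, the declarations need no change.)
After that, the program runs the CPU version and the GPU version and compares their results. The CPU version, Transpose_CPU(), is as follows: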

void Transpose_CPU( unsigned char* sImg, unsigned char* tImg,
                    int w, int h )
{
    int x, y, idx1, idx2;
    for( y = 0; y < h; ++y )
        for( x = 0; x < w; ++x )
        {
            idx1 = y * w + x;   // (x, y) in the source, rows of width w
            idx2 = x * h + y;   // same element in the result, rows of width h
            tImg[idx2] = sImg[idx1];
        }
}

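This should need little explanation. The source and the transposed image are laid out in different directions, so the two memory indices are computed differently: idx1 = y * w + x addresses element (x, y) in sImg, and idx2 = x * h + y addresses the same element in tImg. Copying sImg[idx1] to tImg[idx2] for every element performs the transpose.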
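To make the index arithmetic concrete, here is a minimal host-side sketch in C++. The helper names srcIndex and dstIndex are hypothetical and do not appear in the original program; they only isolate the two formulas used inside Transpose_CPU().

```cpp
#include <cassert>

// Hypothetical helpers (not part of the original program) that isolate
// the index arithmetic used inside Transpose_CPU().

// Row-major index of element (x, y) in the source image,
// whose rows are w elements wide.
inline int srcIndex( int x, int y, int w )
{
    assert( x >= 0 && x < w && y >= 0 );
    return y * w + x;
}

// Row-major index of the same element in the transposed image,
// whose rows are h elements wide: source (x, y) lands in row x, column y.
inline int dstIndex( int x, int y, int h )
{
    assert( y >= 0 && y < h && x >= 0 );
    return x * h + y;
}
```

For a source image 3 wide and 2 high, element (x = 1, y = 0) sits at source index 0 * 3 + 1 = 1 and moves to transposed index 1 * 2 + 0 = 2.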
The .cu file that contains Transpose_GPU() is as follows:

#define BLOCK_DIM 16

// The texture reference has to be declared at file scope
texture<unsigned char, 2, cudaReadModeElementType> rT;

extern "C"
void Transpose_GPU( unsigned char* sImg, unsigned char* tImg,
                    int w, int h );

__global__
void Transpose_Texture( unsigned char* aRS, int w, int h )
{
    int idxX = blockIdx.x * blockDim.x + threadIdx.x,
        idxY = blockIdx.y * blockDim.y + threadIdx.y;
    // Threads that fall outside the image do nothing
    if( idxX < w && idxY < h )
        aRS[ idxX * h + idxY ] = tex2D( rT, idxX, idxY );
}

void Transpose_GPU( unsigned char* sImg, unsigned char* tImg,
                    int w, int h )
{
    // compute the size of data
    int data_size = sizeof(unsigned char) * w * h;

    // part1a. prepare the result data
    unsigned char *dImg;
    cudaMalloc( (void**)&dImg, data_size );

    // part1b. prepare the source data
    cudaChannelFormatDesc chDesc = cudaCreateChannelDesc<unsigned char>();
    cudaArray* cuArray;
    cudaMallocArray( &cuArray, &chDesc, w, h );
    cudaMemcpyToArray( cuArray, 0, 0, sImg, data_size,
                       cudaMemcpyHostToDevice );
    cudaBindTextureToArray( rT, cuArray );

    // part2. run kernel
    dim3 block( BLOCK_DIM, BLOCK_DIM ),
         grid( ceil( (float)w / BLOCK_DIM ), ceil( (float)h / BLOCK_DIM ) );
    Transpose_Texture<<< grid, block >>>( dImg, w, h );

    // part3. copy the data from device
    cudaMemcpy( tImg, dImg, data_size, cudaMemcpyDeviceToHost );

    // part4. release data
    cudaUnbindTexture( rT );
    cudaFreeArray( cuArray );
    cudaFree( dImg );
}

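First, as mentioned earlier, CUDA currently seems to allow a texture to be declared only at file scope, so a 2D texture has to be declared up front to serve as the input data. Frankly, Heresy finds this rather inconvenient.
Next, look directly at Transpose_GPU(), which main() calls. It does the following: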

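Compute the amount of memory required.
[part1a] Declare dImg and allocate device memory for it to hold the computed result.
[part1b] Create the CUDA Array cuArray, allocate its memory, and copy the data from host memory (sImg) to device memory (cuArray); then associate rT with cuArray through cudaBindTextureToArray().
[part2] Call the kernel function Transpose_Texture() to do the computation. Here the thread block size is defined as BLOCK_DIM * BLOCK_DIM (16 * 16), and the grid size is the width and the height each divided by the block size, rounded up.
[part3] Copy the result from device memory (dImg) back to host memory (tImg).
[part4] Remove the association between rT and cuArray with cudaUnbindTexture(), and release the device memory with cudaFreeArray() and cudaFree().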
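The grid size in [part2] is a ceiling division. As a rough host-side sketch in C++, the same value can be computed in integer arithmetic; the names kBlockDim and blocksFor are hypothetical and only stand in for BLOCK_DIM and the ceil( (float)w / BLOCK_DIM ) expression in the listing.

```cpp
#include <cassert>

// Hypothetical constant mirroring BLOCK_DIM from the .cu file.
const int kBlockDim = 16;

// Hypothetical helper: number of blocks needed to cover n elements,
// i.e. n / blockDim rounded up, computed without floating point.
inline int blocksFor( int n, int blockDim )
{
    assert( n >= 0 && blockDim > 0 );
    return ( n + blockDim - 1 ) / blockDim;
}
```

With the article's 1920 * 1200 image this gives a grid of 120 * 75 blocks, and every thread maps to a valid element. For a width such as 1930 the result is 121 blocks, so the last block in each row extends past the image edge; this is why the kernel tests idxX < w && idxY < h before writing.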

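The kernel function Transpose_Texture() uses the three variables blockIdx, blockDim, and threadIdx to compute its position in two dimensions. When neither x nor y is out of range, it performs the transposing copy: it fetches the element at (idxX, idxY) through tex2D() and stores it in aRS[ idxX * h + idxY ].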
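The thread-to-element mapping of Transpose_Texture() can be checked on the host with a small emulation. This is only an illustrative C++ sketch, not CUDA code: the function name emulateTransposeKernel is hypothetical, nested loops stand in for the blocks and threads, and a plain array read src[ idxY * w + idxX ] stands in for tex2D( rT, idxX, idxY ).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation (hypothetical, for illustration only) of how
// Transpose_Texture() maps block and thread indices to image elements.
inline std::vector<unsigned char> emulateTransposeKernel(
    const std::vector<unsigned char>& src, int w, int h, int blockDim )
{
    assert( w > 0 && h > 0 && blockDim > 0 );
    assert( src.size() == static_cast<std::size_t>( w ) * h );

    std::vector<unsigned char> dst( src.size(), 0 );
    const int gridX = ( w + blockDim - 1 ) / blockDim;
    const int gridY = ( h + blockDim - 1 ) / blockDim;

    for( int by = 0; by < gridY; ++by )            // plays blockIdx.y
    for( int bx = 0; bx < gridX; ++bx )            // plays blockIdx.x
    for( int ty = 0; ty < blockDim; ++ty )         // plays threadIdx.y
    for( int tx = 0; tx < blockDim; ++tx )         // plays threadIdx.x
    {
        const int idxX = bx * blockDim + tx;
        const int idxY = by * blockDim + ty;
        // Threads past the image edge do nothing, as in the kernel.
        if( idxX < w && idxY < h )
            dst[ idxX * h + idxY ] = src[ idxY * w + idxX ];
    }
    return dst;
}
```

Running it on a 3 * 2 image with a block size of 2 exercises the bounds check, because the second block in x covers a column that does not exist.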

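That covers the most basic way to use a CUDA 2D texture. As mentioned in part 1, a texture backed by a CUDA Array offers some additional features, and besides the high-level usage shown here there are also lower-level, more detailed functions available. Those are left for a later post.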
