CUDA Study Notes

2012-04-26 07:46
1. About page-locked host memory / pinned memory:

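(1) Restrict page-locked allocations to buffers that will be used as the source or destination in calls to cudaMemcpy(), and free
them as soon as they are no longer needed. Pinned pages cannot be swapped out, so overusing them reduces the physical memory left for the rest of the system.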

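(2) cudaMemcpyAsync() needs page-locked host memory to be truly asynchronous. With pageable host memory the call still works, but the copy may behave synchronously and will not overlap with other work.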
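
The two points above can be sketched as follows. This is a minimal example, not a definitive pattern: the CHECK macro and the roundTripPinned() function are hypothetical names introduced for illustration, and the buffer size is arbitrary. It allocates pinned host buffers with cudaHostAlloc(), uses them only as the source and destination of cudaMemcpyAsync(), and releases them with cudaFreeHost() once the copies are done.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): exit if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical function: copy n ints host -> device -> host through
// pinned buffers and report whether the data survived the round trip.
bool roundTripPinned(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *hostIn = nullptr, *hostOut = nullptr, *dev = nullptr;
    cudaStream_t stream;

    CHECK(cudaStreamCreate(&stream));
    // Page-locked host memory, used only as copy source/destination.
    CHECK(cudaHostAlloc((void **)&hostIn, bytes, cudaHostAllocDefault));
    CHECK(cudaHostAlloc((void **)&hostOut, bytes, cudaHostAllocDefault));
    CHECK(cudaMalloc((void **)&dev, bytes));

    for (int i = 0; i < n; ++i) {
        hostIn[i] = i;
        hostOut[i] = -1;
    }

    // Both copies are queued in the same stream, so they run in order.
    CHECK(cudaMemcpyAsync(dev, hostIn, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(hostOut, dev, bytes, cudaMemcpyDeviceToHost, stream));
    // The host must wait before reading hostOut.
    CHECK(cudaStreamSynchronize(stream));

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (hostOut[i] != hostIn[i]) ok = false;
    }

    // Free the pinned buffers as soon as they are no longer needed.
    CHECK(cudaFreeHost(hostIn));
    CHECK(cudaFreeHost(hostOut));
    CHECK(cudaFree(dev));
    CHECK(cudaStreamDestroy(stream));
    return ok;
}

int main() {
    printf("round trip %s\n", roundTripPinned(1 << 20) ? "ok" : "mismatch");
    return 0;
}
```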

2. About streams:

(1) NVIDIA GPUs have separate hardware engines for memory copies and kernel execution: the copy engine and the kernel
engine. Work queued in different streams can overlap when it lands on different engines.
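
Whether a given device can overlap copies with kernels can be checked at run time. The sketch below is one way to do it, assuming device 0 is the GPU of interest; copyEngineCount() is a hypothetical helper name. It reads the asyncEngineCount field of cudaDeviceProp, which reports how many asynchronous copy engines the device exposes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of asynchronous copy engines reported
// for `device`, or -1 if the properties could not be queried.
int copyEngineCount(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.asyncEngineCount;
}

int main() {
    int engines = copyEngineCount(0);  // assumption: device 0
    if (engines < 0) {
        printf("could not query device 0\n");
    } else if (engines == 0) {
        printf("device 0 cannot overlap copies with kernels\n");
    } else {
        printf("device 0 reports %d copy engine(s)\n", engines);
    }
    return 0;
}
```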



Figure 1: not efficient

Figure 2: efficient

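Trick: queue operations across all streams in breadth-first order instead of depth-first order. That is, enqueue the same step for every stream (all input copies, then all kernel launches, then all output copies) rather than enqueuing every step of one stream before moving on to the next.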
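
A minimal sketch of breadth-first queuing with two streams is shown below. The kernel doubleElements(), the function processBreadthFirst(), the CHECK macro, and the chunk sizes are all hypothetical choices made for illustration. The code assumes the total element count is a multiple of twice the chunk size. How much the ordering helps likely depends on the GPU generation, so treat this as a pattern to measure rather than a guaranteed speedup.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): exit if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Illustrative kernel: double every element in place.
__global__ void doubleElements(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Processes `total` ints in chunks, alternating between two streams.
// Assumes total is a multiple of 2 * chunk.
bool processBreadthFirst(int total, int chunk) {
    size_t chunkBytes = static_cast<size_t>(chunk) * sizeof(int);
    int *host = nullptr, *dev0 = nullptr, *dev1 = nullptr;
    cudaStream_t s0, s1;

    CHECK(cudaStreamCreate(&s0));
    CHECK(cudaStreamCreate(&s1));
    // Pinned host buffer so the async copies can overlap with kernels.
    CHECK(cudaHostAlloc((void **)&host, total * sizeof(int),
                        cudaHostAllocDefault));
    CHECK(cudaMalloc((void **)&dev0, chunkBytes));
    CHECK(cudaMalloc((void **)&dev1, chunkBytes));
    for (int i = 0; i < total; ++i) host[i] = i;

    int threads = 256;
    int blocks = (chunk + threads - 1) / threads;
    for (int off = 0; off < total; off += 2 * chunk) {
        // Breadth-first: issue each step for both streams before the next step.
        CHECK(cudaMemcpyAsync(dev0, host + off, chunkBytes,
                              cudaMemcpyHostToDevice, s0));
        CHECK(cudaMemcpyAsync(dev1, host + off + chunk, chunkBytes,
                              cudaMemcpyHostToDevice, s1));
        doubleElements<<<blocks, threads, 0, s0>>>(dev0, chunk);
        doubleElements<<<blocks, threads, 0, s1>>>(dev1, chunk);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(host + off, dev0, chunkBytes,
                              cudaMemcpyDeviceToHost, s0));
        CHECK(cudaMemcpyAsync(host + off + chunk, dev1, chunkBytes,
                              cudaMemcpyDeviceToHost, s1));
    }
    CHECK(cudaStreamSynchronize(s0));
    CHECK(cudaStreamSynchronize(s1));

    bool ok = true;
    for (int i = 0; i < total; ++i) {
        if (host[i] != 2 * i) ok = false;
    }

    CHECK(cudaFreeHost(host));
    CHECK(cudaFree(dev0));
    CHECK(cudaFree(dev1));
    CHECK(cudaStreamDestroy(s0));
    CHECK(cudaStreamDestroy(s1));
    return ok;
}

int main() {
    bool ok = processBreadthFirst(1 << 20, 1 << 16);
    printf("breadth-first result %s\n", ok ? "ok" : "mismatch");
    return 0;
}
```

The depth-first variant would issue copy, kernel, and copy-back for s0 and only then the same three steps for s1. Within each stream the operations still run in issue order, so reusing dev0 and dev1 across loop iterations is safe.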

To be continued...