PyCUDA Study Notes -- pagelocked memory
2016-02-23 05:03
PyCUDA: pagelocked memory
In GPU programming, we have to transfer data from the CPU to the GPU, which can take a while. The usual way of transferring data:
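import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on the default device
from pycuda.compiler import SourceModule
import numpy as np

a = np.random.randn(256, 256)
a = a.astype(np.float32)  # dtype is an assumption; the original call was left incomplete
a_gpu = cuda.mem_alloc(a.nbytes)  # allocate device memory of the same size
cuda.memcpy_htod(a_gpu, a)  # copy from pageable host memory to the device
## following code ...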
Transferring page-locked host memory from host to device (i.e. from CPU to GPU)
The code above can take quite a long time to transfer the data, so we try the page-locked approach.
Background on page-locked memory and GPU pinned memory
CPU data allocations are pageable by default. The GPU cannot access data directly from pageable host memory, so when a data transfer from pageable host memory to device memory is invoked, the CUDA driver must first allocate a temporary page-locked, or "pinned", host array, copy the data to the pinned array, and then transfer the data from the pinned array to device memory.
Device Interface:
pycuda.driver.pagelocked_empty(shape, dtype, order="C", mem_flags=0)
Allocate a page-locked numpy.ndarray of the given shape, dtype and order.
mem_flags: may be one of the values in host_alloc_flags. It may only be non-zero on CUDA 2.2 and newer:
The default (mem_flags=0) is equivalent to cudaMallocHost(void **ptr, size_t size) in CUDA, which allocates size bytes of host memory that is page-locked and accessible to the device.
PORTABLE:
The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
DEVICEMAP:
Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
WRITECOMBINED:
Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host -> device transfers.
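As a rough illustration of how the mem_flags values above might be passed, the sketch below allocates a pinned buffer with one or more flags, fills it on the CPU, and copies it to the device. The helper names combine_flags and upload_with_flags are hypothetical, not part of PyCUDA. It assumes PyCUDA is installed and a CUDA-capable GPU is present; DEVICEMAP may additionally require a context created with host mapping enabled, which this sketch does not set up.

```python
import numpy as np


def combine_flags(*flags):
    """OR flag values together; passing no flags gives 0, the default."""
    combined = 0
    for flag in flags:
        combined |= int(flag)
    return combined


def upload_with_flags(data, flag_names=("WRITECOMBINED",)):
    """Stage data in a pinned buffer allocated with the named flags, then copy it to the GPU.

    Returns True if reading the data back from the device gives the original values.
    Needs PyCUDA and a GPU, so the imports live inside the function.
    """
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda

    # Look up each name in host_alloc_flags (PORTABLE, DEVICEMAP, WRITECOMBINED).
    flags = combine_flags(*(getattr(cuda.host_alloc_flags, n) for n in flag_names))
    pinned = cuda.pagelocked_empty(data.shape, data.dtype, mem_flags=flags)
    pinned[:] = data  # the CPU writes into the pinned buffer

    dev = cuda.mem_alloc(pinned.nbytes)
    cuda.memcpy_htod(dev, pinned)  # host -> device from pinned memory

    back = np.empty_like(data)
    cuda.memcpy_dtoh(back, dev)  # read back to verify the round trip
    dev.free()
    return np.array_equal(back, data)
```

On a machine with a GPU, upload_with_flags(np.random.randn(256, 256).astype(np.float32)) runs the whole round trip; passing flag_names=() falls back to the default allocation.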
cuda.pagelocked_empty(shape, dtype, order="C")
cuda.pagelocked_zeros( .. )
cuda.pagelocked_empty_like( array )
cuda.pagelocked_zeros_like( array )
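To check whether pinned memory actually helps on a given machine, one option is to time the same host-to-device copy from a pageable array and from a page-locked one. The following is a minimal sketch, not a rigorous benchmark; the function names, the array shape and the repeat count are all illustrative choices, and it assumes PyCUDA plus a CUDA-capable GPU.

```python
import time

import numpy as np


def bandwidth_gb_per_s(nbytes, seconds):
    """Convert a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes)."""
    return nbytes / seconds / 1e9


def time_htod(host_array, repeats=10):
    """Return the mean seconds per host-to-device copy of host_array.

    Needs PyCUDA and a GPU, so the imports live inside the function.
    """
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda

    dev_buf = cuda.mem_alloc(host_array.nbytes)
    cuda.memcpy_htod(dev_buf, host_array)  # warm-up copy, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        cuda.memcpy_htod(dev_buf, host_array)
    elapsed = time.perf_counter() - start
    dev_buf.free()
    return elapsed / repeats


def compare_pageable_and_pinned(shape=(4096, 4096)):
    """Time the same copy from a pageable and from a page-locked source array."""
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as cuda

    pageable = np.random.randn(*shape).astype(np.float32)
    pinned = cuda.pagelocked_empty(shape, np.float32)
    pinned[:] = pageable  # same data, now in page-locked memory

    results = {}
    for name, arr in (("pageable", pageable), ("pinned", pinned)):
        results[name] = bandwidth_gb_per_s(arr.nbytes, time_htod(arr))
    return results  # GB/s per source type
```

Calling compare_pageable_and_pinned() returns a dict of measured bandwidths; the numbers depend heavily on the GPU, the PCI Express link and the driver, so they are best compared only within one machine.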
However, when I try to access the page-locked memory from the CPU, it appears to be super slow.
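The notes do not say which flags were in use when the slowdown was observed. Given the WRITECOMBINED description above, one possible explanation (an assumption, not something the notes confirm) is that the buffer was write-combined. The sketch below is one way to check: it times a full CPU read of a pageable array, a default pinned array and a write-combined pinned array. The function names are hypothetical, and the GPU part assumes PyCUDA and a CUDA-capable device.

```python
import time

import numpy as np


def time_cpu_read(arr, repeats=5):
    """Mean seconds for the CPU to read every element of arr once (via sum)."""
    start = time.perf_counter()
    for _ in range(repeats):
        arr.sum()
    return (time.perf_counter() - start) / repeats


def compare_cpu_reads(shape=(2048, 2048)):
    """Time CPU-side reads of pageable, pinned and write-combined pinned buffers.

    Needs PyCUDA and a GPU, so the imports live inside the function.
    """
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda

    wc = cuda.host_alloc_flags.WRITECOMBINED
    buffers = {
        "pageable numpy": np.zeros(shape, np.float32),
        "pinned (mem_flags=0)": cuda.pagelocked_zeros(shape, np.float32),
        "pinned WRITECOMBINED": cuda.pagelocked_zeros(shape, np.float32, mem_flags=wc),
    }
    return {name: time_cpu_read(buf) for name, buf in buffers.items()}
```

If only the WRITECOMBINED entry is markedly slower, allocating with mem_flags=0 for buffers the CPU needs to read back would be the first thing to try.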