
A Study of Data- and Task-Parallel Computing

2017-12-29 23:42

Reference: https://software.intel.com/zh-cn/blogs/2011/12/02/400009299

Reference: https://www.zhihu.com/question/21823699

Reference: http://blog.csdn.net/hellochenlu/article/details/52370757

Reference: https://baike.baidu.com/item/SIMD/3412835

Reference: http://blog.csdn.net/xqch1983/article/details/8309683

Reference: http://blog.csdn.net/augusdi/article/details/8806214

Reference: http://blog.csdn.net/fengbingchun/article/details/20228667

Reference: https://en.wikipedia.org/wiki/Task_parallelism

Reference: https://zh.wikipedia.org/wiki/OpenMP

Reference: https://en.wikipedia.org/wiki/Message_Passing_Interface

Reference: http://www.ssc.net.cn/files/MPI%E7%BC%96%E7%A8%8B%E5%88%9D%E6%AD%A5.pdf

Reference: http://blog.csdn.net/u012337841/article/details/16358547

(Owned by: 春夜喜雨 http://blog.csdn.net/chunyexiyu)
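Data-level parallelism (DLP): SIMD (Single Instruction, Multiple Data) is a technique in which one control unit drives multiple processing elements, performing the same operation on every element of a set of data (also called a "data vector") at the same time, thereby achieving parallelism in space.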

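The SIMD technologies supported by Intel processors include MMX, SSE, and AVX.

MMX provides eight 64-bit registers for SIMD operations;

the SSE family provides eight 128-bit registers for SIMD instructions (sixteen in 64-bit mode);

the newer AVX instructions support 256-bit SIMD operations.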

SIMD instructions can currently be used in four ways: assembly language, C++ classes, compiler intrinsics, and automatic vectorization.
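To make two of these four approaches concrete, here is a minimal sketch (not taken from the original post) that adds two float arrays once with SSE intrinsics and once with a plain loop that a compiler may auto-vectorize. The function names addIntrinsics and addPlain are hypothetical. Unaligned load/store intrinsics are used so the arrays need no special alignment, and the scalar fallback for builds without SSE is an assumption added for portability.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0       /* assumption: fall back to scalar code elsewhere */
#endif

/* Compiler intrinsics: explicitly process 4 floats per iteration. */
void addIntrinsics(const float* a, const float* b, float* r, size_t n) {
    assert(a != NULL && b != NULL && r != NULL);
    size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(r + i, _mm_add_ps(va, vb)); /* 4 additions at once */
    }
#endif
    /* Scalar tail (or the whole array when SSE is not available). */
    for (; i < n; i++) {
        r[i] = a[i] + b[i];
    }
}

/* Auto-vectorization: an ordinary loop; whether the compiler turns it
   into SIMD code depends on the compiler and its optimization settings. */
void addPlain(const float* a, const float* b, float* r, size_t n) {
    assert(a != NULL && b != NULL && r != NULL);
    for (size_t i = 0; i < n; i++) {
        r[i] = a[i] + b[i];
    }
}
```

Both functions produce the same result; the intrinsics version trades portability for explicit control over the instructions used.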

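Task-level parallelism (TLP): focuses on distributing tasks, run as processes or threads, across different processors so that they execute at the same time. A typical application is the pipeline, which splits a task into independent stages that execute separately.

In computing, pipelines include: instruction pipelines / graphics pipelines / software pipelines / HTTP pipelining.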

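Instruction-level parallelism (ILP): the processor's ability to work on multiple instructions at the same time. There are two approaches, one at the hardware level and one at the software level.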
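On the software side, one common way to expose instruction-level parallelism is to break a long dependency chain so that the processor has independent instructions to overlap. The sketch below illustrates that idea under stated assumptions; it is not from the original post, and the function names sumChained and sumUnrolled are hypothetical. It sums an array first with a single accumulator and then with four independent accumulators. Whether the second version is actually faster depends on the CPU and on what the compiler already does by itself.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every addition depends on the previous one. */
long sumChained(const int* v, size_t n) {
    assert(v != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Four accumulators: four independent dependency chains that the
   hardware is free to execute in an overlapped fashion. */
long sumUnrolled(const int* v, size_t n) {
    assert(v != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    /* Remaining 0..3 elements. */
    for (; i < n; i++) {
        s0 += v[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Integers are used here so that both versions return exactly the same value; with floating-point data, reordering the additions could change the rounding.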

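OpenMP can express both task parallelism and data parallelism. MPI can also implement task parallelism, by launching multiple processes that carry out the tasks and communicate with one another through messages.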
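The task-parallel side of OpenMP can be sketched with the sections construct, in which each section is a different piece of work that may run on its own thread. This is a minimal illustration rather than code from the original post; the function name minAndSum and the choice of the two tasks are hypothetical. If the OpenMP compiler option is not enabled, the pragmas are ignored and the two sections simply run one after the other, with the same result.

```c
#include <assert.h>

/* Computes the minimum and the sum of v[0..n-1] as two separate tasks. */
void minAndSum(const double* v, int n, double* outMin, double* outSum) {
    assert(v != 0 && n > 0);
    double mn = v[0];
    double sum = 0.0;

    /* Each section is an independent task; the two sections write to
       different variables, so they do not race with each other. */
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 1; i < n; i++) {
                if (v[i] < mn) mn = v[i];
            }
        }
        #pragma omp section
        {
            for (int i = 0; i < n; i++) {
                sum += v[i];
            }
        }
    }

    *outMin = mn;
    *outSum = sum;
}
```

Unlike a parallel for loop, which splits one loop's iterations among threads, sections assigns whole, different jobs to threads, which is the distinction between data parallelism and task parallelism drawn in the text.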

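OpenCL can implement data-level parallelism, and it puts the GPU to work for larger-scale parallel computation. (OpenCL treats both CPUs and GPUs as compute devices, so more processing units are available, but the cost is also somewhat higher, for example the cost of loading data into video memory.)

Below is some sample code for these kinds of parallelism: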

1. MMX:

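#include <mmintrin.h>

/* MMX only has integer operations (_mm_add_pi32 adds packed 32-bit
   integers), so the arrays are int rather than float.
   Assumes nSize is a multiple of 2. */
void addMMX(int* a, int* b, int* r, int nSize) {
    __m64* pA = (__m64*)a;
    __m64* pB = (__m64*)b;
    __m64* pR = (__m64*)r;
    for (int i = 0; i < nSize / 2; i++) {
        pR[i] = _mm_add_pi32(pA[i], pB[i]);
    }
    /* Clear the MMX state before any x87 floating-point code runs. */
    _mm_empty();
}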

2. SSE:

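#include <emmintrin.h>

/* Assumes a, b and r are 16-byte aligned and nSize is a multiple of 2. */
void mutiSSE(double* a, double* b, double* r, int nSize) {
    __m128d* pA = (__m128d*)a;
    __m128d* pB = (__m128d*)b;
    __m128d* pR = (__m128d*)r;
    /* Each iteration multiplies 2 doubles at once. */
    for (int i = 0; i < nSize / 2; i++) {
        pR[i] = _mm_mul_pd(pA[i], pB[i]);
    }
}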

3. AVX:

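#include <immintrin.h>

/* Assumes a, b and r are 32-byte aligned and nSize is a multiple of 4. */
void mutiAVX(double* a, double* b, double* r, int nSize) {
    __m256d* pA = (__m256d*)a;
    __m256d* pB = (__m256d*)b;
    __m256d* pR = (__m256d*)r;
    /* Each iteration multiplies 4 doubles at once. */
    for (int i = 0; i < nSize / 4; i++) {
        pR[i] = _mm256_mul_pd(pA[i], pB[i]);
    }
}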

4. OpenMP: requires the /openmp compiler option to be enabled (MSVC; GCC and Clang use -fopenmp)

void mutiOMP(double* a, double* b, double* r, int nSize) {
    /* The loop iterations are divided among the OpenMP threads. */
    #pragma omp parallel for
    for (int i = 0; i < nSize; i++) {
        r[i] = a[i] * b[i];
    }
}

5. MPI: on Windows you need to download an MPI SDK and runtime (for example, Microsoft MPI)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char buf[256];
    int nRank, nProcNum;

    /* Initialize the infrastructure necessary for communication */
    MPI_Init(&argc, &argv);
    /* Identify this process */
    MPI_Comm_rank(MPI_COMM_WORLD, &nRank);
    /* Find out how many total processes are active */
    MPI_Comm_size(MPI_COMM_WORLD, &nProcNum);

    /* Until this point, all programs have been doing exactly the same.
    Here, we check the rank to distinguish the roles of the programs */
    if (nRank == 0) {
        int nOtherProc;
        printf("We have %i processes.\n", nProcNum);
        /* Receive messages from all other processes */
        for (nOtherProc = 1; nOtherProc < nProcNum; nOtherProc++)
        {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, nOtherProc,
                0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", buf);
        }
    }
    else {
        /* Send message to process #0 */
        sprintf(buf, "Process %i reporting for duty.", nRank);
        MPI_Send(buf, sizeof(buf), MPI_CHAR, 0,
            0, MPI_COMM_WORLD);
    }

    /* Tear down the communication infrastructure */
    MPI_Finalize();
    return 0;
}

When run:

mpiexec.exe -n 5 demo.exe

We have 5 processes.

Process 1 reporting for duty.

Process 2 reporting for duty.

Process 3 reporting for duty.

Process 4 reporting for duty.
