
Errors You May Encounter in MPI

2010-05-20 14:11
Reposted from:

http://hi.baidu.com/linzch/blog/item/7e7d750e18329ec07acbe14f.html




1. p1_xxxxx: p4_error: interrupt SIGSEGV: 11

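This error is probably caused by a segmentation fault in one of the processes. Mistakes in my own code that have triggered it:

a. Allocating memory for a pointer in only one process and not in the others, so the broadcast fails.

b. Accessing an array out of bounds.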
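
For case b, a small guard around array writes can turn a silent overrun into a visible message while debugging. This is only a sketch: the helper name checked_store is made up for illustration and is not part of MPI or any library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debugging helper (not an MPI call):
 * store v at a[i] only if i lies inside [0, len).
 * Returns 0 on success, -1 if the pointer is NULL or the index is out of range. */
int checked_store(double *a, size_t len, size_t i, double v)
{
    if (a == NULL || i >= len) {
        fprintf(stderr, "checked_store: index %zu out of range (len %zu)\n",
                i, len);
        return -1;
    }
    a[i] = v;
    return 0;
}
```

Replacing a suspicious a[i] = v with checked_store(a, len, i, v) in the loop that crashes will usually show which index goes wrong before the SIGSEGV appears.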




Someone online put it well:

"There are 2 things to check.

** Run one of the test programs like pi3.f or cpi.c to see whether your cluster's OK.

** If it is, the fault is in your code. See if you're exceeding array bounds or accessing memory which you haven't allocated. There's a SIGSEGV error - that's a segmentation violation. That might explain stuff like

bm_list_21829: p4_error: interrupt SIGINT: 2

Once you have a seg. violation, all the 4 processors are sent a signal to interrupt the process (SIGINT). Signals are defined in /usr/include/sys/signal.h (at least on the SGIs; might be different on other systems). "







2. p1_10401: p4_error: : 14

1 - MPI_BCAST : Message truncated

[1] Aborting program !

[1] Aborting program!

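This one is also caused by a receive buffer that is too small for MPI_Bcast. Allocate a large enough buffer on every process before calling MPI_Bcast, and the message will no longer be truncated.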





3. p4_error: alloc_p4_msg failed:


p0_6773: (7.828703) xx_shmalloc: returning NULL; requested 1048616 bytes
p0_6773: (7.828762) p4_shmalloc returning NULL; request = 1048616 bytes

Not enough memory was allocated. Set the environment variable P4_GLOBMEMSIZE (in bytes) to increase the memory available to the program:

export P4_GLOBMEMSIZE=32000000 (for bash users)
setenv P4_GLOBMEMSIZE 32000000 (for csh or tcsh users)



4. libcprts.so.5: cannot open shared object file: No such file or directory


/home/jbrandt/tests/test.exe: error while loading shared libraries: libcprts.so.5: cannot open shared object file: No such file or directory
p0_792: p4_error: Child process exited while making connection to remote process on compute-0-0.local: 0
/opt/mpich/intel/bin/mpirun: line 1: 792 Broken pipe /home/jbrandt/tests/test.exe - p4pg /home/jbrandt/tests/PI646 -p4wd /home/jbrandt/tes

The program was not linked statically with -static, so the shared library could not be found at run time. Recompiling with -static fixed it.