Segmentation fault error when running on Hilbert

When I tried to run my code on the Hilbert cluster with 4 nodes and 4 CPUs per node, I kept getting this error:

p15_26112: p4_error: interrupt SIGSEGV: 11
rm_l_15_26139: (699.613281) net_send: could not write to fd=5, errno = 32
Initial Guess …
Start NL Poisson…
dUin : 0.00106498
dUin : 0.00104582
dUin : 0.00100422
dUin : 0.000719826
dUin : 0.000220362
dUin : 1.35221e-05
dUin : 4.5627e-08
p0_17202: p4_error: interrupt SIGx: 13
p15_26112: (699.625000) net_send: could not write to fd=5, errno = 32
p7_13549: (701.113281) net_send: could not write to fd=5, errno = 32
rm_l_13_26081: (699.851562) net_send: could not write to fd=5, errno = 32
p12_26025: (699.964844) net_send: could not write to fd=5, errno = 32
p9_2640: (700.449219) net_send: could not write to fd=5, errno = 32
p0_17202: (732.257812) net_send: could not write to fd=4, errno = 32
—————————–
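A note on reading the log, as far as I can tell and assuming Linux numbering: signal 11 is SIGSEGV and signal 13 is SIGPIPE, and errno 32 is EPIPE ("broken pipe"). So the crash on p15 looks like the real failure, and the net_send lines are probably the other processes failing to write to a peer that is already gone. The small helper below (signal_name is just a name I made up) is a sketch of how to look the signal numbers up from the shell:

```shell
#!/bin/sh
# signal_name is a hypothetical helper, not part of MPI or PBS.
# `kill -l N` is the standard way to turn a signal number into its name.
signal_name() {
    kill -l "$1"
}

# The two signal numbers that appear in the log above.
for n in 11 13; do
    printf 'signal %s = SIG%s\n' "$n" "$(signal_name "$n")"
done
```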

I finally found the reason. It turns out I had omitted the line in the PBS script that specifies the memory requirement. It seems the job can run on the head node without it, but the other nodes require the -l mem= line.

This is an example of the script that I use:

#!/bin/sh
#PBS -N MPI_Job
#PBS -l nodes=4:ppn=4
#PBS -l ncpus=16
#PBS -l mem=2gb
#PBS -V
#PBS -o Output_File
#PBS -e Error_File
#PBS -l walltime=40:00:00

cd $PBS_O_WORKDIR
mpirun -np 16 -machinefile $PBS_NODEFILE ./a.out
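
Since the missing line was so easy to overlook, a quick check before submitting might help. This is only a sketch: has_mem_directive is a helper name I made up, and it assumes the memory request is written as a "#PBS -l" line, either on its own as in the script above or after a comma in a longer resource list.

```shell
#!/bin/sh
# Hypothetical helper, not part of PBS: succeeds only if the job script
# given as $1 has a "#PBS -l" line that sets mem=.
# Matches "#PBS -l mem=2gb" and "#PBS -l nodes=4:ppn=4,mem=2gb",
# but not a line that only sets pmem= or vmem=.
has_mem_directive() {
    grep -Eq '^#PBS[[:space:]]+-l[[:space:]]+(.*,)?mem=' "$1"
}
```

For example, `has_mem_directive job.pbs && qsub job.pbs` would submit the job only when the line is present (job.pbs is a placeholder file name).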
