Thursday, March 9, 2017

Submit parallel MKL/OpenMP jobs (blupf90, yams) on the genotoul cluster

The blupf90 software and my own programs include parallelised code using Intel Fortran MKL and OpenMP facilities. These use several threads, usually within a single node (as they share memory). I was going nuts trying to make this work on bioinfo.genotoul.fr until I went to see the support people. One of the problems for me was that there are actually several grid engines, so Google does not always find the right solution.

Short message: use this:


alegarra@genotoul2 ~/work $ cat test_omp.sh
#!/bin/bash
#$ -pe parallel_smp 10
export OMP_NUM_THREADS=10
your_program
alegarra@genotoul2 ~/work $ qsub test_omp.sh

where 10 is the number of threads that you want.
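
If you do not want to keep the two numbers in sync by hand, Grid Engine normally exports the number of granted slots in the NSLOTS variable, so a variant like the following should work (a sketch; I have not checked that NSLOTS is set on genotoul):

#!/bin/bash
#$ -pe parallel_smp 10
# NSLOTS is set by Grid Engine to the number of slots granted by -pe;
# fall back to 1 thread if it happens to be unset
export OMP_NUM_THREADS=${NSLOTS:-1}
your_program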

Long explanation:


I have a simple example that just creates a big matrix and inverts it. With a 1000 x 1000 matrix, the program runs smoothly without any particular command or request for parallel resources:

$cat test_omp.sh
#!/bin/bash
echo "1000 1000" | /save/alegarra/progs/parallel_inverse/a.out
alegarra@genotoul2 ~/work $ qsub test_omp.sh

$cat test_omp.sh.o8160505
nanim,nsnpp
        1000        1000
 Z=rnd()
 GG=ZZ
    Dgemm MKL #threads=    20   40 Elapsed omp_get_time:     0.1530
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=   40   40 Elapsed omp_get_time:     0.1071

Epilog : job finished at Thu Mar  9 15:45:53 CET 2017

However, this fails if I try a larger 10000 x 10000 matrix:

OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
OMP: Error #178: Function pthread_getattr_np failed:
OMP: System error #12: Cannot allocate memory
forrtl: error (76): Abort trap signal

The reason is that every thread allocates its own memory, and with 40 threads (the default here, one per core seen by MKL) this is more than the job is allowed, so the allocation fails.
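Following the hint in the error message, a quick way to confirm the diagnosis is simply to lower the thread count before rerunning; MKL also honours its own MKL_NUM_THREADS for the MKL calls. This is only a sketch to check the diagnosis (the job still has only its default resources); the clean solution, asking the scheduler for the matching number of cores, comes next:

#!/bin/bash
# cap the threads that MKL/OpenMP may start, just to confirm the diagnosis
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
echo "10000 10000" | /save/alegarra/progs/parallel_inverse/a.out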

Then I try again, explicitly requesting 4 threads for the 1000 x 1000 matrix. This is done by requesting the adequate parallel environment:

-pe parallel_smp requests X cores on the same node (multi-thread, OpenMP), as explained in the docs.
Do not forget to put the SAME number in -pe parallel_smp and in export OMP_NUM_THREADS.

alegarra@genotoul2 ~/work $ cat test_omp.sh
#!/bin/bash
export OMP_NUM_THREADS=4 
echo "1000 1000" | /save/alegarra/progs/parallel_inverse/a.out
alegarra@genotoul2 ~/work $ qsub -pe parallel_smp 4 test_omp.sh

and it does work:
alegarra@genotoul2 ~/work $ cat test_omp.sh.o8160541 
 nanim,nsnpp
        1000        1000
 Z=rnd()
 GG=ZZ
    Dgemm MKL #threads=     4    4 Elapsed omp_get_time:     0.0320
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=    4    4 Elapsed omp_get_time:     0.0230
Epilog : job finished at Thu Mar  9 16:01:30 CET 2017

Now, try with 10000 x 10000:


alegarra@genotoul2 ~/work $ cat test_omp.sh.o8160558 
 nanim,nsnpp
       10000       10000
 Z=rnd()
 GG=ZZ
    Dgemm MKL #threads=     4    4 Elapsed omp_get_time:    45.8333
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=    4    4 Elapsed omp_get_time:    29.2803
Epilog : job finished at Thu Mar  9 16:05:11 CET 2017

and it also works. I can also put the -pe within the script:

alegarra@genotoul2 ~/work $ cat test_omp.sh
#!/bin/bash
#$ -pe parallel_smp 4
export OMP_NUM_THREADS=4
echo "10000 10000" | /save/alegarra/progs/parallel_inverse/a.out
alegarra@genotoul2 ~/work $ qsub test_omp.sh
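
Other standard Grid Engine directives can be embedded in the same way. A sketch with a few that are often handy (nothing genotoul-specific; the job name is a placeholder):

#!/bin/bash
# give the job a name (shown by qstat)
#$ -N invert_10k
# merge standard error into standard output
#$ -j y
# run the job from the directory where it was submitted
#$ -cwd
#$ -pe parallel_smp 4
export OMP_NUM_THREADS=4
echo "10000 10000" | /save/alegarra/progs/parallel_inverse/a.out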


Now, I try with 10 threads and it also works. Next I will try using yams.
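For the blupf90 programs the job script has the same shape; they ask for the name of the parameter file on standard input. A sketch (the program path and renf90.par are placeholders, and I have not yet tested yams this way):

#!/bin/bash
#$ -pe parallel_smp 10
export OMP_NUM_THREADS=10
# blupf90 reads the parameter file name from standard input
echo renf90.par | ./blupf90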


A couple more tricks. To list the parallel environments available in the cluster and see their characteristics:


alegarra@genotoul2 ~/work/mtr_2015/mf/methodR $ qconf -spl
parallel_10
parallel_20
parallel_40
parallel_48
parallel_fill
parallel_fill_amd
parallel_rr
parallel_rr_amd
parallel_smp
parallel_testq
alegarra@genotoul2 ~/work/mtr_2015/mf/methodR $ qconf -sp parallel_smp
pe_name            parallel_smp
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
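
In this output, allocation_rule $pe_slots is what forces all the requested slots onto a single node, which is exactly what a shared-memory OpenMP program needs. To decide how many threads it is reasonable to ask for, the usual Grid Engine commands show what the nodes and queues offer (the exact output depends on the site configuration):

# execution hosts with their core counts, memory and load
qhost
# per-queue summary of used and available slots
qstat -g c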