mpiexec.hydra to OpenMPI

Hello, I have an sbatch script that was written for a system with mpiexec.hydra, but Baobab only has the standard OpenMPI launchers installed. Since I haven’t found anything online, I would like to ask if someone could translate my hydra invocation into an OpenMPI one:

mpiexec.hydra -f hostfile -configfile configfile

where hostfile lists the nodes that will be used:

gpu013
gpu013
gpu013
gpu013
gpu013
gpu013
gpu013
gpu013

and configfile:

-np 1 -env CUDA_VISIBLE_DEVICES 0 python run.py  :
-np 1 -env CUDA_VISIBLE_DEVICES 1 python run.py  :
-np 1 -env CUDA_VISIBLE_DEVICES 2 python run.py :
-np 1 -env CUDA_VISIBLE_DEVICES 3 python run.py :
-np 1 -env CUDA_VISIBLE_DEVICES 4 python run.py  :
-np 1 -env CUDA_VISIBLE_DEVICES 5 python run.py  :
-np 1 -env CUDA_VISIBLE_DEVICES 6 python run.py :
-np 1 -env CUDA_VISIBLE_DEVICES 7 python run.py 

These two files are created on the fly by a script that takes the needed information from srun (GitHub - choderalab/clusterutils: Utilities for running parallel jobs with Torque/Moab and MPI).

I need this workaround because OpenMM has some problems using multiple GPUs on clusters, especially when performing replica-exchange runs, see What's the OpenMM's mechanism of finding GPU device id? · Issue #2000 · openmm/openmm · GitHub
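
A literal translation using Open MPI’s mpirun MPMD syntax would, I guess, look something like the following (untested sketch; -x sets and exports an environment variable for the ranks in that block), but I am not sure it is the right approach under Slurm:

mpirun --hostfile hostfile \
    -np 1 -x CUDA_VISIBLE_DEVICES=0 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=1 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=2 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=3 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=4 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=5 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=6 python run.py : \
    -np 1 -x CUDA_VISIBLE_DEVICES=7 python run.py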

Hi,

if I understand correctly, you need eight MPI tasks launched, each with its own GPU?

You can try something like this:

#!/bin/sh
#SBATCH --job-name=test
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=8
#SBATCH --partition=debug-gpu
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=verbose,single:1

ml GCC/10.2.0 CUDA/11.1.1

srun python run.py

Hello, thank you very much for your help, but with this approach it dies with an MPI error:

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15067] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15064] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15069] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15070] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15068] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15063] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15065] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gpu006.cluster:15066] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: gpu006: tasks 0-2,4-7: Exited with exit code 1
srun: error: gpu006: task 3: Exited with exit code 1

The gpu binding output is this:

gpu-bind: usable_gres=0x2; bit_alloc=0xFF; local_inx=8; global_list=1; local_list=1
gpu-bind: usable_gres=0x8; bit_alloc=0xFF; local_inx=8; global_list=3; local_list=3
gpu-bind: usable_gres=0x1; bit_alloc=0xFF; local_inx=8; global_list=0; local_list=0
gpu-bind: usable_gres=0x10; bit_alloc=0xFF; local_inx=8; global_list=4; local_list=4
gpu-bind: usable_gres=0x4; bit_alloc=0xFF; local_inx=8; global_list=2; local_list=2
gpu-bind: usable_gres=0x20; bit_alloc=0xFF; local_inx=8; global_list=5; local_list=5
gpu-bind: usable_gres=0x40; bit_alloc=0xFF; local_inx=8; global_list=6; local_list=6
gpu-bind: usable_gres=0x80; bit_alloc=0xFF; local_inx=8; global_list=7; local_list=7

I have no idea what’s going on; maybe Python’s mpi4py isn’t able to deal with the GPU binding?

Do you initialize any libraries or variables before launching run.py? You should probably load a Python version with module first.
Can you give us some insight into the content of run.py?


The modules I use are:

module purge
module load fosscuda/2019b Doxygen SWIG Anaconda3 CMake cuDNN/7.6.5.32

In the sbatch script I then activate my conda environment and use the absolute path to its python executable, to be sure that the right one is used.
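
Roughly like this (the environment name and path are placeholders for my actual setup):

source activate myenv                              # placeholder name of my conda environment
srun $HOME/.conda/envs/myenv/bin/python run.py     # absolute path to that environment's python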

The script I run uses openmmtools.multistate.multistatesampler from openmmtools (GitHub - choderalab/openmmtools: A batteries-included toolkit for the GPU-accelerated OpenMM molecular simulation engine).

This MultiStateSampler API is the one using MPI; it does so through another choderalab package, mpiplus (GitHub - choderalab/mpiplus: Utilities to run on MPI), which uses mpi4py to parallelize the code.

They are having some problems with this as well, but of a different kind than mine: MPI bug when multiple GPUs are used per calculation · Issue #449 · choderalab/openmmtools · GitHub

I am using Python 3.7.

I’m afraid your mpi4py isn’t linked against OpenMPI.

May I ask you to first try with the mpi4py we provide on the cluster?

Here is a working helloworld.
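
Something along these lines is enough to test the MPI stack (a minimal sketch; the example linked above may differ):

# helloworld.py - each MPI rank prints its rank, the total size, and its host
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello from rank {comm.Get_rank()} of {comm.Get_size()} on {MPI.Get_processor_name()}")

It can be launched with srun python helloworld.py after loading the modules that provide the cluster’s mpi4py.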

Then try the same example with your version of mpi4py and see if it works.


Exactly as you said: if I run the example with the mpi4py from my conda environment I get the error; if I load the module and use the default python as in the example, it works.

Is there a way to force my conda env to use the mpi4py version that is installed on the cluster? Or to link its mpi4py to the OpenMPI library on the cluster?

I’m no conda expert but I can have a look with you next week. If you are interested let me know, and I will contact you to schedule an appointment.


Thank you very much!
I sent you a private message here on the HPC community

Hi, after a lot of digging and searching I finally found a solution: python - Cannot install mpi4py using conda AND specify pre-installed mpicc path - Stack Overflow

Instead of installing mpi4py with conda (which installs a precompiled version), you have to install it via pip, specifying the path to the mpicc compiler:
env MPICC=/path/to/mpicc pip install mpi4py
(Be sure to use the pip executable of your conda env and not the system one.)

This way mpi4py links against the system MPI libraries and everything works smoothly.
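
To check which MPI library mpi4py ended up linked against, something like this should work:

python -c "from mpi4py import MPI; print(MPI.Get_library_version())"

If the build picked up the cluster’s Open MPI, the printed version string should start with “Open MPI”.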

PS: if in the future someone needs to run OpenMM with CUDA and is not sure whether the CUDA installation or the set of loaded modules is right, there are two tests to run:

srun python -m simtk.testInstallation
srun python -c 'import simtk.openmm as m; print(m.Platform.getPluginLoadFailures())'

The latter is needed to check for missing libraries

For OpenMM 7.5 from conda-forge, these modules are working for me: fosscuda/2019b Doxygen SWIG Anaconda3 CMake cuDNN/7.6.5.32 (the last one is pretty important).

Hello,

that’s great!

Best
