GROMACS efficiency is low on Baobab

what did you try:
I ran GROMACS on Baobab with the following script, but the performance was low, at 6.7 ns/day. My MD simulation system has about 600,000 atoms.
#!/bin/bash
#SBATCH --job-name="*"
#SBATCH --mail-type=ALL
#SBATCH --mail-user= *
#SBATCH --time=12:00:00
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --partition=shared-gpu
#SBATCH --gpus=1
module load GCC/11.3.0
module load OpenMPI/4.1.4
module load GROMACS/2023.1-CUDA-11.7.0
export OMP_NUM_THREADS=8
srun gmx_mpi mdrun -deffnm * -s *.tpr -v -cpi *.cpt -noappend -pin on -nb gpu

what was the expected result:
I expected about 15 ns/day, based on the results of my colleague, who ran a system of similar size on Yggdrasil with the same GPU settings.
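
For reference, the resource arithmetic implied by the script above can be written out as a short sketch. It uses only the values from the `#SBATCH` lines and the `OMP_NUM_THREADS` export. The variable names are illustrative, the job is assumed to run on a single node, and whether this mismatch actually explains the 6.7 ns/day is an assumption that has not been verified on Baobab.

```shell
#!/bin/bash
# Values copied from the job script above (variable names are illustrative).
ntasks_per_node=8      # --ntasks-per-node=8 (MPI ranks started by srun)
cpus_per_task=1        # --cpus-per-task=1   (cores reserved per rank)
omp_num_threads=8      # export OMP_NUM_THREADS=8 (threads per rank)
gpus=1                 # --gpus=1

# Assumption: the job runs on one node, so the per-node counts are the totals.
cores_allocated=$((ntasks_per_node * cpus_per_task))
threads_requested=$((ntasks_per_node * omp_num_threads))

echo "MPI ranks sharing the GPU: $ntasks_per_node (GPUs: $gpus)"
echo "Cores allocated:   $cores_allocated"
echo "Threads requested: $threads_requested"

# Flag the case where more threads are requested than cores are reserved.
if [ "$threads_requested" -gt "$cores_allocated" ]; then
    echo "Oversubscribed: $((threads_requested / cores_allocated)) threads per core"
fi
```

If this reading is right, the job reserves 8 cores but asks for 64 threads, with 8 ranks sharing one GPU. I am not sure whether this accounts for the difference from the Yggdrasil run.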