Problem running a job on a GPU partition (debug)

Primary information

Username: humeau
Cluster: Baobab

I’m a new user. On the Baobab cluster, I can’t run a script on a GPU partition: I wanted to run a debug job, and the request fails immediately.

#!/bin/env bash

#SBATCH --partition=debug-gpu
#SBATCH --time=00:10:00
#SBATCH --output=journal-%j.out
#SBATCH --mem=10000
#SBATCH --gres=gpu:1,VramPerGpu:80GB
#SBATCH --cpus-per-task=12
module load cuDNN/8.6.0.163-CUDA-11.8.0 GCCcore/10.3.0 Python/3.9.5 OpenSSL/1.1.1q
srun python3.9 -m venv ~/gallicorpora_yolov8/.env
echo "Virtualenv created"
source ~/gallicorpora_yolov8/.env/bin/activate
mkdir -p "$HOME/tmp"
TMPDIR="$HOME/tmp" pip install comet-ml ultralytics

I get the following error:

salloc: error: invalid partition specified: debug-gpu
salloc: error: Job submit/allocate failed: Invalid partition name specified
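An "invalid partition" error like this can be caught before submitting by checking the partition list of the cluster you are logged into. The sketch below assumes it runs on a login node where Slurm's `sinfo` is available (it does nothing otherwise); `partition_exists` and the `wanted` variable are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the partition name given as $1 appears as a whole line
# in the list of partition names read from stdin. -F: treat $1 literally.
partition_exists() {
    grep -qxF -- "$1"
}

# Hypothetical partition we want to submit to.
wanted="debug-gpu"

# Only query Slurm when sinfo is installed (i.e. on a cluster login node).
if command -v sinfo >/dev/null 2>&1; then
    # -h: no header; %R: partition name without the '*' default marker.
    if sinfo -h -o '%R' | partition_exists "$wanted"; then
        echo "Partition $wanted exists on this cluster"
    else
        echo "Partition $wanted not found; available partitions:" >&2
        sinfo -h -o '%R' | sort -u >&2
    fi
fi
```

Run on Baobab, this would report that `debug-gpu` is missing and print the partitions that do exist there.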

This looks similar to the problem in this topic: Problem running any job

Thank you in advance for your help.

Hi @Maxime.Humeau

The debug-gpu partition only exists on Yggdrasil. If you really need to test on Baobab, you can use:

#SBATCH --partition=shared-gpu
#SBATCH --gpus=1 
#SBATCH --time=00:15:00
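Combining these directives with the rest of the original script, a Baobab version might look like the sketch below. It assumes the module versions from the question are also available on Baobab, and, like the directives above, it leaves out the `VramPerGpu:80GB` request; the paths under `~/gallicorpora_yolov8` are the asker's own.

```shell
#!/usr/bin/env bash

#SBATCH --partition=shared-gpu
#SBATCH --gpus=1
#SBATCH --time=00:15:00
#SBATCH --output=journal-%j.out
#SBATCH --mem=10000
#SBATCH --cpus-per-task=12

# Same modules as in the original script (assumed to be available on Baobab).
module load cuDNN/8.6.0.163-CUDA-11.8.0 GCCcore/10.3.0 Python/3.9.5 OpenSSL/1.1.1q

VENV="$HOME/gallicorpora_yolov8/.env"

# Create the virtualenv only if it does not exist yet, so the job can be re-run.
if [ ! -d "$VENV" ]; then
    python3.9 -m venv "$VENV"
    echo "Virtualenv created in $VENV"
fi
source "$VENV/bin/activate"

# -p: do not fail if the directory already exists.
mkdir -p "$HOME/tmp"
TMPDIR="$HOME/tmp" pip install comet-ml ultralytics
```

Save it as, for example, `job.sh` (a hypothetical name) and submit it with `sbatch job.sh`; `salloc`, which produced the error in the question, requests an interactive allocation instead of queuing a batch script.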

By requesting a short time limit (10 or 15 minutes) for your job, you shouldn’t have to wait too long, depending on partition availability.
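To get a rough idea of partition availability before or after submitting, `sinfo` and `squeue` can be queried. The sketch below sums the idle nodes reported for `shared-gpu` and asks the scheduler for the estimated start times of your pending jobs; `sum_nodes` is a hypothetical helper, and the Slurm part only runs where `sinfo` is installed.

```shell
#!/usr/bin/env bash
# Sum the node counts (one integer per line) read from stdin; prints 0 on empty input.
sum_nodes() {
    awk '{ total += $1 } END { print total + 0 }'
}

partition="shared-gpu"

if command -v sinfo >/dev/null 2>&1; then
    # -t idle: only idle nodes; %D: number of nodes on each output line.
    idle=$(sinfo -h -p "$partition" -t idle -o '%D' | sum_nodes)
    echo "Idle nodes in $partition: $idle"
    # Estimated start times of your pending jobs, as computed by the scheduler.
    squeue --start -u "$USER"
fi
```

An idle count of 0 does not necessarily mean a long wait: partially used ("mixed") nodes may still have a free GPU, which is why the `squeue --start` estimate is the more direct indicator.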

Edit: We don’t have A100s on Yggdrasil.
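To check which GPU models each partition offers (for example, whether any A100s are available), the generic resources (gres) configured on the nodes can be listed with `sinfo`. The exact gres strings depend on each cluster's configuration, so the substring filter below is only a sketch; `partitions_with_gpu` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Read "partition gres" lines from stdin and print, once each, the partitions
# whose gres field (2nd column) contains the GPU model substring $1 (case-insensitive).
partitions_with_gpu() {
    awk -v model="$1" 'index(tolower($2), tolower(model)) { print $1 }' | sort -u
}

if command -v sinfo >/dev/null 2>&1; then
    # %R: partition name; %G: generic resources of the nodes.
    sinfo -h -o '%R %G' | partitions_with_gpu a100
fi
```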
