We would like to use Futhark, a language that automatically generates code for different backends (like C, OpenCL, or CUDA). Would it be possible to install the compiler on Baobab? There are different installation methods listed here:
According to this documentation, there is no advantage to using the pre-compiled snapshot. I would suggest installing it from source. It’s good you managed to install it on Baobab as a user.
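For reference, a user-level build from source could look roughly like the sketch below (assuming the Haskell Stack tool is available or can be installed in your home directory; repository URL and paths are the Futhark project defaults and may differ on Baobab):

```shell
# Fetch the Futhark sources
git clone https://github.com/diku-dk/futhark.git
cd futhark

# Build with Haskell Stack; the futhark binary lands in ~/.local/bin,
# so no root access is needed on the cluster
stack setup
stack install

# Make sure ~/.local/bin is on PATH (add this line to ~/.bashrc to persist)
export PATH="$HOME/.local/bin:$PATH"
futhark --version
```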
#!/usr/bin/env bash
#SBATCH --partition=shared-gpu
#SBATCH --time=00:10:00
# #SBATCH --gres=gpu:pascal:1
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/llaurieriveros/futhark/lattice_boltzmann
module load CUDA
# see here for more samples:
# /opt/cudasample/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/
# if you need to know the allocated CUDA device, you can obtain it here:
echo $CUDA_VISIBLE_DEVICES
time srun LB_generation_cuda 50 50 0.0 -0.00009 0.8 150
That would give me the error:
failed with error code 100 (no CUDA-capable device is detected)
When specifying “#SBATCH --gres=gpu:pascal:1”, running “sbatch rGpu.sh” would give:
sbatch: error: Batch job submission failed: Requested node configuration is not available
Finally, “#SBATCH --gres=gpu:titan:1” would work, but is very slow.
$ srun --gres=gpu:pascal:1 --partition=shared-gpu hostname
srun: job 18524009 queued and waiting for resources
Maybe when you tried, some nodes were in drain (unavailable) mode? Did you try with the same partition? Be aware that some nodes have already been migrated to CentOS 7; you can check with sinfo, where those partition names carry the suffix -EL7.
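As a sketch, checking node state and available GPU types before submitting could look like this (partition name taken from the script above; the %G format field prints each node’s GRES, so you can see which GPU types actually exist in the partition):

```shell
# Show the partition, its GRES (GPU types), node count, state, and node list;
# the -EL7 suffixed partitions are the CentOS 7 nodes
sinfo -o "%P %G %D %t %N" --partition=shared-gpu

# List nodes that are down or draining, with the admin's reason
sinfo -R
```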
I’m not sure of the implications of putting “time” in front of srun. I suggest adding a “date” line before the srun line and another one after.
Would you mind testing again, making sure you always get the same node?
You can pin the job to a specific node with --nodelist=gpuXXX
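The timing pattern suggested above could look like this in the batch script (a sketch; the commented srun line is the job step from the script earlier, and the sleep is only a stand-in so the sketch runs outside the cluster):

```shell
#!/usr/bin/env bash
# Record wall-clock timestamps around the job step instead of wrapping
# srun in `time`, so start and end times appear in the SLURM output file.
date
# srun LB_generation_cuda 50 50 0.0 -0.00009 0.8 150   # the actual job step
sleep 2   # stand-in for the job step in this sketch
date
```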
It would be interesting to see whether the issue is reproducible. Also, don’t allocate a node and then run your job directly on it after connecting through ssh, because that way you won’t get your allocated GPU.
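For interactive testing, the safe pattern is to launch commands with srun inside the allocation rather than ssh-ing to the node (a sketch, assuming the same partition and GPU type discussed above):

```shell
# Request an interactive allocation with one GPU
salloc --partition=shared-gpu --gres=gpu:titan:1 --time=00:10:00

# Inside the allocation, launch commands with srun so they run under the
# job's cgroup and see the allocated GPU (CUDA_VISIBLE_DEVICES is set)
srun nvidia-smi
```

An ssh session on the node bypasses this mechanism, which is why the GPU appears missing there.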