Update on Baobab Maintenance Aug 15-16? + Persistent Post Maintenance GPU Issue

Hi there,

We provide two different tests:

  1. PyTorch only, i.e. everything is provided by the cluster:
    p/pytorch/cuda_9.1.85_-_device_count.sbatch (commit f172a4888789c8f8cdc9c97c5d36d47f5b68f789, hpc / softs on GitLab)
capello@login2:~/scratch/softs/p/pytorch (master)$ for I in gpu{002..011}; do \
    sbatch --nodelist=${I} --output=./cuda_9.1.85_-_device_count.sbatch_-_slurm-%j.out ./cuda_9.1.85_-_device_count.sbatch; \
 done
Submitted batch job 19720202
Submitted batch job 19720203
Submitted batch job 19720204
Submitted batch job 19720205
Submitted batch job 19720206
Submitted batch job 19720207
Submitted batch job 19720208
Submitted batch job 19720209
Submitted batch job 19720210
Submitted batch job 19720211
capello@login2:~/scratch/softs/p/pytorch (master)$ 
  2. PyTorch via Singularity:
    p/pytorch/cuda_-_matrix_zeros.py (commit f172a4888789c8f8cdc9c97c5d36d47f5b68f789, hpc / softs on GitLab)
capello@login2:~/scratch/softs/p/pytorch (master)$ ls -l pytorch.simg 
-rwxr-xr-x 1 capello unige 2637373471 Jul  2 16:53 pytorch.simg
capello@login2:~/scratch/softs/p/pytorch (master)$ for I in gpu{002..011}; do \
    sbatch --nodelist=${I} --output=./cuda_9.2.148.1_-_matrix_zeros_-_singularity.sbatch_-_slurm-%j.out ./cuda_9.2.148.1_-_matrix_zeros_-_singularity.sbatch; \
 done
Submitted batch job 19720722
Submitted batch job 19720723
Submitted batch job 19720724
Submitted batch job 19720725
Submitted batch job 19720726
Submitted batch job 19720727
Submitted batch job 19720728
Submitted batch job 19720729
Submitted batch job 19720730
Submitted batch job 19720731
capello@login2:~/scratch/softs/p/pytorch (master)$ 
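
For anyone who wants to repeat this on their own scripts, the two submission loops above follow the same pattern, which could be wrapped roughly as below. This is only a sketch: `submit_per_node` and the `DRY_RUN` variable are hypothetical names made up for illustration and are not part of the hpc / softs repository. By default it only prints the `sbatch` commands; set `DRY_RUN=0` to really submit.

```shell
#!/usr/bin/env bash
# Sketch: submit one copy of a batch script per node, as in the loops above.
# submit_per_node and DRY_RUN are hypothetical names, used for illustration only.
submit_per_node() {
    local script="$1"
    shift
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): only print the command that would be run.
            echo "sbatch --nodelist=${node} --output=./${script}_-_slurm-%j.out ./${script}"
        else
            # Real submission: one job pinned to the given node.
            sbatch --nodelist="${node}" --output="./${script}_-_slurm-%j.out" "./${script}"
        fi
    done
}

# Example: preview the submissions of the first test for two nodes.
submit_per_node cuda_9.1.85_-_device_count.sbatch gpu002 gpu003
```

In bash the node list can also be written as `gpu{002..011}`, as in the transcripts above, to cover all ten nodes in one call.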

I will report back once the jobs above have completed.

Thx, bye,
Luca