Hi there,
Here we are:
- PyTorch only:
capello@login2:~/scratch/softs/p/pytorch (master)$ for I in {02..11}; do \
cat cuda_9.1.85_-_device_count.sbatch_-_slurm-197202${I}.out; \
echo; \
done
I: full hostname: gpu002.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu003.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu004.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu005.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu006.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu007.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu008.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu009.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu010.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
I: full hostname: gpu011.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
torch.cuda.device_count: 1
capello@login2:~/scratch/softs/p/pytorch (master)$
- PyTorch via Singularity:
capello@login2:~/scratch/softs/p/pytorch (master)$ for I in {22..31}; do \
cat cuda_9.2.148.1_-_matrix_zeros_-_singularity.sbatch_-_slurm-197207${I}.out; \
echo; \
done
I: full hostname: gpu002.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu003.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu004.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu005.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu006.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu007.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu008.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu009.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu010.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
I: full hostname: gpu011.cluster
I: CUDA_VISIBLE_DEVICES: 0
=====
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
capello@login2:~/scratch/softs/p/pytorch (master)$
Thus, the local tests are OK on all ten GPU nodes (gpu002 to gpu011), both natively and via Singularity. I will check how to add them to our automatic-installation-is-finished-OK script so that the results are logged, at least during maintenance.
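For the logging part, a minimal sketch of what such a check could look like, combining the two tests above (device count and a small zeros tensor on the GPU) into one log line per node. The function names `gpu_sanity_check` and `format_report` and the log-line layout are only illustrative assumptions, not what our script currently does:

```python
import os
import socket


def gpu_sanity_check():
    """Collect what PyTorch sees on this node (hypothetical helper)."""
    report = {
        "hostname": socket.getfqdn(),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"),
        "device_count": 0,
        "zeros_ok": False,
        "error": None,
    }
    try:
        # Imported lazily so that a broken install is reported, not fatal.
        import torch

        report["device_count"] = torch.cuda.device_count()
        if report["device_count"] > 0:
            # Same idea as the matrix_zeros job: allocate a small tensor on the GPU.
            t = torch.zeros(10, device="cuda:0")
            report["zeros_ok"] = bool((t == 0).all().item())
    except Exception as exc:  # ImportError, CUDA runtime errors, ...
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


def format_report(report):
    """Turn the report into a single line (assumed layout, easy to grep)."""
    status = "OK" if report["zeros_ok"] and report["error"] is None else "FAIL"
    return (
        "{status} host={hostname} CUDA_VISIBLE_DEVICES={cuda_visible_devices} "
        "device_count={device_count} error={error}"
    ).format(status=status, **report)


if __name__ == "__main__":
    print(format_report(gpu_sanity_check()))
```

Run inside the same sbatch jobs as above, this would give one OK/FAIL line per node that the post-installation script could append to its log.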
Thx, bye,
Luca
PS, some links to previous discussions: