Hello all,
I hope this message finds you well. I am having trouble getting PyTorch to use the GPUs on our cluster, and I would be grateful for any guidance or assistance.
Here’s my sbatch script:
#!/bin/sh
#SBATCH --partition=private-teodoro-gpu
#SBATCH --time=0-00:15:00
#SBATCH --gpus=2
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=16000
echo $CUDA_VISIBLE_DEVICES
module load Anaconda3/2022.05 CUDA/11.7.0
source activate nlu
echo $CUDA_VISIBLE_DEVICES
echo "python script"
srun python test.py
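For context, the two echo $CUDA_VISIBLE_DEVICES lines are there to confirm which devices Slurm grants the job before and after activating the conda environment. Just to illustrate how that comma-separated value maps to device indices, here is a tiny hypothetical helper (not part of my scripts):

```python
import os

def visible_devices(env=None):
    """Parse CUDA_VISIBLE_DEVICES (e.g. "0,1") into a list of device indices.

    Hypothetical helper, for illustration only; an empty or unset
    variable means no devices are exposed to the process.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]

print(visible_devices({"CUDA_VISIBLE_DEVICES": "0,1"}))  # [0, 1]
print(visible_devices({}))                               # []
```

(Note that on some systems Slurm can also populate this variable with GPU UUIDs rather than integer indices; in my output below it is plain "0,1".)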
Here is my test.py script:
import os
import torch

print("GPUs in python script", list(range(torch.cuda.device_count())))

# Print the CUDA toolkit and driver versions visible to the job
os.system("nvcc --version")
os.system("nvidia-smi")
device = torch.device("cuda:0")
# Define the matrices
A = torch.randn(1000, 1000).to(device)
B = torch.randn(1000, 1000).to(device)
# Perform matrix multiplication on the GPU
C = torch.matmul(A, B)
# Move the result back to the CPU
C = C.cpu()
# Print the result
print(C)
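For what it's worth, a smaller reproduction that isolates the CUDA initialization might look like the sketch below (torch.cuda.init() forces the same lazy initialization that the .to(device) call triggers, so it should surface the error without any tensor work):

```python
import torch

# Report how this torch build sees CUDA before touching any tensors.
print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())

try:
    torch.cuda.init()  # force the lazy CUDA init that .to(device) performs
    print("init ok, device count:", torch.cuda.device_count())
except RuntimeError as err:
    print("init failed:", err)
```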
Here is the output:
0,1
0,1
python script
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Wed Jun 14 14:22:48 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 27C P8 26W / 370W| 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 On | 00000000:21:00.0 Off | N/A |
| 0% 27C P8 30W / 370W| 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
GPUs in python script [0, 1]
Traceback (most recent call last):
File "/home/users/y/yazdani0/NLU4EHR_cosim_ft/test.py", line 17, in <module>
A = torch.randn(1000, 1000).to(device)
File "/home/users/y/yazdani0/.conda/envs/nlu/lib/python3.10/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
Thank you.