Error: Unable to create a step for job

Joel.Schaer · December 20, 2023, 4:13pm

Hello,
I have encountered an issue when launching job arrays on the Baobab GPUs. Since this afternoon, I systematically get the same error message in the output file:

srun: error: Unable to create step for job 6401239: Invalid Trackable RESource (TRES) specification

I didn’t have this problem before with the same .sh file.

Primary information

Username: schaerjo
Cluster: Baobab

Script file

The script file I used is the following:

#!/bin/env bash
#SBATCH --array=1-16%20
#SBATCH --partition=private-kruse-gpu
#SBATCH --time=1-00:00:00
#SBATCH --output=%A_%a.out
#SBATCH --mem=3000  
#SBATCH --gpus=ampere:1 
#SBATCH --constraint=DOUBLE_PRECISION_GPU

module load Julia

cd /home/users/s/schaerjo/scratch/ProjectMTW/Axisymmetric/LBM_mixed_forcing/Tip/serie5/

srun julia --optimize=3 /home/users/s/schaerjo/Code/ProjectMTW/Axisymmetric/LBM_mixed_forcing/Tip/serie5/Simulation.jl

Ludovic.Dumoulin · December 20, 2023, 11:24pm

Hello,

I have the same problem.

srun: error: Unable to create step for job 6401213: Invalid Trackable RESource (TRES) specification

Primary information

Username: dumoulil
Cluster: Baobab

Script file

#!/bin/env bash
#SBATCH --array=1-1680%40
#SBATCH --partition=private-kruse-gpu,shared-gpu
#SBATCH --time=0-12:00:00
#SBATCH --output=%J.out
#SBATCH --mem=3000  
#SBATCH --gpus=ampere:1 
#SBATCH --constraint=DOUBLE_PRECISION_GPU

module load Julia

cd /srv/beegfs/scratch/users/d/dumoulil/Data/P-series/AdptDt/
srun julia --optimize=3 /home/users/d/dumoulil/Code/FFT_2D_P_AdptDt/2D.jl

Thank you for your help,
Best,
Ludovic

Gael.Rossignol · December 21, 2023, 9:59am

Dear Users,

I’m not able to reproduce the issue with the following sbatch script:

#!/bin/env bash
#SBATCH --partition=shared-gpu
#SBATCH --time=0-12:00:00
#SBATCH --output=%J.out
#SBATCH --mem=3000
#SBATCH --gpus=ampere:1
#SBATCH --constraint=DOUBLE_PRECISION_GPU

nvidia-smi

Executing it:

(baobab)-[rossigng@login2 ~]$ sbatch test.sbatch

I get the correct output:

(baobab)-[rossigng@login2 ~]$ cat 6429061.out
Thu Dec 21 10:55:55 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  | 00000000:01:00.0 Off |                    0 |
| N/A   26C    P0              33W / 250W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

There may have been a bug, but since Slurm was updated yesterday, could you please try again and check whether everything works now?

Best regards,

Lucas.Bezio · December 21, 2023, 10:22am

Hello, I still have the same error. I’m launching the same batch file as a few days ago, which was working well.

Joel.Schaer · December 21, 2023, 10:30am

Hello,

I just tried and I still have the same error.

Lucas.Bezio · December 21, 2023, 1:27pm

It works again for me

Ludovic.Dumoulin · December 21, 2023, 2:18pm

I had the same issue this morning around 1am.
I still have the same issue for most of my jobs; of the 800 jobs I started (which I cancelled before sending the 1680 jobs), only 5 were actually running.

srun: error: Unable to create step for job 6430121: Invalid Trackable RESource (TRES) specification

Joel.Schaer · December 21, 2023, 2:25pm

I also still have this issue.
I launched a job array and most of them resulted in the same error.
Only two of my jobs are running normally, on gpu020.

Yann.Sagon · December 21, 2023, 3:25pm

@Joel.Schaer and @Ludovic.Dumoulin

You seem to have similar batch scripts, so we’ll focus on your issue first.

Please:
- identify a couple of job IDs that did not work as expected;
- identify a couple of job IDs that did work as expected.

Then use sacct to check which GPU node each of them ran on:

sacct -j <jobid> -o node,jobid,state

This will help us see whether, for example, they all crash on the same GPU node.
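With large arrays, the per-step sacct listing gets long, so it can help to reduce it to one line per array task and count tasks per node and state. A minimal sketch, assuming GNU coreutils and awk, and assuming sacct's `-X` (allocations only), `-n` (no header) and `-P` (pipe-separated) options; the helper name `summarize_by_node` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count array tasks per (node, state) pair.
# Expects "NodeList|State" lines on stdin, e.g. produced by:
#   sacct -j <jobid> -X -n -P -o nodelist,state
summarize_by_node() {
  # -F'|' splits on the literal pipe; states such as
  # "CANCELLED by 1234" keep their spaces inside field 2.
  awk -F'|' '{ count[$1 " " $2]++ }
    END { for (k in count) print k, count[k] }' | sort
}

# Usage on the cluster (not run here):
#   sacct -j 6430844 -X -n -P -o nodelist,state | summarize_by_node
```

A node that shows only FAILED tasks while other nodes show RUNNING or COMPLETED ones is the pattern the support team is asking about.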

Please also tell us exactly how you submit your sbatch script (every command you type), and give us the full paths to the sbatch script and to the logs.

Thanks

Joel.Schaer · December 21, 2023, 4:24pm

Hello,
I launched a job array of 16 jobs with ID 6430844.
The jobs 6430844_1 and 6430844_2 are now running normally on gpu020.
The rest have been launched on gpu030 and have failed.

To submit the script, I use a Julia script that uploads my code to the cluster and executes the command:

ssh schaerjo@login2.baobab.hpc.unige.ch cd $cluster_saving_directory && sbatch $sh_name

Here, $cluster_saving_directory is the path to the folder where I save my data, which is also where the .sh file is located. In this case, the path is:

 scratch/ProjectMTW/Axisymmetric/LBM_mixed_forcing/Tip/serie5/

The relevant Julia package is OpenSSH_jll
Julia version: 1.9.3
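Julia's backtick commands are not run through a shell, so the `&&` above is passed to ssh as a plain argument and is interpreted by the shell on the login node. If the same line were typed in an interactive local shell, the local shell would split at `&&` and try to run `sbatch` locally. A minimal sketch of building the remote command as a single quoted string instead, assuming bash on both ends; the function name and the script name `job.sh` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the remote command as ONE string so that
# "&&" is evaluated by the shell on the login node, not locally.
# printf %q (bash builtin) escapes characters such as spaces.
remote_submit_cmd() {
  local dir="$1" script="$2"
  printf 'cd %q && sbatch %q' "$dir" "$script"
}

# Usage (not run here; host taken from the post above):
#   ssh schaerjo@login2.baobab.hpc.unige.ch \
#     "$(remote_submit_cmd scratch/ProjectMTW/Axisymmetric/LBM_mixed_forcing/Tip/serie5/ job.sh)"
```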

Here is the sacct output.

(baobab)-[schaerjo@login2 ~]$ sacct -j 6430844 -o node,jobid,state
       NodeList JobID             State
--------------- ------------ ----------
         gpu020 6430844_1       RUNNING
         gpu020 6430844_1.b+    RUNNING
         gpu020 6430844_1.e+    RUNNING
         gpu020 6430844_1.0     RUNNING
         gpu020 6430844_2       RUNNING
         gpu020 6430844_2.b+    RUNNING
         gpu020 6430844_2.e+    RUNNING
         gpu020 6430844_2.0     RUNNING
         gpu030 6430844_3        FAILED
         gpu030 6430844_3.b+     FAILED
         gpu030 6430844_3.e+  COMPLETED
         gpu030 6430844_4        FAILED
         gpu030 6430844_4.b+     FAILED
         gpu030 6430844_4.e+  COMPLETED
         gpu030 6430844_5        FAILED
         gpu030 6430844_5.b+     FAILED
         gpu030 6430844_5.e+  COMPLETED
         gpu030 6430844_6        FAILED
         gpu030 6430844_6.b+     FAILED
         gpu030 6430844_6.e+  COMPLETED
         gpu030 6430844_7        FAILED
         gpu030 6430844_7.b+     FAILED
         gpu030 6430844_7.e+  COMPLETED
         gpu030 6430844_8        FAILED
         gpu030 6430844_8.b+     FAILED
         gpu030 6430844_8.e+  COMPLETED
         gpu030 6430844_9        FAILED
         gpu030 6430844_9.b+     FAILED
         gpu030 6430844_9.e+  COMPLETED
         gpu030 6430844_10       FAILED
         gpu030 6430844_10.+     FAILED
         gpu030 6430844_10.+  COMPLETED
         gpu030 6430844_11       FAILED
         gpu030 6430844_11.+     FAILED
         gpu030 6430844_11.+  COMPLETED
         gpu030 6430844_12       FAILED
         gpu030 6430844_12.+     FAILED
         gpu030 6430844_12.+  COMPLETED
         gpu030 6430844_13       FAILED
         gpu030 6430844_13.+     FAILED
         gpu030 6430844_13.+  COMPLETED
         gpu030 6430844_14       FAILED
         gpu030 6430844_14.+     FAILED
         gpu030 6430844_14.+  COMPLETED
         gpu030 6430844_15       FAILED
         gpu030 6430844_15.+     FAILED
         gpu030 6430844_15.+  COMPLETED
         gpu030 6430844_16       FAILED
         gpu030 6430844_16.+     FAILED
         gpu030 6430844_16.+  COMPLETED

Yann.Sagon · December 22, 2023, 8:13am

@Joel.Schaer thanks for the feedback.

I tested submitting a job as both Joel and Ludovic. Using almost the same sbatch script as yours, I submitted a job on gpu030 requesting all four GPUs, and it worked.

This is the sbatch:

(baobab)-[dumoulil@login2 test_sagon]$ cat bla.sh
#!/bin/env bash
#SBATCH --partition=private-kruse-gpu
#SBATCH --output=%J.out
#SBATCH --mem=40G
#SBATCH --cpus-per-task=4
#SBATCH --gpus=ampere:4
#SBATCH --constraint=DOUBLE_PRECISION_GPU

srun --mpi=pmi2 ./gpu_burn -tc -d 72000

Submit the job:

(baobab)-[dumoulil@login2 test_sagon]$ sbatch --nodelist=gpu030 bla.sh

Check that the GPUs are busy:

(baobab)-[root@gpu030 ~]$ nvidia-smi
Fri Dec 22 08:36:29 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  | 00000000:01:00.0 Off |                    0 |
| N/A   58C    P0             239W / 250W |  36275MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          On  | 00000000:42:00.0 Off |                    0 |
| N/A   58C    P0             243W / 250W |  36275MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          On  | 00000000:81:00.0 Off |                    0 |
| N/A   58C    P0             256W / 250W |  36275MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          On  | 00000000:C1:00.0 Off |                    0 |
| N/A   60C    P0             236W / 250W |  36275MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1855048      C   ...rs/d/dumoulil/test_sagon/./gpu_burn    36266MiB |
|    1   N/A  N/A   1855063      C   ...rs/d/dumoulil/test_sagon/./gpu_burn    36266MiB |
|    2   N/A  N/A   1855064      C   ...rs/d/dumoulil/test_sagon/./gpu_burn    36266MiB |
|    3   N/A  N/A   1855066      C   ...rs/d/dumoulil/test_sagon/./gpu_burn    36266MiB |
+---------------------------------------------------------------------------------------+

@Joel.Schaer, when I connected to your account, I noticed that you preload several modules. I’m not sure whether this is an issue.

Do you know which CUDA version is bundled with Julia? You need at least version 11.0, and a compute capability no higher than 8.0.
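Both requirements are simple version comparisons. A minimal sketch, assuming GNU `sort -V` is available; the version strings would be copied by hand from the CUDA report Julia prints (for example via `CUDA.versioninfo()` from the CUDA.jl package), and both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 >= $2.
# sort -V orders version strings numerically per component.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check of the two requirements stated above:
# CUDA runtime >= 11.0 and compute capability <= 8.0.
check_cuda_requirements() {
  local runtime="$1" capability="$2"
  version_ge "$runtime" "11.0" && version_ge "8.0" "$capability"
}

# Usage: an A100 reports compute capability 8.0 (sm_80)
#   check_cuda_requirements 11.8 8.0 && echo "requirements met"
```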

We have created a reservation on gpu030 for Joel and Ludovic (dumoulil) and rebooted gpu030. Could you please try submitting a job to this node again?

sbatch --nodelist=gpu030 --reservation=schaerjo_9852 <yoursbatch>

Ludovic.Dumoulin · December 22, 2023, 9:29am

Hello,
Sorry, I was on vacation yesterday.

sacct -j 6429959 -o node,jobid,state

       NodeList JobID             State
--------------- ------------ ----------
         gpu020 6429959_1    CANCELLED+
         gpu020 6429959_1.b+  CANCELLED
         gpu020 6429959_1.e+  COMPLETED
         gpu020 6429959_1.0      FAILED
         gpu020 6429959_2    CANCELLED+
         gpu020 6429959_2.b+  CANCELLED
         gpu020 6429959_2.e+  COMPLETED
         gpu020 6429959_2.0      FAILED
         gpu030 6429959_3        FAILED
         gpu030 6429959_3.b+     FAILED
         gpu030 6429959_3.e+  COMPLETED
         gpu030 6429959_4        FAILED
         gpu030 6429959_4.b+     FAILED
         gpu030 6429959_4.e+  COMPLETED
         gpu030 6429959_5        FAILED
         gpu030 6429959_5.b+     FAILED
         gpu030 6429959_5.e+  COMPLETED
         gpu030 6429959_6        FAILED
         gpu030 6429959_6.b+     FAILED
         gpu030 6429959_6.e+  COMPLETED
         gpu030 6429959_7        FAILED
         gpu030 6429959_7.b+     FAILED
         gpu030 6429959_7.e+  COMPLETED
         gpu030 6429959_8        FAILED
         gpu030  6429959_...      FAILED
         gpu030 6429959_39       FAILED
         gpu030 6429959_39.+     FAILED
         gpu030 6429959_39.+  COMPLETED
         gpu030 6429959_40       FAILED
         gpu030 6429959_40.+     FAILED
         gpu030 6429959_40.+  COMPLETED
         gpu028 6429959_41   CANCELLED+
         gpu028 6429959_41.+  CANCELLED
         gpu028 6429959_41.+  COMPLETED
         gpu028 6429959_41.0     FAILED
         gpu029 6429959_42   CANCELLED+
         gpu029 6429959_42.+  CANCELLED
         gpu029 6429959_42.+  COMPLETED
         gpu029 6429959_42.0  CANCELLED
         gpu032 6429959_43   CANCELLED+
         gpu032 6429959_43.+  CANCELLED
         gpu032 6429959_43.+  COMPLETED
         gpu032 6429959_43.0     FAILED
         gpu030 6429959_44       FAILED
         gpu030 6429959_44.+     FAILED
         gpu030 6429959_44.+  COMPLETED
         gpu030 6429959_45       FAILED
         gpu030 6429959_...      FAILED
         gpu030 6429959_571      FAILED
         gpu030 6429959_571+     FAILED
         gpu030 6429959_571+  COMPLETED
         gpu027 6429959_572  CANCELLED+
         gpu027 6429959_572+  CANCELLED
         gpu027 6429959_572+  COMPLETED
         gpu027 6429959_572+     FAILED
         gpu030 6429959_573      FAILED
         gpu030 6429959_573+     FAILED
         gpu030 6429959_573+  COMPLETED
         gpu030 6429959_...      FAILED
         gpu030 6429959_764      FAILED
         gpu030 6429959_764+     FAILED
...

It seems that only the jobs on gpu030 are failing.
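While a single node is suspected, one possible user-side stopgap (not something proposed by the support team in this thread) is to keep array tasks off that node with sbatch's `--exclude` option. A minimal sketch that derives the node list from pipe-separated sacct output, assuming `sacct -X -n -P` and GNU `paste`; the helper name and `job.sh` are hypothetical. Nodes whose tasks were cancelled by the user, like gpu020 above, are not excluded:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "NodeList|State" lines, e.g. from
#   sacct -j <jobid> -X -n -P -o nodelist,state
# and print a comma-separated list of nodes where EVERY task FAILED.
failed_only_nodes() {
  awk -F'|' '
    { total[$1]++; if ($2 == "FAILED") failed[$1]++ }
    END { for (n in total) if (failed[n] == total[n]) print n }
  ' | sort | paste -sd, -
}

# Usage (not run here); check the list is non-empty before passing it:
#   nodes="$(sacct -j <jobid> -X -n -P -o nodelist,state | failed_only_nodes)"
#   sbatch --exclude="$nodes" job.sh
```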

The CUDA versions reported by Julia are:

CUDA runtime 11.8, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.8

Libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 12.0.0+545.23.8

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

4 devices:
  0: NVIDIA A100-PCIE-40GB (sm_80, 39.550 GiB / 40.000 GiB available)
  1: NVIDIA A100-PCIE-40GB (sm_80, 39.550 GiB / 40.000 GiB available)
  2: NVIDIA A100-PCIE-40GB (sm_80, 39.550 GiB / 40.000 GiB available)
  3: NVIDIA A100-PCIE-40GB (sm_80, 39.550 GiB / 40.000 GiB available)

(tested on gpu030)

I’ll try to run my simulation using the reserved node.
Thank you for your help

Ludovic.Dumoulin · December 22, 2023, 9:57am

Everything seems to be working well. Most of the jobs show COMPLETED because each one first checks whether its data are complete and runs the simulation only if they are not (some simulations crashed last week because of scratch problems).
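The "check first, run only if incomplete" behaviour described here can also be expressed at the batch-script level. Ludovic's actual check lives in his Julia code; this is a minimal sketch of the same idea, in which the marker-file convention `done_<task_id>` and the function name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "skip if already done" pattern for array tasks.
# Assumes each task creates <dir>/done_<task_id> after a successful run.
task_needs_run() {
  local data_dir="$1" task_id="$2"
  # Succeed (exit 0) only when the marker file does not exist yet.
  [ ! -f "${data_dir}/done_${task_id}" ]
}

# Usage inside an sbatch script (not run here):
#   if task_needs_run "$PWD" "$SLURM_ARRAY_TASK_ID"; then
#     srun julia --optimize=3 2D.jl && touch "done_${SLURM_ARRAY_TASK_ID}"
#   fi
```

Creating the marker only after `srun` returns success means a task that crashed (for example because of scratch problems) is rerun on the next submission.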

sacct -j 6467623 -o node,jobid,state
       NodeList JobID             State
--------------- ------------ ----------
         gpu030 6467623_1     COMPLETED
         gpu030 6467623_1.b+  COMPLETED
         gpu030 6467623_1.e+  COMPLETED
         gpu030 6467623_1.0   COMPLETED
         gpu030 6467623_2     COMPLETED
         gpu030 6467623_2.b+  COMPLETED
         gpu030 6467623_2.e+  COMPLETED
         gpu030 6467623_2.0   COMPLETED
         gpu030 6467623_3     COMPLETED
         gpu030 6467623_3.b+  COMPLETED
         gpu030 6467623_3.e+  COMPLETED
         gpu030 6467623_3.0   COMPLETED
         gpu030 6467623_4     COMPLETED
         gpu030 6467623_4.b+  COMPLETED
         gpu030 6467623_4.e+  COMPLETED
         gpu030 6467623_4.0   COMPLETED
         gpu030 6467623_5     COMPLETED
         gpu030 6467623_5.b+  COMPLETED
         gpu030 6467623_5.e+  COMPLETED
         gpu030 6467623_5.0   COMPLETED
         gpu030 6467623_6     COMPLETED
         gpu030 6467623_6.b+  COMPLETED
         gpu030 6467623_6.e+  COMPLETED
         gpu030 6467623_6.0   COMPLETED
         gpu030 6467623_7     COMPLETED
         gpu030 6467623_7.b+  COMPLETED
         gpu030 6467623_7.e+  COMPLETED
         gpu030 6467623_7.0   COMPLETED
         gpu030 6467623_8     COMPLETED
         gpu030 6467623_8.b+  COMPLETED
         gpu030 6467623_8.e+  COMPLETED
         gpu030 6467623_8.0   COMPLETED
         gpu030 6467623_9     COMPLETED
         gpu030 6467623_9.b+  COMPLETED
         gpu030 6467623_9.e+  COMPLETED
         gpu030 6467623_9.0   COMPLETED
         gpu030 6467623_10    COMPLETED
         gpu030 6467623_10.+  COMPLETED
         gpu030 6467623_10.+  COMPLETED
         gpu030 6467623_10.0  COMPLETED
         gpu030 6467623_11    COMPLETED
         gpu030 6467623_11.+  COMPLETED
         gpu030 6467623_11.+  COMPLETED
         gpu030 6467623_11.0  COMPLETED
         gpu030 6467623_12    COMPLETED
         gpu030 6467623_12.+  COMPLETED
         gpu030 6467623_12.+  COMPLETED
         gpu030 6467623_12.0  COMPLETED
         gpu030 6467623_13    COMPLETED
         gpu030 6467623_13.+  COMPLETED
         gpu030 6467623_13.+  COMPLETED
         gpu030 6467623_13.0  COMPLETED
         gpu030 6467623_14    COMPLETED
         gpu030 6467623_14.+  COMPLETED
         gpu030 6467623_14.+  COMPLETED
         gpu030 6467623_14.0  COMPLETED
         gpu030 6467623_15    COMPLETED
         gpu030 6467623_15.+  COMPLETED
         gpu030 6467623_15.+  COMPLETED
         gpu030 6467623_15.0  COMPLETED
         gpu030 6467623_16    COMPLETED
         gpu030 6467623_16.+  COMPLETED
         gpu030 6467623_16.+  COMPLETED
         gpu030 6467623_16.0  COMPLETED
         gpu030 6467623_17    COMPLETED
         gpu030 6467623_17.+  COMPLETED
         gpu030 6467623_17.+  COMPLETED
         gpu030 6467623_17.0  COMPLETED
         gpu030 6467623_18    COMPLETED
         gpu030 6467623_18.+  COMPLETED
         gpu030 6467623_18.+  COMPLETED
         gpu030 6467623_18.0  COMPLETED
         gpu030 6467623_19    COMPLETED
         gpu030 6467623_19.+  COMPLETED
         gpu030 6467623_19.+  COMPLETED
         gpu030 6467623_19.0  COMPLETED
         gpu030 6467623_20    COMPLETED
         gpu030 6467623_20.+  COMPLETED
         gpu030 6467623_20.+  COMPLETED
         gpu030 6467623_20.0  COMPLETED
         gpu030 6467623_21    COMPLETED
         gpu030 6467623_21.+  COMPLETED
         gpu030 6467623_21.+  COMPLETED
         gpu030 6467623_21.0  COMPLETED
         gpu030 6467623_22    COMPLETED
         gpu030 6467623_22.+  COMPLETED
         gpu030 6467623_22.+  COMPLETED
         gpu030 6467623_22.0  COMPLETED
         gpu030 6467623_23    COMPLETED
         gpu030 6467623_23.+  COMPLETED
         gpu030 6467623_23.+  COMPLETED
         gpu030 6467623_23.0  COMPLETED
         gpu030 6467623_24    COMPLETED
         gpu030 6467623_24.+  COMPLETED
         gpu030 6467623_24.+  COMPLETED
         gpu030 6467623_24.0  COMPLETED
         gpu030 6467623_25    COMPLETED
         gpu030 6467623_25.+  COMPLETED
         gpu030 6467623_25.+  COMPLETED
         gpu030 6467623_25.0  COMPLETED
         gpu030 6467623_26    COMPLETED
         gpu030 6467623_26.+  COMPLETED
         gpu030 6467623_26.+  COMPLETED
         gpu030 6467623_26.0  COMPLETED
         gpu030 6467623_27    COMPLETED
         gpu030 6467623_27.+  COMPLETED
         gpu030 6467623_27.+  COMPLETED
         gpu030 6467623_27.0  COMPLETED
         gpu030 6467623_28    COMPLETED
         gpu030 6467623_28.+  COMPLETED
         gpu030 6467623_28.+  COMPLETED
         gpu030 6467623_28.0  COMPLETED
         gpu030 6467623_29    COMPLETED
         gpu030 6467623_29.+  COMPLETED
         gpu030 6467623_29.+  COMPLETED
         gpu030 6467623_29.0  COMPLETED
         gpu030 6467623_30    COMPLETED
         gpu030 6467623_30.+  COMPLETED
         gpu030 6467623_30.+  COMPLETED
         gpu030 6467623_30.0  COMPLETED
         gpu030 6467623_31    COMPLETED
         gpu030 6467623_31.+  COMPLETED
         gpu030 6467623_31.+  COMPLETED
         gpu030 6467623_31.0  COMPLETED
         gpu030 6467623_32    COMPLETED
         gpu030 6467623_32.+  COMPLETED
         gpu030 6467623_32.+  COMPLETED
         gpu030 6467623_32.0  COMPLETED
         gpu030 6467623_33    COMPLETED
         gpu030 6467623_33.+  COMPLETED
         gpu030 6467623_33.+  COMPLETED
         gpu030 6467623_33.0  COMPLETED
         gpu030 6467623_34    COMPLETED
         gpu030 6467623_34.+  COMPLETED
         gpu030 6467623_34.+  COMPLETED
         gpu030 6467623_34.0  COMPLETED
         gpu030 6467623_35    COMPLETED
         gpu030 6467623_35.+  COMPLETED
         gpu030 6467623_35.+  COMPLETED
         gpu030 6467623_35.0  COMPLETED
         gpu030 6467623_36    COMPLETED
         gpu030 6467623_36.+  COMPLETED
         gpu030 6467623_36.+  COMPLETED
         gpu030 6467623_36.0  COMPLETED
         gpu030 6467623_37    COMPLETED
         gpu030 6467623_37.+  COMPLETED
         gpu030 6467623_37.+  COMPLETED
         gpu030 6467623_37.0  COMPLETED
         gpu030 6467623_38    COMPLETED
         gpu030 6467623_38.+  COMPLETED
         gpu030 6467623_38.+  COMPLETED
         gpu030 6467623_38.0  COMPLETED
         gpu030 6467623_39    COMPLETED
         gpu030 6467623_39.+  COMPLETED
         gpu030 6467623_39.+  COMPLETED
         gpu030 6467623_39.0  COMPLETED
         gpu030 6467623_40    COMPLETED
         gpu030 6467623_40.+  COMPLETED
         gpu030 6467623_40.+  COMPLETED
         gpu030 6467623_40.0  COMPLETED
         gpu030 6467623_41    COMPLETED
         gpu030 6467623_41.+  COMPLETED
         gpu030 6467623_41.+  COMPLETED
         gpu030 6467623_41.0  COMPLETED
         gpu030 6467623_42    COMPLETED
         gpu030 6467623_42.+  COMPLETED
         gpu030 6467623_42.+  COMPLETED
         gpu030 6467623_42.0  COMPLETED
         gpu030 6467623_43    COMPLETED
         gpu030 6467623_43.+  COMPLETED
         gpu030 6467623_43.+  COMPLETED
         gpu030 6467623_43.0  COMPLETED
         gpu030 6467623_44    COMPLETED
         gpu030 6467623_44.+  COMPLETED
         gpu030 6467623_44.+  COMPLETED
         gpu030 6467623_44.0  COMPLETED
         gpu030 6467623_45    COMPLETED
         gpu030 6467623_45.+  COMPLETED
         gpu030 6467623_45.+  COMPLETED
         gpu030 6467623_45.0  COMPLETED
         gpu030 6467623_46    COMPLETED
         gpu030 6467623_46.+  COMPLETED
         gpu030 6467623_46.+  COMPLETED
         gpu030 6467623_46.0  COMPLETED
         gpu030 6467623_47    COMPLETED
         gpu030 6467623_47.+  COMPLETED
         gpu030 6467623_47.+  COMPLETED
         gpu030 6467623_47.0  COMPLETED
         gpu030 6467623_48    COMPLETED
         gpu030 6467623_48.+  COMPLETED
         gpu030 6467623_48.+  COMPLETED
         gpu030 6467623_48.0  COMPLETED
         gpu030 6467623_49    COMPLETED
         gpu030 6467623_49.+  COMPLETED
         gpu030 6467623_49.+  COMPLETED
         gpu030 6467623_49.0  COMPLETED
         gpu030 6467623_50    COMPLETED
         gpu030 6467623_50.+  COMPLETED
         gpu030 6467623_50.+  COMPLETED
         gpu030 6467623_50.0  COMPLETED
         gpu030 6467623_51    COMPLETED
         gpu030 6467623_51.+  COMPLETED
         gpu030 6467623_51.+  COMPLETED
         gpu030 6467623_51.0  COMPLETED
         gpu030 6467623_52    COMPLETED
         gpu030 6467623_52.+  COMPLETED
         gpu030 6467623_52.+  COMPLETED
         gpu030 6467623_52.0  COMPLETED
         gpu030 6467623_53    COMPLETED
         gpu030 6467623_53.+  COMPLETED
         gpu030 6467623_53.+  COMPLETED
         gpu030 6467623_53.0  COMPLETED
         gpu030 6467623_54    COMPLETED
         gpu030 6467623_54.+  COMPLETED
         gpu030 6467623_54.+  COMPLETED
         gpu030 6467623_54.0  COMPLETED
         gpu030 6467623_55    COMPLETED
         gpu030 6467623_55.+  COMPLETED
         gpu030 6467623_55.+  COMPLETED
         gpu030 6467623_55.0  COMPLETED
         gpu030 6467623_56    COMPLETED
         gpu030 6467623_56.+  COMPLETED
         gpu030 6467623_56.+  COMPLETED
         gpu030 6467623_56.0  COMPLETED
         gpu030 6467623_57    COMPLETED
         gpu030 6467623_57.+  COMPLETED
         gpu030 6467623_57.+  COMPLETED
         gpu030 6467623_57.0  COMPLETED
         gpu030 6467623_58    COMPLETED
         gpu030 6467623_58.+  COMPLETED
         gpu030 6467623_58.+  COMPLETED
         gpu030 6467623_58.0  COMPLETED
         gpu030 6467623_59    COMPLETED
         gpu030 6467623_59.+  COMPLETED
         gpu030 6467623_59.+  COMPLETED
         gpu030 6467623_59.0  COMPLETED
         gpu030 6467623_60    COMPLETED
         gpu030 6467623_60.+  COMPLETED
         gpu030 6467623_60.+  COMPLETED
         gpu030 6467623_60.0  COMPLETED
         gpu030 6467623_61    COMPLETED
         gpu030 6467623_61.+  COMPLETED
         gpu030 6467623_61.+  COMPLETED
         gpu030 6467623_61.0  COMPLETED
         gpu030 6467623_62    COMPLETED
         gpu030 6467623_62.+  COMPLETED
         gpu030 6467623_62.+  COMPLETED
         gpu030 6467623_62.0  COMPLETED
         gpu030 6467623_63    COMPLETED
         gpu030 6467623_63.+  COMPLETED
         gpu030 6467623_63.+  COMPLETED
         gpu030 6467623_63.0  COMPLETED
         gpu030 6467623_64    COMPLETED
         gpu030 6467623_64.+  COMPLETED
         gpu030 6467623_64.+  COMPLETED
         gpu030 6467623_64.0  COMPLETED
         gpu030 6467623_65    COMPLETED
         gpu030 6467623_65.+  COMPLETED
         gpu030 6467623_65.+  COMPLETED
         gpu030 6467623_65.0  COMPLETED
         gpu030 6467623_66    COMPLETED
         gpu030 6467623_66.+  COMPLETED
         gpu030 6467623_66.+  COMPLETED
         gpu030 6467623_66.0  COMPLETED
         gpu030 6467623_67    COMPLETED
         gpu030 6467623_67.+  COMPLETED
         gpu030 6467623_67.+  COMPLETED
         gpu030 6467623_67.0  COMPLETED
         gpu030 6467623_68    COMPLETED
         gpu030 6467623_68.+  COMPLETED
         gpu030 6467623_68.+  COMPLETED
         gpu030 6467623_68.0  COMPLETED
         gpu030 6467623_69    COMPLETED
         gpu030 6467623_69.+  COMPLETED
         gpu030 6467623_69.+  COMPLETED
         gpu030 6467623_69.0  COMPLETED
         gpu030 6467623_70    COMPLETED
         gpu030 6467623_70.+  COMPLETED
         gpu030 6467623_70.+  COMPLETED
         gpu030 6467623_70.0  COMPLETED
         gpu030 6467623_71    COMPLETED
         gpu030 6467623_71.+  COMPLETED
         gpu030 6467623_71.+  COMPLETED
         gpu030 6467623_71.0  COMPLETED
         gpu030 6467623_72    COMPLETED
         gpu030 6467623_72.+  COMPLETED
         gpu030 6467623_72.+  COMPLETED
         gpu030 6467623_72.0  COMPLETED
         gpu030 6467623_73    COMPLETED
         gpu030 6467623_73.+  COMPLETED
         gpu030 6467623_73.+  COMPLETED
         gpu030 6467623_73.0  COMPLETED
         gpu030 6467623_74    COMPLETED
         gpu030 6467623_74.+  COMPLETED
         gpu030 6467623_74.+  COMPLETED
         gpu030 6467623_74.0  COMPLETED
         gpu030 6467623_75    COMPLETED
         gpu030 6467623_75.+  COMPLETED
         gpu030 6467623_75.+  COMPLETED
         gpu030 6467623_75.0  COMPLETED
         gpu030 6467623_76    COMPLETED
         gpu030 6467623_76.+  COMPLETED
         gpu030 6467623_76.+  COMPLETED
         gpu030 6467623_76.0  COMPLETED
         gpu030 6467623_77    COMPLETED
         gpu030 6467623_77.+  COMPLETED
         gpu030 6467623_77.+  COMPLETED
         gpu030 6467623_77.0  COMPLETED
         gpu030 6467623_78    COMPLETED
         gpu030 6467623_78.+  COMPLETED
         gpu030 6467623_78.+  COMPLETED
         gpu030 6467623_78.0  COMPLETED
         gpu030 6467623_79    COMPLETED
         gpu030 6467623_79.+  COMPLETED
         gpu030 6467623_79.+  COMPLETED
         gpu030 6467623_79.0  COMPLETED
         gpu030 6467623_80    COMPLETED
         gpu030 6467623_80.+  COMPLETED
         gpu030 6467623_80.+  COMPLETED
         gpu030 6467623_80.0  COMPLETED
         gpu030 6467623_81    COMPLETED
         gpu030 6467623_81.+  COMPLETED
         gpu030 6467623_81.+  COMPLETED
         gpu030 6467623_81.0  COMPLETED
         gpu030 6467623_82    COMPLETED
         gpu030 6467623_82.+  COMPLETED
         gpu030 6467623_82.+  COMPLETED
         gpu030 6467623_82.0  COMPLETED
         gpu030 6467623_83    COMPLETED
         gpu030 6467623_83.+  COMPLETED
         gpu030 6467623_83.+  COMPLETED
         gpu030 6467623_83.0  COMPLETED
         gpu030 6467623_84    COMPLETED
         gpu030 6467623_84.+  COMPLETED
         gpu030 6467623_84.+  COMPLETED
         gpu030 6467623_84.0  COMPLETED
         gpu030 6467623_85    COMPLETED
         gpu030 6467623_85.+  COMPLETED
         gpu030 6467623_85.+  COMPLETED
         gpu030 6467623_85.0  COMPLETED
         gpu030 6467623_86    COMPLETED
         gpu030 6467623_86.+  COMPLETED
         gpu030 6467623_86.+  COMPLETED
         gpu030 6467623_86.0  COMPLETED
         gpu030 6467623_87    COMPLETED
         gpu030 6467623_87.+  COMPLETED
         gpu030 6467623_87.+  COMPLETED
         gpu030 6467623_87.0  COMPLETED
         gpu030 6467623_88    COMPLETED
         gpu030 6467623_88.+  COMPLETED
         gpu030 6467623_88.+  COMPLETED
         gpu030 6467623_88.0  COMPLETED
         gpu030 6467623_89    COMPLETED
         gpu030 6467623_89.+  COMPLETED
         gpu030 6467623_89.+  COMPLETED
         gpu030 6467623_89.0  COMPLETED
         gpu030 6467623_90    COMPLETED
         gpu030 6467623_90.+  COMPLETED
         gpu030 6467623_90.+  COMPLETED
         gpu030 6467623_90.0  COMPLETED
         gpu030 6467623_91    COMPLETED
         gpu030 6467623_91.+  COMPLETED
         gpu030 6467623_91.+  COMPLETED
         gpu030 6467623_91.0  COMPLETED
         gpu030 6467623_92    COMPLETED
         gpu030 6467623_92.+  COMPLETED
         gpu030 6467623_92.+  COMPLETED
         gpu030 6467623_92.0  COMPLETED
         gpu030 6467623_93    COMPLETED
         gpu030 6467623_93.+  COMPLETED
         gpu030 6467623_93.+  COMPLETED
         gpu030 6467623_93.0  COMPLETED
         gpu030 6467623_94    COMPLETED
         gpu030 6467623_94.+  COMPLETED
         gpu030 6467623_94.+  COMPLETED
         gpu030 6467623_94.0  COMPLETED
         gpu030 6467623_95    COMPLETED
         gpu030 6467623_95.+  COMPLETED
         gpu030 6467623_95.+  COMPLETED
         gpu030 6467623_95.0  COMPLETED
         gpu030 6467623_96      RUNNING
         gpu030 6467623_96.+    RUNNING
         gpu030 6467623_96.+    RUNNING
         gpu030 6467623_96.0    RUNNING
         gpu030 6467623_97    COMPLETED
         gpu030 6467623_97.+  COMPLETED
         gpu030 6467623_97.+  COMPLETED
         gpu030 6467623_97.0  COMPLETED
         gpu030 6467623_98    COMPLETED
         gpu030 6467623_98.+  COMPLETED
         gpu030 6467623_98.+  COMPLETED
         gpu030 6467623_98.0  COMPLETED
         gpu030 6467623_99    COMPLETED
         gpu030 6467623_99.+  COMPLETED
         gpu030 6467623_99.+  COMPLETED
         gpu030 6467623_99.0  COMPLETED
         gpu030 6467623_100   COMPLETED
         gpu030 6467623_100+  COMPLETED
         gpu030 6467623_100+  COMPLETED
         gpu030 6467623_100+  COMPLETED
         gpu030 6467623_101   COMPLETED
         gpu030 6467623_101+  COMPLETED
         gpu030 6467623_101+  COMPLETED
         gpu030 6467623_101+  COMPLETED
         gpu030 6467623_102   COMPLETED
         gpu030 6467623_102+  COMPLETED
         gpu030 6467623_102+  COMPLETED
         gpu030 6467623_102+  COMPLETED
         gpu030 6467623_103   COMPLETED
         gpu030 6467623_103+  COMPLETED
         gpu030 6467623_103+  COMPLETED
         gpu030 6467623_103+  COMPLETED
         gpu030 6467623_104   COMPLETED
         gpu030 6467623_104+  COMPLETED
         gpu030 6467623_104+  COMPLETED
         gpu030 6467623_104+  COMPLETED
         gpu030 6467623_105   COMPLETED
         gpu030 6467623_105+  COMPLETED
         gpu030 6467623_105+  COMPLETED
         gpu030 6467623_105+  COMPLETED
         gpu030 6467623_106   COMPLETED
         gpu030 6467623_106+  COMPLETED
         gpu030 6467623_106+  COMPLETED
         gpu030 6467623_106+  COMPLETED
         gpu030 6467623_107   COMPLETED
         gpu030 6467623_107+  COMPLETED
         gpu030 6467623_107+  COMPLETED
         gpu030 6467623_107+  COMPLETED
         gpu030 6467623_108     RUNNING
         gpu030 6467623_108+    RUNNING
         gpu030 6467623_108+    RUNNING
         gpu030 6467623_108+    RUNNING
         gpu030 6467623_109   COMPLETED
         gpu030 6467623_109+  COMPLETED
         gpu030 6467623_109+  COMPLETED
         gpu030 6467623_109+  COMPLETED
         gpu030 6467623_110   COMPLETED
         gpu030 6467623_110+  COMPLETED
         gpu030 6467623_110+  COMPLETED
         gpu030 6467623_110+  COMPLETED
         gpu030 6467623_111   COMPLETED
         gpu030 6467623_111+  COMPLETED
         gpu030 6467623_111+  COMPLETED
         gpu030 6467623_111+  COMPLETED
         gpu030 6467623_112   COMPLETED
         gpu030 6467623_112+  COMPLETED
         gpu030 6467623_112+  COMPLETED
         gpu030 6467623_112+  COMPLETED
         gpu030 6467623_113   COMPLETED
         gpu030 6467623_113+  COMPLETED
         gpu030 6467623_113+  COMPLETED
         gpu030 6467623_113+  COMPLETED
         gpu030 6467623_114   COMPLETED
         gpu030 6467623_114+  COMPLETED
         gpu030 6467623_114+  COMPLETED
         gpu030 6467623_114+  COMPLETED
         gpu030 6467623_115   COMPLETED
         gpu030 6467623_115+  COMPLETED
         gpu030 6467623_115+  COMPLETED
         gpu030 6467623_115+  COMPLETED
         gpu030 6467623_116   COMPLETED
         gpu030 6467623_116+  COMPLETED
         gpu030 6467623_116+  COMPLETED
         gpu030 6467623_116+  COMPLETED
         gpu030 6467623_117   COMPLETED
         gpu030 6467623_117+  COMPLETED
         gpu030 6467623_117+  COMPLETED
         gpu030 6467623_117+  COMPLETED
         gpu030 6467623_118   COMPLETED
         gpu030 6467623_118+  COMPLETED
         gpu030 6467623_118+  COMPLETED
         gpu030 6467623_118+  COMPLETED
         gpu030 6467623_119   COMPLETED
         gpu030 6467623_119+  COMPLETED
         gpu030 6467623_119+  COMPLETED
         gpu030 6467623_119+  COMPLETED
         gpu030 6467623_120     RUNNING
         gpu030 6467623_120+    RUNNING
         gpu030 6467623_120+    RUNNING
         gpu030 6467623_120+    RUNNING
         gpu030 6467623_121   COMPLETED
         gpu030 6467623_121+  COMPLETED
         gpu030 6467623_121+  COMPLETED
         gpu030 6467623_121+  COMPLETED
         gpu030 6467623_122   COMPLETED
         gpu030 6467623_122+  COMPLETED
         gpu030 6467623_122+  COMPLETED
         gpu030 6467623_122+  COMPLETED
         gpu030 6467623_123   COMPLETED
         gpu030 6467623_123+  COMPLETED
         gpu030 6467623_123+  COMPLETED
         gpu030 6467623_123+  COMPLETED
         gpu030 6467623_124   COMPLETED
         gpu030 6467623_124+  COMPLETED
         gpu030 6467623_124+  COMPLETED
         gpu030 6467623_124+  COMPLETED
         gpu030 6467623_125   COMPLETED
         gpu030 6467623_125+  COMPLETED
         gpu030 6467623_125+  COMPLETED
         gpu030 6467623_125+  COMPLETED
         gpu030 6467623_126   COMPLETED
         gpu030 6467623_126+  COMPLETED
         gpu030 6467623_126+  COMPLETED
         gpu030 6467623_126+  COMPLETED
         gpu030 6467623_127   COMPLETED
         gpu030 6467623_127+  COMPLETED
         gpu030 6467623_127+  COMPLETED
         gpu030 6467623_127+  COMPLETED
         gpu030 6467623_128   COMPLETED
         gpu030 6467623_128+  COMPLETED
         gpu030 6467623_128+  COMPLETED
         gpu030 6467623_128+  COMPLETED
         gpu030 6467623_129   COMPLETED
         gpu030 6467623_129+  COMPLETED
         gpu030 6467623_129+  COMPLETED
         gpu030 6467623_129+  COMPLETED
         gpu030 6467623_130   COMPLETED
         gpu030 6467623_130+  COMPLETED
         gpu030 6467623_130+  COMPLETED
         gpu030 6467623_130+  COMPLETED
         gpu030 6467623_131   COMPLETED
         gpu030 6467623_131+  COMPLETED
         gpu030 6467623_131+  COMPLETED
         gpu030 6467623_131+  COMPLETED
         gpu030 6467623_132     RUNNING
         gpu030 6467623_132+    RUNNING
         gpu030 6467623_132+    RUNNING
         gpu030 6467623_132+    RUNNING
  None assigned 6467623_[13+    PENDING

Thank you,
Best wishes