Hi there,
we have installed a new node on Baobab:
- gpu[020] (member of the shared-gpu partition)
capello@login2:~$ scontrol show Node=gpu020
NodeName=gpu020 Arch=x86_64 CoresPerSocket=64
CPUAlloc=0 CPUTot=64 CPULoad=0.01
AvailableFeatures=EPYC-7742,V8,COMPUTE_CAPABILITY_8_0,COMPUTE_TYPE_AMPERE
ActiveFeatures=EPYC-7742,V8,COMPUTE_CAPABILITY_8_0,COMPUTE_TYPE_AMPERE
Gres=gpu:ampere:2
NodeAddr=gpu020 NodeHostName=gpu020 Version=20.11.3
OS=Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019
RealMemory=256000 AllocMem=0 FreeMem=251941 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=1500000 Weight=10 Owner=N/A MCS_label=N/A
Partitions=shared-gpu
BootTime=2021-04-15T11:11:41 SlurmdStartTime=2021-04-15T11:12:19
CfgTRES=cpu=64,mem=250G,billing=64
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Comment=(null)
capello@login2:~$
This is the first GPU node with an NVIDIA A100 card, which is based on the Ampere architecture. Two notes:
- CUDA 11 is the minimum required version (cf. "CUDA 11 Features Revealed", NVIDIA Developer Blog)
- The Slurm GRES and features have been updated (cf. hpc:hpc_clusters [eResearch Doc])
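To land a job on this node, you can request the `ampere` GPU type that appears in the `Gres=gpu:ampere:2` line above. Here is a minimal sketch of a batch script, not an official template: the five-minute time limit and the `report_gpu` helper are placeholders made up for illustration, and the `--constraint` line is probably redundant with the typed `--gres` request but shows how the new feature name can be used.

```shell
#!/bin/sh
#SBATCH --partition=shared-gpu
#SBATCH --gres=gpu:ampere:1              # one GPU of type "ampere" (from Gres=gpu:ampere:2)
#SBATCH --constraint=COMPUTE_TYPE_AMPERE # feature listed in ActiveFeatures
#SBATCH --time=00:05:00                  # placeholder time limit (assumption)

# Hypothetical helper: print the GPU model and driver version, or a
# short message when nvidia-smi is not available on the current host.
report_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    else
        echo "nvidia-smi not found on $(uname -n)"
    fi
}

report_gpu
```

Saved under a name of your choice (say `ampere_test.sh`), it would be submitted with `sbatch ampere_test.sh`. Remember to load a CUDA 11 or newer toolchain before running real workloads; the exact module name depends on what is installed on the cluster.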
Thx, bye,
Luca