It is already the end of 2020, and a lot has happened on the HPC systems this year!
Some of the changes are visible, some not.
Important topics for 2020, in no particular order:
- We received the Yggdrasil hardware at the beginning of the year.
- We try to meet users and other interested people through our monthly HPC-lunch (the next one will be on the 4th of February 2020)
- We have a brand new HPC documentation
- Yggdrasil configuration was done with the help of the Astro IT team
- Yggdrasil is now in production!
- A new storage bay (2 PB on a G700) was added to the NASAC, thanks to our colleagues from DiSTIC.
- We are now opening UNIGE HPC facilities to HES-SO/Genève and simplifying account requests thanks to the outsiders’ web interface and associated backend.
- Cluster authentication was migrated to Active Directory, and SSH keys are now stored in AD
- A lot of “private compute nodes” were added to Yggdrasil and Baobab, ranging from standard compute nodes to high-end GPU servers and machines with a lot of RAM.
- We created a ton of issues to track our problems and yours on the HPC clusters.
- We solved a little less than a ton of issues, and 330+ are still open! That should keep us busy for a couple more years!
- We were (and some of us still are) working from home. It is great that we have been able to do almost everything remotely without many issues, but we miss seeing our colleagues.
- The NASAC is now fully backed up to tape and replicated to a secondary datacentre. This was necessary, as UNIGE is an attractive target for cybercriminals.
- We welcomed the new users from the former Vital-IT cluster infrastructure
- We migrated more than 550 TB of data from Vital-IT to the NASAC
- We have improved compute node reliability with health-check scripts that run at runtime and during (re)installation.
- We keep GitLab up to date; there are now more than 1500 users on it.
- We are working on the life cycle management for HPC and GitLab users and data.
- Together with other colleagues, we wrote a project proposal to create a scientific IT support service focused on research, to help users with their code and algorithms. More on this next year…
- We have improved our support response time. Most emails, tickets and requests get a follow-up within 24 hours.
- We also update you on the status of the cluster, almost in real time, whenever an important issue is detected: Current issues on Baobab
- We wrote a lot of scripts for monitoring, and for NASAC accounting and invoicing.
- We installed a lot of software: new toolchains, new CUDA versions, etc.
- We introduced the new interactive and long run partitions
Roadmap for 2021:
- Renew Baobab with a brand new cluster!
- Continue to improve our current clusters, Baobab and Yggdrasil; we are aiming for better resource usage.
- And a lot more of course!
We hope you enjoy using our clusters and that you will continue to trust us with your computations!
We wish you all the best, a Merry Christmas and a Happy New Year!
Kudos if you are still reading!
HPC team, Yann, Luca, Massimo