[Tutorial] See your past computation usage using openxdmod

Dear all, have you ever wondered how many CPU or GPU hours you have consumed on the Baobab service?

We have a tool for that: https://openxdmod.hpc.unige.ch

In this tutorial, I’ll explain how to extract your past year’s CPU hours usage as a PI (i.e. for all your users). Follow these simple steps:

  1. When you open the site, you see a summary dashboard. First, click on the Usage tab (1) as shown in the picture.

  2. Select the metric you are interested in (1). In this example we focus on CPU hours, but you may be interested in GPU hours too.

  3. Select a start and end date, or use the predefined values in the dropdown menu (2).

  4. Left-click on a data point in the graph (1) and select the dimension by which you want to drill down the data. In this example, we’ll do it by PI (2). You can of course select another field if you are interested, for example, in usage per user.

  5. We are interested in the usage of one PI. To filter the data, click on the filter icon (1), select the PI (2) you are interested in, and validate (3). As the list may be long, you can use the search box for easier lookup.


  6. By default, the aggregation unit is automatic and depends on the time frame you are looking at. You can change the aggregation unit manually, for example to see your past usage summed by month (1) and (2).

  7. If you prefer, you can display your usage as tabular data instead of a graph, as done in this example (1) and (2).

Don’t forget that your users are using CPUs and probably GPUs too. In that case, you need to run the procedure a second time with the GPU metric to get the GPU usage.

Feel free to comment on this procedure.

Best

Yann


Hi @Yann.Sagon,

That’s awesome, thanks!

I have a few questions/comments:

  • I was wondering if we could also have access to the number of users per PI (e.g. when there are a lot of hours, is it because there were more people?).
  • Is it possible to have a cumulative plot on the interface, or do we just have to download the data and do it on our own?
  • Finally, I was wondering if you have an estimate of the “cost” of a CPU hour in terms of electrical consumption (to be able to compute a corresponding CO2 emission!). Could a rough order of magnitude be estimated by taking, per year, “total electrical consumption / (CPU + GPU hours)”? (That assumes GPU and CPU hours are comparable, which I guess they are not, but…)
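The back-of-the-envelope ratio in the last bullet could be sketched like this; all numbers below are hypothetical placeholders, not real Baobab figures:

```python
# Back-of-the-envelope estimate: average electrical draw per core-hour,
# from total yearly consumption divided by total core-hours delivered.
# Both inputs are HYPOTHETICAL placeholders, not actual Baobab numbers.

yearly_consumption_kwh = 2_000_000   # hypothetical: total site consumption per year (kWh)
yearly_core_hours = 10_000_000       # hypothetical: CPU + GPU hours delivered per year

kwh_per_core_hour = yearly_consumption_kwh / yearly_core_hours
watts_per_core = kwh_per_core_hour * 1000  # 1 kWh over 1 hour == 1000 W

print(f"{watts_per_core:.0f} W per core on average")  # 200 W with these placeholders
```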

Anyway, thanks again for the tutorial, very useful!

Best,
Emeline

@Emeline.Bolmont

Thank you for your feedback!

Yes, it is possible to see the detail per user for a given PI. After viewing the usage per PI, just left-click on the graph again (step 1) and select the user entry (step 2).

About power consumption: we have this chapter in our documentation, but unfortunately it is not very useful.

I’ll soon try to provide a table of kWh per CPU hour for some of the compute node models we have.

I haven’t yet found how to create cumulative plots. Maybe it’s possible in the admin interface; I’ll check as well.

Thank you so much @Yann.Sagon
That helps a lot!
Looking forward to the power consumption estimates! In the meantime, I’m using 150 W (hopefully a good ballpark estimate).
For the cumulative plot, no worries, I’ve done it myself with the table from the webpage.
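For what it’s worth, the 150 W ballpark above turns CPU-hours into a CO2 estimate with one multiplication. In this sketch, both the per-core draw and the grid carbon intensity are assumptions (the intensity varies a lot by country and provider):

```python
# Rough CO2 estimate from CPU-hours, assuming ~150 W drawn per core
# (the ballpark from the thread; the real figure depends on the node model)
# and a HYPOTHETICAL grid carbon intensity -- both numbers are assumptions.

WATTS_PER_CORE = 150          # assumed average draw per core (W)
GRID_KG_CO2_PER_KWH = 0.05    # hypothetical grid intensity; check your provider

def co2_from_cpu_hours(cpu_hours: float) -> float:
    """Return estimated kg of CO2 for a given number of core-hours."""
    energy_kwh = cpu_hours * WATTS_PER_CORE / 1000.0  # W * h -> kWh
    return energy_kwh * GRID_KG_CO2_PER_KWH

# Example: 10,000 core-hours -> 1500 kWh -> 75 kg CO2 with these assumptions
print(co2_from_cpu_hours(10_000))
```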

Hello there!

I’m experiencing difficulty accessing openxdmod. The page isn’t loading for me in several browsers (Brave and Firefox on Arch Linux, and Brave on mobile) across different networks (UNIGE and 5G).

I wanted to have a look at openxdmod to monitor core usage during my workloads, which could help me optimize resource requests. htop isn’t providing the level of detail I need.

Could you please let me know if others are experiencing similar issues, or provide a fix? Thank you for your time and assistance. And if openxdmod is reserved for PIs, could you please mention that in the docs?

Have a nice week!

Dear @Cyrus.Brueggimann

Over the past few days we have been adding SSO (single sign-on, using your UNIGE ISIS account) authentication to openxdmod, and this unfortunately created some instability. The instance is now working again, and as an extra bonus you can now log in to it with your account, for example in case you want to create custom reports and save them.

OpenXDMoD is only useful to see your past usage; it doesn’t work as a real-time analytics tool.

Give the seff CLI a try.


Thanks a lot for the useful answer. I finally SSHed into the compute node and used htop there. I also added some logging tools in my Python script to get more info in case I can’t check htop regularly.
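Such in-script logging could look like this minimal, stdlib-only sketch (a sketch in that spirit, not the actual script from this thread):

```python
import logging
import resource  # Unix-only, which is fine on a compute node

# Minimal in-script resource logging (stdlib only) -- an illustrative sketch,
# not the actual logging tools mentioned above.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def log_usage(tag: str) -> None:
    """Log cumulative CPU time and peak RSS of the current process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    cpu_seconds = ru.ru_utime + ru.ru_stime
    peak_mb = ru.ru_maxrss / 1024  # ru_maxrss is in KiB on Linux (bytes on macOS)
    logging.info("%s: cpu=%.1fs peak_rss=%.1f MB", tag, cpu_seconds, peak_mb)

log_usage("after loading chunk")
```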
By the way, seff didn’t provide me any useful info at all. htop on the compute node:


seff <job.id> from the login node while my script is running:

Job ID: 13864590
Cluster: baobab
User/Group: bruggim9/hpc_users
State: RUNNING
Nodes: 1
Cores per node: 64
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 17:57:20 core-walltime
Job Wall-clock time: 00:16:50
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 1000.00 GB (1000.00 GB/node)
WARNING: Efficiency statistics may be misleading for RUNNING jobs.

I do not know if this is normal or not.

Dear @Cyrus.Brueggimann

Indeed, seff is for finished jobs only. Using htop is good practice.

By the way: you requested 1 TB of RAM and you are using 230 GB, is this correct? You also requested 64 cores, and only ~2 are in use.

Good afternoon Yann. For your info (and to avoid getting bashed about irresponsible resource usage):

It was at the beginning of my script, but it ramps up until almost 1000 GB of RAM is used (I still store chunks in the ROM while the RAM is full).
As for CPU usage, it is very intermittent (maybe because of I/O limitations ?), even after setting up Dask for hyperthreading. It is also recommended in the HPC docs to request the full available cpus in a node if the whole RAM is needed, which makes sense to me, especially in the bigmem partition.

Have a nice day!

No bashing intended! I was asking just in case it wasn’t intentional, but it seems it is needed, so it’s perfect like that.

If you use the whole RAM of a compute node, the justification for using all the CPUs is that nobody else will be able to use the remaining CPUs anyway. In that case, if your job can benefit from more CPUs, it’s better to request them. But if your job uses only, say, two cores and adding more CPUs doesn’t speed things up, it may be better not to request more CPUs than needed, as next year we’ll bill by CPU usage.
