Hi HPC team.
I was wondering if the fairshare is being updated? It looks like it is still the same for all users regardless of the amount they used the cluster?
Thanks,
Matt
Hi @Matthew.Leigh,
The fairshare calculation is quite complicated. Recently we discovered some behavior we did not expect, and we have opened a ticket with Slurm support to get more information.
We are testing different configurations, but the fairshare calculation takes time to propagate across the entire cluster.
Why?
The fairshare factor is a floating-point number that ranges between 0.000000 and 1.000000. This number depends on the usage of ALL users, so when we change a configuration, Slurm recalculates the priorities of ALL users. On our side, to understand the impact, we have to change configuration options one at a time (waiting for the complete recalculation each time) to be sure of the resulting behavior.
The fairshare priority value is not the same as the fairshare value.
The fairshare priority is derived from the fairshare value, as sketched below.
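As a rough illustration of that relationship (the weight value below is a hypothetical stand-in for Slurm's PriorityWeightFairshare parameter, not our actual setting):

```python
# Sketch: how a normalized fairshare factor (0.0 - 1.0) becomes a
# fairshare priority value. PRIORITY_WEIGHT_FAIRSHARE is a hypothetical
# stand-in for Slurm's PriorityWeightFairshare, not our real setting.
PRIORITY_WEIGHT_FAIRSHARE = 15000

def fairshare_priority(fairshare_factor: float) -> int:
    # Slurm keeps priority components as integers, so a very small
    # factor can collapse to 0 once rounded.
    return round(PRIORITY_WEIGHT_FAIRSHARE * fairshare_factor)

print(fairshare_priority(0.7164))    # -> 10746
print(fairshare_priority(0.000004))  # -> 0 (rounded to zero)
```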
Currently
Your fairshare can completely change from one day to the next due to the PriorityDecayHalfLife option, which forgets half of each user's recorded usage after each half-life period. We are tuning the priority weight used in the fairshare priority calculation to be as precise as possible, but we noticed that Slurm rounded the fairshare priority down to zero.
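As a sketch of the decay behavior (the 7-day half-life below is an assumption for illustration, not our actual configuration):

```python
# Sketch of PriorityDecayHalfLife: the usage Slurm remembers for each
# user decays exponentially, halving once per half-life period.
# HALF_LIFE_DAYS is a hypothetical value, not our actual configuration.
HALF_LIFE_DAYS = 7.0

def decayed_usage(raw_usage: float, days_elapsed: float) -> float:
    """Usage still counted against a user after days_elapsed."""
    return raw_usage * 0.5 ** (days_elapsed / HALF_LIFE_DAYS)

print(decayed_usage(100_000, 7))   # -> 50000.0 (half is forgotten)
print(decayed_usage(100_000, 14))  # -> 25000.0
```

This is why a heavy user's fairshare factor can recover noticeably from one day to the next.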
What next?
We are testing different weights to find the best one. In the meantime, sprio will print out odd fairshare priority values because of the current calculation, and these will evolve little by little as the new calculation propagates.
When we change something, your fairshare priority value will first be maximal and will then decrease after recalculation (and could be rounded to zero if your fairshare factor is close to 0.000000).
For example, yesterday you saw everybody with a fairshare priority of either 12k or 0, but today, after the recalculation, we have differentiated fairshare priorities:
```
USER      JOBID     PARTITION  PRIORITY  SITE  AGE  ASSOC  FAIRSHARE  JOBSIZE  PARTITION  QOS  TRES
oattia    60789168  private-a     20302     0  300      0       5000        2      15000    0
falkiewi  62594504  private-k     25749     0    2      0      10746        2      15000    0
leighm    62593365  private-d     22563     0    3      0       7558        2      15000    0
karrenbr  62580853  shared-gp      6818     0   24      0       3042        2       3750    0
```
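As a side note, if you assume a fairshare weight of 15000 (a hypothetical figure for illustration only, not necessarily our setting), you can back out the approximate normalized fairshare factor from the FAIRSHARE column:

```python
# Back out the approximate normalized fairshare factor from the sprio
# FAIRSHARE column. ASSUMED_WEIGHT is hypothetical, for illustration only.
ASSUMED_WEIGHT = 15000

for user, points in [("oattia", 5000), ("falkiewi", 10746),
                     ("leighm", 7558), ("karrenbr", 3042)]:
    print(f"{user}: factor ~ {points / ASSUMED_WEIGHT:.4f}")
# oattia: factor ~ 0.3333, falkiewi: factor ~ 0.7164, ...
```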
Fairshare is one of our top priorities right now, but we can only move forward slowly on the issue due to the propagation time. (But we are close to a resolution.)
We hope this explanation helps.