CPU Affinity on Baobab for NVIDIA DALI

I’m trying to use NVIDIA DALI on Baobab through Singularity. On my local machine with 2 GPUs I see a 4-5x speedup in data-preprocessing time for machine learning jobs. I would like to launch large-scale jobs such as SimCLR, which I worked on recently at Google. However, I am seeing an issue related to CPU affinity when running these jobs.

  • Here is the related GitHub issue where I am discussing this with the NVIDIA devs.

Is it possible from my side to change the affinity of a job? As an example, I have a PyTorch distributed-data-parallel job spanning 13 nodes, and the error occurs on a few of them:

As far as I can tell, the taskset required by NVIDIA is not being adhered to:

I have tried a few settings requesting a variety of CPU configurations, i.e.:

#SBATCH --cpus-per-task=2
#SBATCH --cpus-per-task=3
#SBATCH --cpus-per-task=6

to no avail.

After some more digging, it looks like the affinity does match when the job is spawned (i.e. in the bash script), e.g.:

but once the job creates a child process, the child shows the affinity problem described in the first post.
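
One way to compare the two cases is a small Python check along these lines. This is only a sketch under a few assumptions: it needs Linux (`os.sched_getaffinity` is not available elsewhere), the helper names `current_affinity` and `child_affinity` are made up for illustration, and it starts a plain Python child process rather than a DALI or PyTorch worker, so it may not reproduce the exact failure.

```python
import json
import os
import subprocess
import sys


def current_affinity():
    """Return the sorted CPU ids this process is allowed to run on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def child_affinity():
    """Start a fresh Python child process and return the CPU ids it reports."""
    # The child prints its own affinity mask as a JSON list on stdout.
    code = "import json, os; print(json.dumps(sorted(os.sched_getaffinity(0))))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Print both masks so they can be compared per node in the job log.
    print(f"parent pid={os.getpid()} affinity={current_affinity()}")
    print(f"child affinity={child_affinity()}")
```

Running this inside the same Slurm step and container as the training job, on one of the affected nodes, should show whether a child process ends up with a different CPU mask from its parent.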