When I use x2go to run distributed-array calculations in MATLAB, the program works. I compiled the same MATLAB code to run via a batch script on the cluster, but there I get an error that parpool failed to start. Any ideas how to fix this, or how to make the script wait until the parpool starts successfully?
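For reference, the kind of wait-and-retry I have in mind would look roughly like this (the 'local' profile name, maxTries, and pauseSec are just placeholders, not what I actually use):

```matlab
% Sketch of a retry loop around parpool; values are illustrative.
maxTries = 5;
pauseSec = 30;
pool = [];
for k = 1:maxTries
    try
        pool = parpool('local');   % replace with the relevant cluster profile
        break
    catch err
        warning('parpool attempt %d/%d failed: %s', k, maxTries, err.message);
        pause(pauseSec);           % wait before trying again
    end
end
if isempty(pool)
    error('parpool did not start after %d attempts', maxTries);
end
```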
Hi there,
Can you explain a bit more about how you start your analysis?
It is impossible to help without knowing at least your sbatch script and the error you get, sorry.
Moreover, have you compiled your code as explained in the UNIGE HPC documentation (cf. hpc:applications_and_libraries [eResearch Doc])?
Thx, bye,
Luca
Hi Luca,
The batch script is shown here:
The MATLAB script is found here: https://github.com/Okeus/interactionRadius/blob/main/gaussianOverlap3D_v9c_batch.m
When I uncomment lines 109 and 112, the script runs much faster on a desktop PC by processing matrix subsets in a queue.
If I run the script as shown in the image above on the HPC, without tall arrays, it executes perfectly. However, when I try to use distributed or tall arrays on the HPC, I get an out-of-memory error. The program already runs much faster on the HPC than on a desktop PC because I can run jobs in parallel, but I cannot take advantage of distributed or tall arrays to speed it up further.
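To give an idea of the pattern that fails, the distributed-array path is roughly shaped like this (the sizes and names are placeholders; the real code is in the linked file):

```matlab
% Rough shape of the distributed-array variant (placeholder sizes/names).
pool = parpool('local');
A = distributed.rand(20000);   % matrix spread across the pool's workers
B = distributed.rand(20000);
C = A .* B;                    % element-wise work done in parallel
s = gather(sum(sum(C)));       % pull back only a scalar result
delete(pool);
```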
Hi there,
Thank you for the details: you simply need more memory per CPU or, maybe better, you can ask for a higher overall memory (independent of the number of CPUs) using the --mem=${SIZE_IN_MB} parameter, as explained in the UNIGE HPC documentation (cf. hpc:slurm [eResearch Doc]).
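For example, something along these lines in your sbatch script (the sizes and names here are only illustrative, and the run_*.sh wrapper plus the $MATLAB_RUNTIME_ROOT variable follow the usual mcc deployment convention, so adapt them to your actual setup):

```bash
#!/bin/sh
#SBATCH --job-name=gaussianOverlap   # illustrative name
#SBATCH --cpus-per-task=8
#SBATCH --mem=32000                  # 32 GB for the whole job, independent of the CPU count
srun ./run_gaussianOverlap3D_v9c_batch.sh "$MATLAB_RUNTIME_ROOT"
```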
NB: you can check how much memory your job has used with sacct -j 6206772 --format MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize.
Thx, bye,
Luca