Dear Yann:

Thanks for your reply. I was a bit schematic when quoting the batch commands; the whole batch script I’m using is the following:

```
#!/bin/bash
if [ $# -ne 1 ]; then
echo 'file required'
fi
inputname=$1
filename=$(basename $inputname)
jobfolder=jobdata-$filename
jobscript=$jobfolder/run.sbatch
mkdir -p $jobfolder
echo "#!/bin/bash
#SBATCH --job-name=asg-$filename
#SBATCH --output=$jobfolder/job.out
#SBATCH --error=$jobfolder/job.err
#SBATCH --partition=dpt-EL7
#SBATCH --time=00-04:00:00
#SBATCH --cpus-per-task= x
#SBATCH --nodes=1
module load GCC/7.3.0-2.30 OpenMPI/3.1.1 Python/2.7.15
export OPENBLAS_NUM_THREADS= x
srun python $inputname" > $jobscript
sbatch $jobscript
```

where instead of “x” I write 1 or 8 depending on whether I want to do a single-thread run or multithread using 8 cpu’s.

I’ve also written a little toy program that I think reproduces the problem I pointed out in the first post. It creates a 1D array (whose items have values that depend on their position in the array). Then I use this array to construct a tridiagonal matrix M and, for several values of a variable t, I compute the (weighted) average of the elements of the first column of exp(M*t), which I call “complexity”, denoted by C(t) in the plots I’ll show below. The Python script is the following:

```
import numpy as np
from scipy.sparse import diags
from scipy.special import binom
import numpy.linalg as la
import time
def bSequence (L,N):
D = int(binom(L,N))
b = [None]*(D**2)
for n in range(D**2):
if (n <= L):
b[n] = n / 7.
elif(n>L):
b[n] = b[L]-(n-L)*(b[L])/(float(D**2)-L)
return b
def makeM (b):
b = np.delete(b,0)
b = np.delete(b,np.where(b == 0))
diagonals = [b,-b]
M = diags(diagonals, [-1, 1]).toarray()
return M
def Spectrum (M): # Compute the spectrum of M, which will be used to exponentiate it.
assert la.norm( M+np.conj(M.T) ) < 1e-8 # This method is only correct for skew-hermitian matrices.
M = 1j*M
assert la.norm( M-M.conj().T < 1e-8 ) # We created a hermitian matrix out of the original one.
evals,P = la.eigh(M) # The method la.eigh() is faster than la.eig()
Pinv = P.conj().T # The eigenvectors matrix is unitary!
evals = -1j*evals # We recover the eigenvalues of the original matrix. The eigenvectors are just the same as those of the "artificial" hermitian matrix.
return [evals, P, Pinv]
def Complexity(evals,P,Pinv,t0,dt,ts,su):
v = Pinv[:,0] # Take only the first column of the inverse eigenvectors matrix.
t = t0 - dt
deltat = dt
timesteps = ts
speedUp = su
tVals = [None]*(timesteps*speedUp)
CVals = [None]*(timesteps*speedUp)
i=-1
for k in range(speedUp):
actualStep = deltat*(2**k)
for j in range(timesteps):
i += 1
t += actualStep
tVals[i] = t
Diagonal = diags(np.exp(t*evals),0).toarray() # Exponentiate every eigenvalue (item-wise) and put the resulting list in a diagonal matrix.
phit = P.dot(Diagonal).dot(v) # First column of exp(M*t)
C = 0.
for n in range(1,len(phit)):
C += n * np.conj(phit[n]) * phit[n]
CVals[i] = np.real(C) # K-complexity is real, so this makes sure to remove possible residual imaginary parts due to numerical precision.
return [tVals,CVals] # I will want to plot CVals vs tVals, see below.
if __name__ == '__main__':
L = 6
N = 3
start_time = time.time()
t0 = 0 # Initial time value.
dt = 0.01 # Value of the time step.
ts = 100 # Number of time steps (for every scale).
su = 5 # Number of scales spanned.
b = bSequence(L,N)
M = makeM(b)
evals,P,Pinv = Spectrum(M)
result = Complexity(evals,P,Pinv,t0,dt,ts,su) # result is a list with two elements, each element is a 1D array.
filename = 'Toy_Complexity_L'+str(L)+'_N'+str(N)+'_Tin'+str(t0)+'_dt'+str(dt)+'_ts'+str(ts)+'_su'+str(su)
elapsed_time = time.time() - start_time
print (elapsed_time)
np.save(filename,result)
```

After running this program in the cluster (the single-thread execution time is roughly 30 seconds), I download the created file which contains the two arrays and I plot the second array (result[1]) vs the first array (result[0]). I find different results depending on whether I use a single thread or multithreading. When using 8 cpu’s there is some seemingly “non-deterministic” noise (and I call it non-deterministic because it changes after every execution even though the script is exactly the same and there are no randomly assigned variables). See the plot below:

The result that I obtain from running the Python script on my PC, which multithreads using 4 cpu’s, agrees with the red line (i.e. 1-thread result from the cluster). The Python version in my computer is 3.7. I’ve also tried to use the newer Python version you suggested in your post:

```
ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3 Python/3.7.2 SciPy-bundle/2019.03
```

But in that case I get error messages of the form:

```
srun: error: node092: task 0: Illegal instruction
```

… are some nodes problematic for using this Python version?

Thanks for your time and help.

Adrián S.G.