Eigenvalue with TensorFlow

Hello all,

I’m trying to run TensorFlow/Keras code that has an eigenvalue calculation in the loss function. The code runs fine on my personal computer, but on Baobab I get the following error message:

File "run.py", line 313, in customLoss
        e,_= tf.linalg.eigh(matrix)
      File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages/tensorflow/python/ops/linalg_ops.py", line 348, in self_adjoint_eig
        e, v = gen_linalg_ops.self_adjoint_eig_v2(tensor, compute_v=True, name=name)
      File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages/tensorflow/python/ops/gen_linalg_ops.py", line 1639, in self_adjoint_eig_v2
        "SelfAdjointEigV2", input=input, compute_v=compute_v, name=name)
      File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
      File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
      File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access
    InvalidArgumentError (see above for traceback): Got info = 8 for batch index 0, expected info = 0. Debug_info = heevd
             [[Node: loss/activation_1_loss/SelfAdjointEigV2 = SelfAdjointEigV2[T=DT_FLOAT, compute_v=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](loss/activation_1_loss/mul_3)]]

It’s weird, because I have very similar code using eigenvalue calculations that runs fine on Baobab, and the cause is rather obscure. I tried to pinpoint the problem by reducing my code as much as possible, and I already get the same error with a loss function like this:

import tensorflow as tf
from keras import backend as K

def customLoss(x, y_pred):
    # batch_size is defined elsewhere in the script
    y_pred = tf.slice(y_pred, begin=(0, 0), size=(batch_size, 9))  # extract relevant elements
    matrix = K.reshape(y_pred, (-1, 3, 3))  # reshape to 3x3 matrices
    matrix = matrix + K.permute_dimensions(matrix, (0, 2, 1))  # add transpose to each matrix in batch
    e, _ = tf.linalg.eigh(matrix)  # get eigenvalues (eigenvectors are discarded)
    return -K.min(e)

In fact, with this code training runs well for a few epochs and only then hits the error.
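
One way to narrow this down is to replay the same computation outside the graph on a batch of saved predictions. The sketch below is a NumPy re-implementation of customLoss, not the code that actually failed. It assumes, as a guess that the traceback does not confirm, that NaN or Inf values reach the solver after a few epochs; in LAPACK-style solvers a positive info usually signals that the iteration did not converge, and non-finite input is one possible cause. The helper name min_eigenvalue_loss_reference is hypothetical.

```python
import numpy as np


def min_eigenvalue_loss_reference(y_pred):
    """NumPy stand-in for customLoss, meant for offline checks only."""
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Keep the first 9 entries of each row and reshape them to 3x3 matrices.
    matrix = y_pred[:, :9].reshape(-1, 3, 3)
    # Add the transpose so that every matrix in the batch is symmetric.
    matrix = matrix + np.transpose(matrix, (0, 2, 1))
    # Fail early with a readable message instead of a solver error.
    bad = ~np.isfinite(matrix).all(axis=(1, 2))
    if bad.any():
        rows = np.flatnonzero(bad).tolist()
        raise ValueError(f"non-finite entries in batch rows {rows}")
    # eigvalsh returns the eigenvalues of symmetric matrices in ascending order.
    e = np.linalg.eigvalsh(matrix)
    return -e.min()
```

If this raises on predictions saved just before the crash, the problem is in the values fed to the solver rather than in the solver itself.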

Has anyone run into problems like this related to getting eigenvalues of matrices?

p.s. I load the following for this:
## TensorFlow
module load GCC/6.4.0-2.28 OpenMPI/2.1.2 TensorFlow/1.7.0-Python-3.6.4 matplotlib/2.1.2-Python-3.6.4 Keras/2.1.6-Python-3.6.4
module load cuDNN/7.0.5-CUDA-9.1.85

In the end I upgraded to TensorFlow 2, and then it worked!

p.s. thanks for TensorFlow 2! :slight_smile:

Dear @Tamas.Krivachy, indeed TensorFlow has a fast-paced release cycle.

I have just installed TF 2.1 for CUDA!

ml fosscuda/2019b TensorFlow/2.1.0-Python-3.7.4

