[baobab][python package][virtualenv] Error raised on "shap" package load - Illegal instruction (core dumped)

Primary information

Username: pascheo
Cluster: baobab

Description

I’m unable to load a Python package installed in my virtualenv, as it raises an error on import.
The package is “shap” (https://shap.readthedocs.io/).
The package works correctly on both of my local computers (Windows and Ubuntu) and on Google Colab.
I have also tried installing a different version of shap (e.g. “shap==0.42.1”), but it yields the same error: “Illegal instruction (core dumped)”.

Steps to Reproduce

Run:

module load GCCcore/12.3.0 GCC/12.3.0 Python/3.11.3 Python-bundle-PyPI/2023.06 SciPy-bundle/2023.07
virtualenv ~/baobab_python_env
. ~/baobab_python_env/bin/activate
~/baobab_python_env/bin/pip install shap
sbatch shap_test.sh

with, in the same directory, the file “shap_test.sh”:

#!/bin/sh
#SBATCH --job-name=SHAP
#SBATCH --partition=debug-cpu
#SBATCH --time=00:02:00
#SBATCH --mem=3000

module load GCCcore/12.3.0 GCC/12.3.0 Python/3.11.3 Python-bundle-PyPI/2023.06 SciPy-bundle/2023.07 

. ~/baobab_python_env/bin/activate

# ~/baobab_python_env/bin/pip install --upgrade shap

srun ~/baobab_python_env/bin/python shap_test.py

and the minimal example “shap_test.py”:


import shap

Expected Result

The “shap” package being loaded normally.

Actual Result

Error: “Illegal instruction (core dumped)”

Many thanks to whoever is able to help! :slight_smile:

Have a great day,
Olivier

Dear @Olivier.Pasche

the issue is probably related to https://doc.eresearch.unige.ch/hpc/faq#illegal_instruction

As a workaround, I’ve added to our documentation a way to install a Python package by building it from source: hpc:applications_and_libraries [eResearch Doc]

I’ve also installed the SHAP package centrally: New software installed: SHAP version 0.42.1

Dear @Yann.Sagon,

Thanks for your answer, for the suggested workaround, and for installing shap.

After some further investigation (as it still didn’t work with “--no-binary”), the issue seems to come from a package interaction with matplotlib, as, even on the login node, the following minimal example works:

# To have python 3.11 and virtualenv available:
module load GCCcore/12.3.0 GCC/12.3.0 Python/3.11.3 Python-bundle-PyPI/2023.06 SciPy-bundle/2023.07

# Create virtualenv:
rm -r ~/bvenv/
virtualenv ~/bvenv
. ~/bvenv/bin/activate

~/bvenv/bin/pip install --no-binary shap shap

~/bvenv/bin/python
>>> import shap

but this throws the “Illegal instruction (core dumped)” error:

# To have python 3.11 and virtualenv available:
module load GCCcore/12.3.0 GCC/12.3.0 Python/3.11.3 Python-bundle-PyPI/2023.06 SciPy-bundle/2023.07

# Create virtualenv:
rm -r ~/bvenv/
virtualenv ~/bvenv
. ~/bvenv/bin/activate

~/bvenv/bin/pip install --no-binary matplotlib matplotlib
~/bvenv/bin/pip install --no-binary shap shap

~/bvenv/bin/python
>>> import shap

As I need several other unavailable packages, I need a custom environment.
I tried installing all the other packages I need except matplotlib in the virtualenv, and shap loads correctly, so it really seems to be caused by just matplotlib being installed in the same virtualenv…?
(This behaviour doesn’t occur on my local computers or on Google Colab.)
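
One way to narrow down which package triggers the crash is to import each candidate in a separate child interpreter and look at how that child exits. The following is a minimal sketch rather than a definitive diagnostic: the helper names `try_import` and `describe` and the default module list are illustrative assumptions, and only the Python standard library is used. On POSIX systems, `subprocess` reports a negative return code when the child was killed by a signal, so an illegal instruction would show up as SIGILL.

```python
import signal
import subprocess
import sys


def try_import(module, python=sys.executable):
    """Import `module` in a child interpreter and return the child's exit code."""
    result = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"failed with exit code {returncode}"


if __name__ == "__main__":
    # Hypothetical default list; pass module names as arguments to override it.
    modules = sys.argv[1:] or ["numpy", "matplotlib", "shap"]
    for name in modules:
        print(f"{name}: {describe(try_import(name))}")
```

Running it with the virtualenv's interpreter tests each import in isolation, so a crash in one module does not take down the whole check.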

Any ideas on what might cause this?
Should I try installing the packages outside the virtualenv in “~/.local/*” (maybe not ideal)?

Thanks again and have a nice day,
Olivier

Hello,

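I determined that the issue is caused by OpenBLAS having been compiled on a compute node newer than the login node, so it can use CPU instructions that older nodes do not support. The kernel log shows: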

[1303094.151572] traps: python[2286126] trap invalid opcode ip:14d60c68dc9e sp:7fff3f5b4ae0 error:0 in libopenblas_skylakexp-r0.3.23.so[14d60c3bf000+bd3000]
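
A kernel log line like the one above can be picked apart to find the shared object in which the invalid opcode occurred. This is a small sketch assuming the line format shown here; the regular expression and the `parse_trap` helper are illustrative and may need adjusting for other kernel versions.

```python
import re

# Pattern for kernel "traps:" lines of the form seen in this thread (an assumption).
TRAP_RE = re.compile(
    r"traps: (?P<proc>\S+)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[\s]+)\[)?"
)


def parse_trap(line):
    """Return the fields of a kernel trap line as a dict, or None if it does not match."""
    match = TRAP_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = (
        "[1303094.151572] traps: python[2286126] trap invalid opcode "
        "ip:14d60c68dc9e sp:7fff3f5b4ae0 error:0 in "
        "libopenblas_skylakexp-r0.3.23.so[14d60c3bf000+bd3000]"
    )
    info = parse_trap(line)
    print(info["proc"], info["trap"], info["object"])
```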

I’ll recompile OpenBLAS on a legacy node tomorrow.
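
The library name in the log, `libopenblas_skylakexp`, suggests a build targeting Skylake-X class CPUs, whose OpenBLAS kernels generally rely on AVX-512 instructions. A quick way to check whether a given node supports them is to look at the CPU flags in `/proc/cpuinfo`. This is a rough sketch assuming Linux on x86; the function names are illustrative, and `avx512f` is used here only as a representative flag.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores normally report the same flags, so the first entry is enough.
            return set(value.split())
    return set()


def supports_avx512(cpuinfo_text):
    """Return True if the AVX-512 Foundation flag is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("avx512f supported:", supports_avx512(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not available on this system")
```

Running this on both the login node and a compute node shows whether the two differ in AVX-512 support.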

Dear @Olivier.Pasche, this is done: I’ve recompiled OpenBLAS on the oldest node we still have. It took longer than expected due to a bug in our EasyBuild recipe.
