Software/system: Problem with Qt plugin

Primary information

Username: kruckow
Cluster: Yggdrasil

Description

I was trying to run a python script to create plots. This script was tested and works when run in a jupyter lab notebook.
Because it needs to be run often, it should be run from the command line as a slurm job array.

Steps to Reproduce

This is unfortunately a bit difficult, because it requires packages that are only installed in a conda environment. You’d need to have the newest development version of POSYDON.

Expected Result

The script was expected to create the plots, as it did when tested in jupyter lab.

Actual Result

However, running the same script on the command line with python, python3 or ipython results in an error about the graphical interface, namely the Qt plugin.

Error message

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.

Aborted

Further details

POSYDON uses PyQt5 in some modules, which is probably what triggers the failed attempt to load the Qt plugin.
The Qt plugin is not part of POSYDON; it should come from the shared installations on the system.
For a colleague the script runs with python. We use the same PyQt5 version, but I have a slightly newer version of python 3.7 than he does. According to python, it should use Qt version 5.15.2. On the internet I found that the most common reason for this error is missing libraries in Qt6.


Hi @Matthias.Kruckow,

I need more details, such as the version of each piece of software you use. Could you please share your sbatch script?

I’m a bit confused. As I described, the problem is not in the slurm job; it’s in the communication between python and the Qt plugin.
Nevertheless, here is the sbatch script I used:

#!/bin/bash
#SBATCH --account=fragkos
#SBATCH --partition=public-cpu
#SBATCH -N 1
#SBATCH --cpus-per-task 1
#SBATCH --ntasks-per-node 1
#SBATCH --time=24:00:00
#SBATCH --job-name=psygrid
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=matthias.kruckow@unige.ch
#SBATCH --array=0-1
#SBATCH --output=./logs/plot_grid.out

srun python /home/users/k/kruckow/scratch/POSYDON//bin/run-pipeline /srv/beegfs/scratch/shares/astro/posydon/POSYDON_GRIDS_v2/ ./step_3.csv $SLURM_ARRAY_TASK_ID 0

The error even happens when I run the srun command directly on the command line:

python /home/users/k/kruckow/scratch/POSYDON//bin/run-pipeline /srv/beegfs/scratch/shares/astro/posydon/POSYDON_GRIDS_v2/ ./step_3.csv 0 0

The script reads the file step_3.csv; $SLURM_ARRAY_TASK_ID tells it which line below the header to read. Here is the step_3.csv file:

path_to_grid,path_to_plot
/srv/beegfs/scratch/shares/astro/posydon/POSYDON_GRIDS_v2/HMS-HMS/1e-04_Zsun/LITE/grid_low_res_combined_rerun_1PISN.h5,/srv/beegfs/scratch/shares/astro/posydon/POSYDON_GRIDS_v2/HMS-HMS/1e-04_Zsun/plots/grid_low_res_combined_rerun_1PISN
/srv/beegfs/scratch/shares/astro/posydon/POSYDON_GRIDS_v2/HMS-HMS/1e-04_Zsun/LITE/grid_random_combined_rerun_1PISN.h5,/srv/beegfs/scratch/shares/astro/posydon/POSYDON_GRIDS_v2/HMS-HMS/1e-04_Zsun/plots/grid_random_combined_rerun_1PISN

Yes, I’d like to know all the software and versions in use myself. Unfortunately, python loads a lot of things in the background without telling me, not even when running python with the --debug option.
So, please let me know whether there is a better way to figure out how the error is caused.
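
One way to collect at least part of this information is a small stand-alone script. The following is only a minimal sketch and does not come from POSYDON or the cluster documentation: the function name describe_environment and the selection of variables and packages are assumptions made for illustration. It uses importlib.util.find_spec, which locates a top-level package without importing it, so looking up PyQt5 this way should not trigger the plugin error itself.

```python
import importlib.util
import os
import sys


def describe_environment(packages=("PyQt5", "matplotlib")):
    """Collect interpreter, display-related variables, and package locations.

    Hypothetical helper for illustration; not part of POSYDON.
    """
    info = {
        "python_version": sys.version.split()[0],
        "python_executable": sys.executable,
    }
    # Environment variables that influence how Qt and X11 programs start.
    for name in ("DISPLAY", "XDG_RUNTIME_DIR", "QT_QPA_PLATFORM", "QT_PLUGIN_PATH"):
        info["env:" + name] = os.environ.get(name)
    # find_spec reports where a top-level package would be loaded from
    # without importing it; None means it is not installed here.
    for package in packages:
        spec = importlib.util.find_spec(package)
        info["location:" + package] = spec.origin if spec is not None else None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key} = {value!r}")
```

Running it once inside jupyter lab and once from the command line, then comparing the two outputs, should show whether the interpreter, the display variables or the package locations differ. Independently of python, Qt prints detailed plugin-loading diagnostics when the environment variable QT_DEBUG_PLUGINS is set to 1.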

There are two parts in the sbatch script:

  1. Slurm resources definition
  2. Job definition

The sbatch script gives more information than you might think.

I now understand that you are not using an installed module but something you compile yourself.

With a little research, I understood that POSYDON is a tool developed in cooperation with different organizations of which UNIGE is a part.

With the sbatch script, I get the log path, job name, path to your binary, working directory, etc. So this information is IMPORTANT (almost mandatory) to understand your issue. :slight_smile:

So, what I understand is that you ran the job via jupyter lab and it works, but through slurm you get the Qt5 error message.

The difference between the two ways of running it is the graphical interface.

I found this in the POSYDON documentation:

http://posydon.org/POSYDON/run_mesa_grids/fixed/fixed.html

posydon-setup-grid --grid-type fixed --inifile example_grid.ini --submission-type slurm

Did you initialize with the option --submission-type slurm?

I invite you to join today’s HPC lunch to talk about it :slight_smile:

Yes, I’m part of the group working on POSYDON. So, yes, I got the slurm option right; otherwise I would not have had the script to run with sbatch.

Again, the problem has nothing to do with slurm! The only reason I mentioned slurm is to explain why I can’t always run it in jupyter lab. I need to run it from the command line to be able to use slurm.

The issue is that the person who wrote the script I need to run doesn’t get the error when running it via slurm, but I do. The difference between the two of us is that I have a newer account and therefore some newer libraries. The Qt plugin is not part of POSYDON; it should be a system-wide shared installation that is used there. Thus, I was wondering whether there might be different shared installations. The person for whom it works has an account that was created a long time ago, e.g. he still has his home under /home/NAME, while mine is at /home/users/k/kruckow/.

I’m sorry, but the libraries are not “account” dependent. You are using POSYDON software that you built yourself, in your home, so it uses the libraries it contains. Are you sure you are using the same version as your colleague?

Hello, not sure if this is useful, but I’ve had the same error when trying to do plotting with matplotlib in a script submitted via sbatch. In that case, I could only get it to run without the xcb error message by adding these lines after the #SBATCH statements (before the call to the python script):

unset DISPLAY
export XDG_RUNTIME_DIR=""

I’ll admit I don’t really know what these lines do, but I’ve seen it on previous threads and it’s working for me…

The little test script I used:

#!/bin/bash 
#SBATCH --job-name=test_plotting
#SBATCH --partition=shared-cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=15:00
#SBATCH --output="slurm-%j.out" 

unset DISPLAY
export XDG_RUNTIME_DIR=""

python test.py

With test.py:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.savefig("test.jpg", format="JPEG")
plt.close()
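
For what it is worth, the effect of the two shell lines can be mirrored in python, which may make it easier to see what they do. DISPLAY tells X11 programs which X server to connect to (after ssh -X it points at the forwarded display), and XDG_RUNTIME_DIR names a per-user directory for runtime files. The sketch below is an illustration only: headless_env and run_headless are hypothetical helper names, and nothing here is taken from POSYDON or the cluster setup.

```python
import os
import subprocess
import sys


def headless_env(base_env=None):
    """Return a copy of the environment without a usable X display.

    Hypothetical helper; mirrors the two shell lines from the sbatch script.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("DISPLAY", None)      # same effect as `unset DISPLAY`
    env["XDG_RUNTIME_DIR"] = ""   # same effect as `export XDG_RUNTIME_DIR=""`
    return env


def run_headless(argv, base_env=None):
    """Run a command with the headless environment and return its exit code."""
    return subprocess.run(argv, env=headless_env(base_env), check=False).returncode


if __name__ == "__main__":
    # The child process no longer sees DISPLAY, whatever the parent shell had.
    code = "import os; print(os.environ.get('DISPLAY'))"
    run_headless([sys.executable, "-c", code])
```

Only the child process is affected; the shell that launched it keeps its own DISPLAY, so viewing images over ssh -X in that shell would still work.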

Hi @Matthias.Kruckow

we don’t have Qt on the compute nodes:

(baobab)-[root@cpu186 ~]$ rpm -qa | grep -i qt
(baobab)-[root@cpu186 ~]$

So, which version do you use? In your sbatch script you aren’t loading any module, neither for Qt nor for python.

Best

Thanks to Genevieve.Savard, whose post helped me identify and solve the issue.

I only need the unset DISPLAY.

The issue was that we logged in to yggdrasil via ssh -X to be able to display the plots generated on the cluster. This set the shell variable $DISPLAY differently for me and my colleague, because we logged in from different computers. Thus, unset DISPLAY solves the issue for running the script, as it removes the value of $DISPLAY. A warning: displaying images won’t work anymore in that shell. Including it in the sbatch script should be fine, because the script should run in a new subshell and therefore won’t affect your current one. Another solution would be to log in without the -X option when starting the SLURM job, and to use it only when you want to inspect the produced images.

By the way, export XDG_RUNTIME_DIR="" creates or overwrites the shell variable ${XDG_RUNTIME_DIR}, which wasn’t needed in my case.
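
The same decision could, in principle, also be made inside a plotting script instead of in the shell. The following is a sketch under stated assumptions, not a tested fix for POSYDON: choose_backend and configure_matplotlib are hypothetical names, and the rule "inside a Slurm job or without DISPLAY, write files only" is an assumption chosen for illustration. Agg is matplotlib's non-interactive backend for writing image files. Whether this is sufficient when a module imports PyQt5 directly is not established in this thread.

```python
import os


def choose_backend(env=None):
    """Return "Agg" when plots can only be written to files, else None.

    Hypothetical helper; the decision rule is an assumption.
    """
    env = os.environ if env is None else env
    in_slurm_job = "SLURM_JOB_ID" in env     # set by Slurm inside a job
    has_display = bool(env.get("DISPLAY"))   # empty or missing means no X server
    if in_slurm_job or not has_display:
        return "Agg"
    return None  # leave matplotlib's default (possibly interactive) backend


def configure_matplotlib(env=None):
    """Select the backend; call this before importing matplotlib.pyplot."""
    backend = choose_backend(env)
    if backend is not None:
        import matplotlib
        matplotlib.use(backend)
    return backend
```

With this approach an interactive ssh -X session keeps its plot windows, while batch jobs only write image files.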
