Illegal instruction in C++ code

pietro.brighi · October 22, 2020, 8:52am

Hi all!
I’m running C++ code that requires ITensor, which I built on the login node. It works fine in the debug partition, but on some nodes in the dpt partition I’m getting an “Illegal instruction” error.
If I understand correctly, this is due to some incompatibility with the CPUs of those nodes. Do you have any idea how I could overcome this issue?
Best
Pietro

Pablo.Strasser · October 22, 2020, 11:13am

The list and types of nodes are given here: https://baobabmaster.unige.ch/enduser/src/enduser/enduser.html#compute-nodes . You can select the node generation with the --constraint="V2|V3|V4|V5|V6" flag, which restricts the job to nodes of generation 2 to 6. It is also possible to compile your software for an architecture that is old enough by passing the -march flag to gcc. If I remember correctly, it’s something like westmere (-march=westmere). Note that when you add such a flag you may also need to recompile all the libraries you link against, since a single library using a too-modern instruction is enough to make the program crash.

The simplest solution is to just restrict your node choice by a constraint.
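To see which nodes are affected, it can help to print which instruction-set extensions the CPU of the current node actually supports. The following is only a minimal sketch, assuming GCC or Clang on x86, where `__builtin_cpu_supports` is available as a compiler builtin. The helper name `cpu_feature_report` is made up for this illustration and is not part of ITensor or of the cluster tooling.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns one line per instruction-set extension,
// saying whether the CPU this process runs on supports it.
inline std::string cpu_feature_report() {
    std::ostringstream out;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialise the compiler's CPU feature data
    out << "sse4.2:  " << (__builtin_cpu_supports("sse4.2") ? "yes" : "no") << '\n';
    out << "avx:     " << (__builtin_cpu_supports("avx") ? "yes" : "no") << '\n';
    out << "avx2:    " << (__builtin_cpu_supports("avx2") ? "yes" : "no") << '\n';
    out << "fma:     " << (__builtin_cpu_supports("fma") ? "yes" : "no") << '\n';
    out << "avx512f: " << (__builtin_cpu_supports("avx512f") ? "yes" : "no") << '\n';
#else
    // Assumption: on other compilers or architectures we simply report nothing useful.
    out << "CPU feature detection is not available with this compiler/architecture\n";
#endif
    return out.str();
}
```

Printing the result from `main` (for example `std::cout << cpu_feature_report();`) and running the program on a node from each partition shows where the extensions differ. The check itself should be built for an old target (e.g. `-march=core2`) so that it does not crash on the very nodes it is meant to inspect.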

pietro.brighi · October 23, 2020, 5:37am

Thank you Pablo, constraining to newer nodes worked.
Best
Pietro

Yann.Sagon · October 26, 2020, 9:27am

Hello,

Thanks to @Pablo.Strasser for the answer. In addition, on Baobab we compile the software with the “core2” value for the “-march” option to solve this issue. You can also compile your software on an “old” compute node and it will work on the whole cluster.

Best

Yann

Pablo.Strasser · October 27, 2020, 12:01am

In addition, note that compiling with “-march=core2” makes your code compatible with the whole cluster, but you may also lose performance on the newer nodes, as some instructions will not be used (e.g. AVX, AVX2). So it may be a good idea to restrict your jobs to more modern nodes if your code gains a lot of performance from newer instructions.
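One way to check which instruction sets a given build is allowed to use is to look at the macros the compiler predefines for the chosen target. This is a small sketch, assuming GCC or Clang, which define `__AVX__`, `__AVX2__`, `__FMA__` and similar macros when the corresponding extensions are enabled. The function name `compiled_simd_targets` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: lists the SIMD extensions this translation unit
// was compiled to use, based on compiler-predefined macros.
inline std::string compiled_simd_targets() {
    std::string s;
#if defined(__SSE2__)
    s += "SSE2 ";
#endif
#if defined(__AVX__)
    s += "AVX ";
#endif
#if defined(__AVX2__)
    s += "AVX2 ";
#endif
#if defined(__FMA__)
    s += "FMA ";
#endif
#if defined(__AVX512F__)
    s += "AVX512F ";
#endif
    // If none of the macros above is defined, say so explicitly.
    return s.empty() ? std::string("none detected") : s;
}
```

Built with `-march=core2`, the AVX, AVX2 and FMA entries should be absent. Built with `-march=native` on a recent node, they should appear if that node’s CPU supports them.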

Yann.Sagon · October 28, 2020, 4:23pm

Open question: if you compile your software with full optimization, but with a compiler that was itself compiled using “-march=core2”, do you expect your software to use AVX? If you rely on other libs provided by EasyBuild, you’ll have the non-optimized ones as well.

Does anyone know of software that gets a huge performance improvement from AVX and other fancy stuff? If so, I would be interested in benchmarking it. Thanks

Pablo.Strasser · October 28, 2020, 11:12pm

To my knowledge, compiler bugs aside, the binary a compiler produces is the same regardless of which compiler and optimisation flags were used to build the compiler itself. Of course, the optimisation level used to compile the libraries will have an effect.

Software that, to my knowledge, is known to be affected by compiler optimisation includes numerical analysis libraries, especially BLAS and BLAS-like libraries such as LAPACK. Intel MKL is likewise a library written by Intel and optimised for their hardware. TensorFlow gives a warning when run on an AVX-capable machine with a binary that was not compiled with AVX. A quick Google search found me this “benchmark” on TensorFlow: https://medium.com/@mychen76/build-tensorflow-to-get-free-performance-increase-from-cpu-839f4e1187ee .

To be more precise, I expect any software that would take advantage of a GPU to also gain performance when compiled with AVX on a CPU.

Yann.Sagon · October 29, 2020, 7:21am

Hello,

Yes indeed, but as TensorFlow on Baobab should be run on GPU nodes only, this is probably not a real issue.

Probably yes. So I guess we should do a comparison with software that isn’t ported to GPU and does benefit from AVX, FMA, etc.
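As a starting point for such a comparison, a tiny CPU-only kernel can be built twice, once with `-O3 -march=core2` and once with `-O3 -march=native`, and timed on the same node. The sketch below is an assumption-laden illustration, not a real benchmark: the names `axpy` and `time_axpy` are made up here, and a loop this simple is often limited by memory bandwidth, so the measured difference may well be small.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]: a simple loop that compilers can auto-vectorise
// (and turn into FMA instructions when the target allows it).
inline void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical timing helper: returns the wall-clock seconds spent running
// axpy `repeats` times on vectors of length n.
inline double time_axpy(std::size_t n, int repeats) {
    if (n == 0) {
        return 0.0;
    }
    std::vector<double> x(n, 1.0);
    std::vector<double> y(n, 0.0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        axpy(0.5, x, y);
    }
    const auto stop = std::chrono::steady_clock::now();
    // Read one result through a volatile so the compiler cannot discard the loop.
    volatile double sink = y[n / 2];
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_axpy` from `main` with a vector size that fits in cache and a large repeat count, then comparing the two builds, gives a rough first impression. A compute-bound kernel such as a dense matrix product would probably be a more telling test.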