Hi all!
I’m running a C++ code that requires ITensor, which I built on the login node. It works fine in the debug partition, but I’m getting an “Illegal instruction” error on some nodes in the dpt partition.
If I understand correctly, this is due to some incompatibility with the CPUs of those nodes. Do you have any idea how I could overcome this issue?
Best
Pietro
The list and types of nodes are given here https://baobabmaster.unige.ch/enduser/src/enduser/enduser.html#compute-nodes . You can select the node generation with the --constraint="V2|V3|V4|V5|V6" flag to get nodes of generation 2 to 6. It is also possible to compile your software for an architecture that is old enough by passing the -march flag to gcc. If I remember correctly, it's something like westmere (-march=westmere). Note that when you add such a flag to gcc, you may also need to recompile all the libraries, since a single library using a too-modern instruction will make your program crash.
The simplest solution is to just restrict your node choice by a constraint.
Thank you Pablo, constraining to newer nodes worked.
Best
Pietro
Hello,
thanks to @Pablo.Strasser for the answer. In addition, on Baobab we compile the software with the “core2” value for the “-march” option to solve this issue. You can also compile your software on an “old” compute node, and it will then work on the whole cluster.
Best
Yann
In addition, note that compiling with “-march=core2” makes your code compatible with the whole cluster, but you may also lose performance on the newer nodes because some instructions will not be used (e.g. AVX, AVX2). So it may be a good idea to restrict your code to more modern nodes if it gains a lot of performance from newer instructions.
Open question: if you compile your software with full optimization, but with a compiler that was itself compiled using “-march=core2”, do you expect your software to use AVX? If you rely on other libs provided by EasyBuild, you’ll have the non-optimized ones as well.
Does anyone know of software that shows a huge performance improvement when using AVX and other fancy stuff? If so, I would be interested in benchmarking it. Thanks
To my knowledge, barring compiler bugs, the binary produced by a compiler is always the same regardless of which compiler and optimisation flags were used to compile the compiler itself. Of course, the optimisation level used to compile the libraries will have an effect.
Software that, to my knowledge, is affected by compiler optimisation includes numerical analysis libraries, especially BLAS and BLAS-like libraries such as LAPACK. Intel MKL is also a library written by Intel and optimized for their hardware. TensorFlow gives a warning when run on an AVX-capable machine with a binary that was not compiled with AVX. A quick Google search found me this “benchmark” on TensorFlow: https://medium.com/@mychen76/build-tensorflow-to-get-free-performance-increase-from-cpu-839f4e1187ee .
To be more precise I expect all software that would take advantage of a GPU to gain performance when compiled with AVX on a CPU.
Hello,
Yes indeed, but as TensorFlow on Baobab should be run on GPU nodes only, this is probably not a real issue.
Probably yes. So I guess we should do a comparison with software that isn’t ported to GPU and does benefit from AVX, FMA, etc.