I'm going to paste in some of our (other) conversation here so others can benefit :-) [ a few clean-up edits too ]

Hi Wimpie, happy to help if I can. Default MATLAB linked with MKL gets a nice improvement on AMD with that MKL_DEBUG_CPU_TYPE=5 environment variable set! I really like the new AMD processors, and AVX2 is actually a very good vector unit. It's unfortunate to have to resort to hacks like that, but that's how it is for now.
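A minimal sketch of using that variable: set it in the shell you launch MATLAB from (the echo is only there to confirm it took). Note, as I understand it, Intel removed this debug variable in later MKL releases (2020 update 1 and newer), so it only helps with older MKL builds.

```shell
# Force MKL onto its AVX2 ("type 5" / Haswell) code path on AMD CPUs.
# Reportedly removed in MKL 2020.1+, so this is for older MKL builds.
export MKL_DEBUG_CPU_TYPE=5
echo "MKL_DEBUG_CPU_TYPE=$MKL_DEBUG_CPU_TYPE"
# ...then launch MATLAB from this same shell so it inherits the variable.
```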

The first thing to do is see what libs are linked into that mex file...

I found this https://www.mathworks.com/m...

It may or may not be linked with MKL. If it is, I would expect that debug environment variable to work for it too.
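On Linux, `ldd` is the quick way to check what a mex file is linked against. It's shown on `/bin/ls` here just so the command is runnable as-is; for the real check you'd point it at the mex file (the filename below is hypothetical):

```shell
# ldd lists the shared libraries a binary will load at runtime.
# For the actual check, run it on the mex file and filter for math libs, e.g.:
#   ldd ./mysolver.mexa64 | grep -iE 'mkl|blas|lapack'
ldd /bin/ls
```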

If you can rebuild that mex file, then it would be good to try that! The only way you can really tell is to experiment. I've found that OpenBLAS works really well with some programs on AMD; it's based on the excellent GotoBLAS. You can also try linking with MKL and using the debug flag.

If you rebuild with OpenBLAS, then you might want to see about switching MATLAB itself over to using that. I think they have made that reasonably easy to do, but I don't know the details.
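If memory serves, MATLAB on Linux checks the BLAS_VERSION / LAPACK_VERSION environment variables at startup, so a sketch would look like this. The library path is an assumption for a typical Ubuntu OpenBLAS install; check the MathWorks docs for your release before relying on it.

```shell
# Point MATLAB at an alternate BLAS/LAPACK before launching it
# (the .so path is an assumption; adjust for your system):
export BLAS_VERSION=/usr/lib/x86_64-linux-gnu/libopenblas.so
export LAPACK_VERSION=/usr/lib/x86_64-linux-gnu/libopenblas.so
echo "BLAS_VERSION=$BLAS_VERSION"
# Inside MATLAB, version('-blas') reports which BLAS actually got loaded.
```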

At the end of the day it's just going to take some experimenting. I think this is really worthwhile to do! It's part of your education and these kinds of problems will come up throughout your career :-)

Best wishes --Don

>>>

Thank you for the advice!

I will look at the mex file to try and get a better understanding of what we are working with. I like the idea of experimenting with a few alternatives. After reading up on IPOPT, it seems like there is a lot of room for optimization by using MA86 (which natively supports multi-threading) instead of MA57, and by compiling IPOPT with METIS. All these terms are relatively new to me, but I am keen to explore to get a feel for how they will impact performance of the solver!

In one of your articles you did a benchmark using AVX512; how did you enforce that? While googling "BLAS for AMD" I found a link for AMD BLIS, which provides an API for native BLAS and LAPACK calls. Do you think it would be possible to use that instead of MKL, or am I talking about two different things here? Sorry about the confusing questions; I am still learning exactly where MKL, AVX, BLAS, etc. fit into the bigger picture.

>>>

... you are doing the right thing! You need to look "under the hood" and see what your code is doing. There is often room for improvement.

It's possible when building code to set compiler flags that control (restrict) the level of optimization, including which "SIMD" instructions (a.k.a. AVXxxx) are used. Often when code is built, it is set to the capability of the native architecture and everything less than that, and the "code-path" is chosen at runtime. Usually in my testing I'm just using whatever I get :-) There are also runtime environment variables that can be set, like the debug flag to force "type 5", which is really setting the code-path to AVX2 like the "Haswell" architecture.

AMD's BLIS v2 library is very good, but I've had some trouble trying to use it in the past. I did try to set it up for use when I did the numpy testing, but performance was poor (much worse than it should have been). That is not the fault of the library, just the way I was trying to get it to work with numpy.

The library that will be the easiest to try is OpenBLAS, since it is widely used. But BLIS is definitely worth looking into! I had great luck with it in the recent HPC testing I did (but that was code compiled by AMD, not me).

From what you have mentioned, I would first try to get the threading working well with the "MA86" support you mentioned (I'm not familiar with any of that).
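One knob worth knowing for that experiment: as far as I know, HSL's MA86 is OpenMP-parallel, and OpenMP-threaded code generally reads OMP_NUM_THREADS. A sketch (the value 8 is just an example; match it to your physical core count):

```shell
# Cap the solver's OpenMP thread count (8 is only an example value):
export OMP_NUM_THREADS=8
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ...then run the solver from this shell and compare timings at different counts.
```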

If you can add some understanding of performance-tuning to your "tool-belt" you will forever be glad you did :-)

Posted on 2020-05-18 15:44:02