A couple of weeks ago I asked a question about the performance of matrix multiplication.

I was told that in order to enhance the performance of my program I should use some specialised matrix classes rather than my own class.

StackOverflow users recommended:

uBLAS

EIGEN

BLAS

At first I wanted to use uBLAS; however, after reading the documentation, it appeared to me that this library doesn't support matrix-matrix multiplication.

In the end I decided to use the Eigen library, so I replaced my matrix class with Eigen::MatrixXd. However, it turned out that my application now runs even slower than before: 68 seconds with my own matrix class versus 87 seconds with Eigen matrices.

The parts of the program that take the most time look like this:

    TemplateClusterBase* TemplateClusterBase::TransformTemplateOne( vector<Eigen::MatrixXd*>& pointVector, Eigen::MatrixXd& rotation, Eigen::MatrixXd& scale, Eigen::MatrixXd& translation )
    {
        for (int i = 0; i < pointVector.size(); i++)
        {
            //Eigen::MatrixXd outcome =
            Eigen::MatrixXd outcome = (rotation*scale) * (*pointVector[i]) + translation;
            //delete prototypePointVector[i]; // ((rotation*scale)* (*prototypePointVector[i]) + translation).ConvertToPoint();
            MatrixHelper::SetX(*prototypePointVector[i], MatrixHelper::GetX(outcome));
            MatrixHelper::SetY(*prototypePointVector[i], MatrixHelper::GetY(outcome));
            //assosiatedPointIndexVector[i] = prototypePointVector[i]->associatedTemplateIndex = i;
        }
        return this;
    }

and

    Eigen::MatrixXd AlgorithmPointBased::UpdateTranslationMatrix( int clusterIndex )
    {
        double membershipSum = 0, outcome = 0;
        double currentPower = 0;
        Eigen::MatrixXd outcomePoint = Eigen::MatrixXd(2,1);
        outcomePoint << 0,0;
        Eigen::MatrixXd templatePoint;
        for (int i = 0; i < imageDataVector.size(); i++)
        {
            currentPower = 0;
            membershipSum += currentPower = pow(membershipMatrix[clusterIndex][i], m);
            outcomePoint.noalias() += (*imageDataVector[i] - (prototypeVector[clusterIndex]->rotationMatrix*prototypeVector[clusterIndex]->scalingMatrix* ( *templateCluster->templatePointVector[prototypeVector[clusterIndex]->assosiatedPointIndexVector[i]]) ))*currentPower;
        }
        outcomePoint.noalias() = outcomePoint /= membershipSum;
        return outcomePoint; //.ConvertToMatrix();
    }
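To make clear what UpdateTranslationMatrix computes, here is a minimal sketch of the same weighted mean in plain C++. It assumes every point is 2x1 and every matrix 2x2 (which `MatrixXd(2,1)` suggests but the code above does not guarantee). The names `Vec2`, `Mat2` and `weightedTranslation` are hypothetical and are not part of my program or of Eigen. The sketch also assumes the template points are already ordered by their association index and that the product rotation*scaling is computed once by the caller.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size stand-ins; nothing here is allocated on the heap.
using Vec2 = std::array<double, 2>;
using Mat2 = std::array<double, 4>;  // row-major: {m00, m01, m10, m11}

// Weighted mean of the residuals (imagePoint - rs * templatePoint),
// where each weight is membership[i] raised to the power m.
// rs is assumed to be the precomputed product rotation * scaling.
// The caller must make sure the weights do not sum to zero.
Vec2 weightedTranslation(const std::vector<Vec2>& imagePoints,
                         const std::vector<Vec2>& templatePoints,
                         const std::vector<double>& membership,
                         const Mat2& rs, double m) {
    assert(imagePoints.size() == templatePoints.size());
    assert(imagePoints.size() == membership.size());

    double sumX = 0.0, sumY = 0.0, weightSum = 0.0;
    for (std::size_t i = 0; i < imagePoints.size(); ++i) {
        const double w = std::pow(membership[i], m);
        const Vec2& t = templatePoints[i];
        // Residual between the image point and the transformed template point.
        sumX += (imagePoints[i][0] - (rs[0] * t[0] + rs[1] * t[1])) * w;
        sumY += (imagePoints[i][1] - (rs[2] * t[0] + rs[3] * t[1])) * w;
        weightSum += w;
    }
    return {sumX / weightSum, sumY / weightSum};
}
```

If the sizes really are fixed at 2, Eigen's own fixed-size types `Eigen::Matrix2d` and `Eigen::Vector2d` play the role of `Mat2` and `Vec2` here; unlike `Eigen::MatrixXd`, they do not allocate dynamically.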

As you can see, these functions perform a lot of matrix operations. That is why I thought using Eigen would speed up my application. Unfortunately, as I mentioned above, the program runs slower.
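For comparison, the per-point work in TransformTemplateOne can be written down as the following sketch, again in plain C++ and again assuming 2x2 matrices and 2D points. `Vec2`, `Mat2`, `mul`, `apply` and `transformPoints` are hypothetical names used only for illustration. The sketch computes rotation*scale once before the loop, since that product does not depend on the point, and keeps every temporary on the stack.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size stand-ins for a 2D point and a 2x2 matrix.
using Vec2 = std::array<double, 2>;
using Mat2 = std::array<double, 4>;  // row-major: {m00, m01, m10, m11}

// Product of two 2x2 matrices.
inline Mat2 mul(const Mat2& a, const Mat2& b) {
    return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}

// Computes m * p + t for a single point.
inline Vec2 apply(const Mat2& m, const Vec2& p, const Vec2& t) {
    return {m[0] * p[0] + m[1] * p[1] + t[0],
            m[2] * p[0] + m[3] * p[1] + t[1]};
}

// Applies (rotation * scale) * point + translation to every input point.
void transformPoints(const std::vector<Vec2>& in, std::vector<Vec2>& out,
                     const Mat2& rotation, const Mat2& scale,
                     const Vec2& translation) {
    const Mat2 rs = mul(rotation, scale);  // loop-invariant, computed once
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = apply(rs, in[i], translation);
    }
}
```

With Eigen, the equivalent would presumably be to declare the matrices as `Eigen::Matrix2d` and the points as `Eigen::Vector2d`, and to store `rotation * scale` in a local `Eigen::Matrix2d` before the loop. I have not measured whether this closes the gap.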

Is there any way to speed up these functions?

Would I perhaps get better performance from DirectX matrix operations? (However, I have a laptop with an integrated graphics card.)