You may already have come across this, and I have not looked at it in (many!) years, but the bible when I did my MSc in Computational Mathematics was “Numerical Recipes in C”:
https://www.grad.hr/nastava/gs/prg/NumericalRecipesinC.pdf
Where do you actually use SSE instructions? The code you provided does not contain any, as far as I can see.
No, it’s not used in the code I pushed. It was an alternative that I didn’t push.
Regarding cache locality, try to access memory consecutively. The first index is the one that gets multiplied to compute the memory position, so keep it fixed in the innermost loop to avoid big steps through memory. In your case this means the innermost loop should iterate over `j`, not over `k`, because `k` is the first index in the access to `other`. So try:
```cpp
for (int i = 0; i < rows_; ++i) {
    for (int j = 0; j < other.cols_; ++j) {
        result(i, j) = 0.0;
    }
    for (int k = 0; k < cols_; ++k) {
        for (int j = 0; j < other.cols_; ++j) {
            result(i, j) += (*this)(i, k) * other(k, j);
        }
    }
}
```
This should boost performance quite a bit.
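To make the loop-order point easy to try in isolation, here is a minimal sketch of both variants side by side. The `Matrix` struct and the names `multiply_ijk` and `multiply_ikj` are hypothetical stand-ins, not the class from the repository, and the sketch assumes row-major storage as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix (an assumption for this sketch;
// the real class from the thread is not shown in full).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;  // element (r, c) is stored at data[r * cols + c]

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// i-j-k order: the innermost loop steps down a column of b,
// so consecutive reads of b are b.cols elements apart.
Matrix multiply_ijk(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix result(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < a.cols; ++k) {
                sum += a(i, k) * b(k, j);
            }
            result(i, j) = sum;
        }
    }
    return result;
}

// i-k-j order: the innermost loop walks along one row of b and one row
// of result, so both are accessed with stride 1.
Matrix multiply_ikj(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix result(a.rows, b.cols);  // already zero-initialized
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t k = 0; k < a.cols; ++k) {
            const double aik = a(i, k);  // constant for the whole inner loop
            for (std::size_t j = 0; j < b.cols; ++j) {
                result(i, j) += aik * b(k, j);
            }
        }
    }
    return result;
}
```

Both functions compute the same product; only the memory access pattern differs. Timing them on a few matrix sizes with optimizations enabled should show how much the reordering matters on a given machine.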
Thanks @manuel_70200, I understand the logic behind iterating over `j` in the innermost loop to improve cache locality. By keeping the first index fixed in the innermost loop (`i` for `result`, `k` for `other`), I would increase the chance of accessing adjacent elements in the same cache line.
It makes perfect sense in the context of reducing L3 cache misses. I’ll definitely give this approach a try and compare the performance with the original loop structure. It would be interesting to see how much this optimization impacts the overall execution time, especially for larger matrices ✅
If you check out my new commits, I also explored using `blocking`.
Thanks @ming_58391 👍
You may also search the web for efficient matrix multiplication. This is a standard topic, so there should be plenty of articles on how to optimize it.
Yeah, thanks @manuel_70200. I tried out blocking and it has been working pretty well since then. I’ll probably try other optimization techniques in some other projects.
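For anyone landing here later, this is roughly what blocking (tiling) can look like. It is a sketch under assumptions, not the code from the commits mentioned above: the function name `multiply_blocked`, the flat row-major `std::vector<double>` layout, and the default tile size of 64 are all made up for illustration, and a good tile size depends on the cache sizes of the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes c = a * b for row-major matrices stored in flat vectors:
// a is n x m, b is m x p, the returned c is n x p.
// `block` is the tile edge length (hypothetical default, tune per machine).
std::vector<double> multiply_blocked(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t n, std::size_t m, std::size_t p,
                                     std::size_t block = 64) {
    assert(a.size() == n * m);
    assert(b.size() == m * p);
    assert(block > 0);

    std::vector<double> c(n * p, 0.0);

    // Outer three loops pick a tile; std::min handles partial edge tiles.
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < m; kk += block) {
            const std::size_t k_end = std::min(kk + block, m);
            for (std::size_t jj = 0; jj < p; jj += block) {
                const std::size_t j_end = std::min(jj + block, p);

                // Inner three loops use the i-k-j order inside the tile,
                // so b and c are still read and written with stride 1.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * m + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * p + j] += aik * b[k * p + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

The idea is that each tile of `a`, `b`, and `c` is small enough to stay in cache while it is being reused, which matters most once the matrices no longer fit in cache as a whole.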