How can I improve the efficiency of the standard matrix multiplication algorithm?
The main operation involved in this approach is: C[i][j]+=A[i][p]*B[p][j]
What can be done to improve the efficiency of the algorithm?
You might want to have a look at using a BLAS (Basic Linear Algebra Subprograms) library: Intel offers MKL, AMD offers ACML, and there's also the open-source GotoBLAS.
The (dense) matrix-matrix multiply kernel will be a ?GEMM call, where the ? indicates the floating-point type; for example, DGEMM calls the double-precision routine.
Unless you're extremely confident you know what you're doing with low-level optimisations, these libraries will probably offer better performance than something you can code by hand.
If you do want to have a go at coding this yourself then you may want to consider the following:
- Use "vector" (SIMD) instructions. SSE and SSE2–SSE4 instructions are widely supported, and some newer CPUs also support AVX instructions.
- Nested loop unrolling to maximise the ratio of floating-point operations to load/store operations.
- Block-wise algorithms to ensure effective cache use.
- Multi-threading.
This reference might give you an idea of the current state of things:
High-performance implementation of the level-3 BLAS - K Goto.
Hope this helps.
I would suggest reading Chapter 1 of Golub and Van Loan, which addresses this exact question.
- Cache blocking - making sure you're properly using and reusing values in the cache
- Better algorithm - the "by-definition" way to multiply matrices is not optimal, take a look at Strassen's algorithm
- Parallelization - if your machine has more than one core and/or processor, you can divide and conquer
- SIMD - take advantage of SSE vector instructions in modern CPU architectures
- GPGPU - modern GPUs are optimized to do just this sort of thing. Look into CUDA and OpenCL.
Note that using these methods does not guarantee better performance; a lot of tuning is required to get a significant speedup. There is a lot of money going into figuring out how to multiply matrices quickly, so there is no shortage of journal articles on the subject.
If the question concerns multiple matrix multiplications - M1 x M2 x ... x Mn - then there is another optimization technique based on dynamic programming, which is sort of another ball-game. Note that this doesn't apply to improving the efficiency of multiplying two matrices; however, if you are multiplying three or more matrices in a pairwise fashion, then you can optimize at an even higher level. Just thought I'd throw this answer on the heap to round out the information.
Well, there's Strassen's algorithm, which, depending on the size of your matrix, is ever so slightly faster than the standard algorithm that you listed. There are of course even faster algorithms, but they aren't so simple to implement.
The standard algorithm is O(N^3), Strassen's algorithm is about O(N^2.807), and Coppersmith-Winograd is about O(N^2.376).