How can I improve the efficiency of the standard matrix multiplication algorithm?
The main operation involved in this approach is: C[i][j]+=A[i][p]*B[p][j]
What can be done to improve the efficiency of the algorithm?
You might want to have a look at using a BLAS (Basic Linear Algebra Subprograms) library: Intel offers MKL, AMD offers ACML, and there's also the open-source GotoBLAS.
The (dense) matrix-matrix multiply kernel will be a ?GEMM call, where the ? indicates the floating-point type; for example, DGEMM calls the double-precision routine.
Unless you're extremely confident you know what you're doing with low-level optimisations, these libraries will probably offer better performance than something you can code by hand.
If you do want to have a go at coding this yourself then you may want to consider the following:
- Use "vector" (SIMD) instructions. SSE and SSE2–SSE4 instructions are widely supported, and some newer CPUs also support AVX instructions.
- Nested loop unrolling to maximise the ratio of floating-point operations to load/store operations.
- Block-wise algorithms to ensure effective cache use.
- Multi-threading.
This reference might give you an idea of the current state of things:
High-performance implementation of the level-3 BLAS - K Goto.
Hope this helps.
I would suggest reading Chapter 1 of Golub and Van Loan, which addresses this exact question.
- Cache blocking - making sure you're properly using and reusing values in the cache
- Better algorithm - the "by-definition" way to multiply matrices is not optimal, take a look at Strassen's algorithm
- Parallelization - if your machine has more than one core and/or processor, you can divide and conquer
- SIMD - take advantage of SSE vector instructions in modern CPU architectures
- GPGPU - modern GPUs are optimized to do just this sort of thing. Look into CUDA and OpenCL.
Note that using these methods does not guarantee better performance; a lot of tuning is required to get a significant speedup. There is a lot of money going into figuring out how to multiply matrices quickly, so there is no shortage of journal articles on the subject.
If the question concerns multiple matrix multiplications - M1 x M2 x ... x Mn - then there is another optimization technique based on dynamic programming, which is sort of another ball-game. Note that this doesn't apply to improving the efficiency of multiplying two matrices; however, if you are multiplying three or more matrices in a pairwise fashion, then you can optimize at an even higher level. Just thought I'd throw this answer on the heap to round out the information.
Well, there's Strassen's algorithm, which, depending on the size of your matrix, is ever so slightly faster than the standard algorithm that you listed. There are of course even faster algorithms, but they aren't so simple to implement.
The standard algorithm is O(N^3), Strassen's algorithm is about O(N^2.807), and Coppersmith-Winograd is about O(N^2.376).