I spend many times to look for any good freeware C++ class for doing good calculations with matrix. There are no good solution, except expensive ones. So I had to cope with the problem and make a whole class alone.
A little code style guide : In a good matrix library, you should be able to write such things :
matrix A(10, 10), B; // A is a 10x10 matrix, B uninitialised A.row(1, 5 ) = 1.0; // setting 1.0 to the row 1 to 5 A.row(6, 10) = 2.0; // setting 2.0 to the row 6 to 10 B = colMult(A, A.row(1)); // B is the column scaling multiplication // of A by first row of A cout << b.norm(norm_infini); // outputs the infini norm of the vector A = B + 1.0; // a is b matrix plus 1.0 to each elemen C = B/A; // solve the linear problem
What should be in this package ? What should I keep from the original package ? What is useful and what isn’t ? Let’s learn about my way of thinking : Every operators should be provided, at least +, -, *, /, and operators related with double too, the decompositions methods should use a managing class, so that it is quick to add our own algorithm. Special care about arguments passing are welcome, but it should only be in DEBUGs release, Outputs is so feeling dependant that there may be only a small one, Submatrix handling is definitely essential, have just a minimal use of temporary matrix. That’s what I’d love in a good package … (I wonder if I’ve done all of these !)
Now I think a small introduction about the creation is needed. I didn’t start the code from scratch and a look to matclass and Intel Math Kernel Library is a good start.
MatClass is a C++ Class for numerical computation.
So, what was done with this package ?
I removed all code related with tracing :
As said in Intel’ site : The Intel Math Kernel Library provides developers of scientific and engineering software with a set of linear algebra (the Basic Linear Algebra Subroutines–BLAS) and fast Fourier transform functions as static and dynamic libraries for the Windows* NT* and Windows 95 operating systems. The Math Kernel Library is highly optimized software for Pentium® Pro and Pentium II processors. The level 3 BLAS have also been multithreaded to provide even more performance on multiple CPU systems running Windows NT. The Math Kernel Library also provides good performance on Pentium and other Intel processors.
Well, so I have a high performance level 3 BLAS library. Then in the matrix library, I just remove all standard code for adds, products, sums and so on, with calls to this library. I also made a standard header for this library so that everyone could call it from C/C++.
Well, the result is the matrix class :
This little program constructs a PASCAL matrix, use Cholesky decomposition and test the result.
#include "matrix.h" void main() { INDEX M = 50, i, j; matrix a(M, M), b, d; cout << "testing cholesky decomposition " << endl; // matrix comstruction a.col(1)=1.0; // set the a first column elements to 1.0 a.row(1)=1.0; // set the a first row elements to 1.0 for (j=2; j<=M; j++) for (i=2; i<=M; i++) a(i, j)=a(i-1, j)+a(i, j-1); // use a choldec object for decomposition choldec c(a); if (c.ok()) // decompose the matrix { c.chol(b); // copy the decomposed matrix in b d = b.MultT(b); // d must be equal to a cout << "residu " << norm(a-d) << endl; } }
What about the benchmark ? What speed can you achieve with this library ? It depends on the number of processors you have. As descripted in the Intel Math Kernel Library, the level 3 BLAS functions are multithreaded. So you can make a dgemm with many processors. I give here the timing results I get with a double processor pentium II 266 MHz / 192 Mo RAM computer under NT4.0.
| function | calls | timing |
|---|---|---|
| Element addressing | ||
| xmat | 10000000 | delay per call : 0.3891 us |
| mat | 10000000 | delay per call : 0.3375 us |
| operator () | 10000000 | delay per call : 0.3594 us |
| rmat | 10000000 | delay per call : 0.6 us |
| tran [ 10000000 [ delay per call : 0.3484 us | ||
| vec | 20000 | delay per call : 0.8 us |
| Scaling : operator *= double | ||
| operator *= | 100 calls for a 1000 x 1000 matrix | per call : 0.07562s, delay per element : 0.07562 us, 13.224 MFlops |
| Normation : norm() | ||
| norm | 100 calls for a 1000000 vector | delay per call : 0.03391 s delay per element : 0.03391 us, 29.49 MFlops |
| Addition of matrix : c = a+b | ||
| operator +(mat,mat) | 10 calls for a 1000 x 1000 matrix | delay per call : 0.3906 s delay per element : 0.3906 us, 2.5602 MFlops |
| Matrix vector product : c = a*b | ||
| operator *(mat,vec) | 100 calls for a 1000 x 1000 matrix | delay per call : 0.04282 s delay per element : 0.04282 us, 46.707 MFlops |
| Matrix matrix product : c = a*b, 1 processor | ||
| operator *(mat,mat) | 1 call for a 1000 x 1000 matrix | delay per call : 10.516 s delay per element : 10.516 us, 190.19 MFlops |
| Matrix matrix product : c = a*b, 2 processors | ||
| operator *(mat,mat) | 1 call for a 1000 x 1000 matrix | delay per call : 5.484 s delay per element : 5.484 us, 364.7 MFlops |
| c.AddMult(2.0,a,b) versus c+=2.0*a*b | ||
| c += 2.0*a*b | 50 calls for a 1000 x 1000 matrix | delay per call : 0.32968 s delay per element : 0.32968 us, 12.133 MFlops |
| c.AddMult(2.0, a, b) | 50 calls for a 1000 x 1000 matrix | delay per call : 0.04344 s delay per element : 0.04344 us, 92.081 MFlops |
To use this library you have to download the Intel Math Kernel Library. It have to be on your library path when you compile.