A CPP Matrix class

Introduction

Aims

I spend many times to look for any good freeware C++ class for doing good calculations with matrix. There are no good solution, except expensive ones. So I had to cope with the problem and make a whole class alone.

an easy to use class so that everyone is able to add the code in his program with very little changement,
high speed matrix computation. This condition could only be achieved when you specify the environment your are using. As I’m programming under Windows NT/Intel, the code should be really fast under these platforms,
must be free,
I’m thinking that more and more scientifics calculations will be done under the PC platform, so I wanted that the code should be easily adapted to every OS (that’s a wish not a feature …).
short time of coding. As I’m not crazy, my work was freely inspired from two sources described below thank you guys !

A little code style guide : In a good matrix library, you should be able to write such things :

matrix A(10, 10), B;   // A is a 10x10 matrix, B uninitialised
A.row(1, 5 ) = 1.0; // setting 1.0 to the row 1 to 5
A.row(6, 10) = 2.0; // setting 2.0 to the row 6 to 10
B = colMult(A, A.row(1));  // B is the column scaling multiplication
                           // of A by first row of A
cout << b.norm(norm_infini); // outputs the infini norm of the vector
A = B + 1.0; // a is b matrix plus 1.0 to each elemen
C = B/A; // solve the linear problem

What should be in this package ? What should I keep from the original package ? What is useful and what isn’t ? Let’s learn about my way of thinking : Every operators should be provided, at least +, -, *, /, and operators related with double too, the decompositions methods should use a managing class, so that it is quick to add our own algorithm. Special care about arguments passing are welcome, but it should only be in DEBUGs release, Outputs is so feeling dependant that there may be only a small one, Submatrix handling is definitely essential, have just a minimal use of temporary matrix. That’s what I’d love in a good package … (I wonder if I’ve done all of these !)

Now I think a small introduction about the creation is needed. I didn’t start the code from scratch and a look to matclass and Intel Math Kernel Library is a good start.

The MatClass library

MatClass is a C++ Class for numerical computation.

Offers a general purpose dense, real matrix class,
Has a family of decomposition classes based on LU, Cholesky, Householder QR and SVD,
Has a family of OLS regression classes based on above decompositons,
A family of special function classes,
Random number class,
Has a simplified I/O structure.
The documents are a very thorough tex manual, with discussion of design philosophy, but currently the manual does not cover all the features of the I/O. I have a version of the matclass in local you can downloaded from here.

So, what was done with this package ?

I removed all code related with tracing :

the class implemented a very interesting tracing code to show you where you missing arguments crashed the code. As I want speed, I get ride of this. But I always replaced the checks by an assert so that in the DEBUG version of the library, an false assertion warns you from mistakes.
I removed the I/O structure :
I have no use for this ability so I remove it, but it can be added again easily.
I removed some specials class :
I get rid of the random class as there are too prolific, and some others I didn’t remember.
Now that lighting of the code was done, what about the speed ? I could have coded some of the routines in asm myself. But speeding 2 months for this isn’t necessary. Why ? Because of Intel’s Math Kernel Library !

Intel Math Kernel Library

As said in Intel’ site : The Intel Math Kernel Library provides developers of scientific and engineering software with a set of linear algebra (the Basic Linear Algebra Subroutines–BLAS) and fast Fourier transform functions as static and dynamic libraries for the Windows* NT* and Windows 95 operating systems. The Math Kernel Library is highly optimized software for Pentium® Pro and Pentium II processors. The level 3 BLAS have also been multithreaded to provide even more performance on multiple CPU systems running Windows NT. The Math Kernel Library also provides good performance on Pentium and other Intel processors.

Well, so I have a high performance level 3 BLAS library. Then in the matrix library, I just remove all standard code for adds, products, sums and so on, with calls to this library. I also made a standard header for this library so that everyone could call it from C/C++.

Matrix class

Well, the result is the matrix class :

works with double dense matrices,
can make an +, -, *, / operator efficiently,
have comparison operators,
can do LU, Cholesky, Householder QR, and SVD decompositions,
have build-in matlab functions,
handles sub-matrices by reference or by copy,
have a standard interface to BLAS so that everyone who wanted to use under another platform could manage it easely.

Example of use

This little program constructs a PASCAL matrix, use Cholesky decomposition and test the result.

#include "matrix.h"
 
void main()
{
   INDEX M = 50, i, j;
   matrix a(M, M), b, d;
 
   cout << "testing cholesky decomposition " << endl; 
 
   // matrix comstruction 
   a.col(1)=1.0; // set the a first column elements to 1.0
   a.row(1)=1.0; // set the a first row elements to 1.0
 
   for (j=2; j<=M; j++) 
     for (i=2; i<=M; i++) 
        a(i, j)=a(i-1, j)+a(i, j-1);
 
   // use a choldec object for decomposition 
   choldec c(a); 
 
   if (c.ok()) // decompose the matrix
   {
       c.chol(b); // copy the decomposed matrix in b
       d = b.MultT(b); 
       // d must be equal to a
       cout << "residu " << norm(a-d) << endl; 
   }
}

Documentation

Matrix Documentation

Benchmarking

What about the benchmark ? What speed can you achieve with this library ? It depends on the number of processors you have. As descripted in the Intel Math Kernel Library, the level 3 BLAS functions are multithreaded. So you can make a dgemm with many processors. I give here the timing results I get with a double processor pentium II 266 MHz / 192 Mo RAM computer under NT4.0.

function	calls	timing
Element addressing
xmat	10000000	delay per call : 0.3891 us
mat	10000000	delay per call : 0.3375 us
operator ()	10000000	delay per call : 0.3594 us
rmat	10000000	delay per call : 0.6 us
tran [ 10000000 [ delay per call : 0.3484 us
vec	20000	delay per call : 0.8 us
Scaling : operator *= double
operator *=	100 calls for a 1000 x 1000 matrix	per call : 0.07562s, delay per element : 0.07562 us, 13.224 MFlops
Normation : norm()
norm	100 calls for a 1000000 vector	delay per call : 0.03391 s delay per element : 0.03391 us, 29.49 MFlops
Addition of matrix : c = a+b
operator +(mat,mat)	10 calls for a 1000 x 1000 matrix	delay per call : 0.3906 s delay per element : 0.3906 us, 2.5602 MFlops
Matrix vector product : c = a*b
operator *(mat,vec)	100 calls for a 1000 x 1000 matrix	delay per call : 0.04282 s delay per element : 0.04282 us, 46.707 MFlops
Matrix matrix product : c = a*b, 1 processor
operator *(mat,mat)	1 call for a 1000 x 1000 matrix	delay per call : 10.516 s delay per element : 10.516 us, 190.19 MFlops
Matrix matrix product : c = a*b, 2 processors
operator *(mat,mat)	1 call for a 1000 x 1000 matrix	delay per call : 5.484 s delay per element : 5.484 us, 364.7 MFlops
c.AddMult(2.0,a,b) versus c+=2.0ab
c += 2.0ab	50 calls for a 1000 x 1000 matrix	delay per call : 0.32968 s delay per element : 0.32968 us, 12.133 MFlops
c.AddMult(2.0, a, b)	50 calls for a 1000 x 1000 matrix	delay per call : 0.04344 s delay per element : 0.04344 us, 92.081 MFlops

Downloading

To use this library you have to download the Intel Math Kernel Library. It have to be on your library path when you compile.

Intel Math Kernel Library
Matrix Library v1.02 : fix version for MKL5.1 : matrix102.zip
Matrix library v1.01 : fix version for MKL3.1 : matrix101.zip
Matrix library 1.00 : version for MKL 2.1 : matrix100.zip