
# Cache Benchmarking


1. Introduction

My aim is to measure the speedup provided by the cache. To do so, I implemented an operation that does not use the cache effectively and compared its running time against the same operation implemented to use the cache well. In the numeric examples below I assume the cache size is 512 KB.

2. Design

The selected operation is addition, but a single addition is far too fast to time, so the operation must be repeated until the elapsed time becomes measurable. This leads naturally to arrays. Two loop variants are designed:

- Design of a loop that uses the cache

- Design of a loop that does not use the cache

2.1. Without cache

We want an addition operation that does not benefit from the cache, meaning its operands are not served from the cache. My array size is 1000 × 256, which exceeds the cache size for an addition operation. (a[1000][256] = b[1000][256] + c[1000][256])

A second method to defeat the cache is to read data with a stride larger than the cache can cover. If we implement a loop that jumps through the arrays 100 rows at a time, the cache must be refreshed on almost every access.

2.2. With cache

We want an addition operation that benefits from the cache, meaning its operands are loaded into the cache and stay there until the end of the operation. This time my array size fits into the cache. (a[1000][65] = d[1000][65] + e[1000][65])

2.3. Time

We need to read the system time before and after each loop and compute the difference. These time readings are obtained through system calls.

3. Implementation

3.1. Implementation without cache

```c
/* Strided traversal: consecutive iterations touch rows 100 apart,
   so each access lands outside the data already cached. */
for (i = 0; i < 100; i++)
    for (j = 0; j < 256; j++)
        for (k = 0; k < 10; k++)
            a[k*100 + i][j] = b[k*100 + i][j] + c[k*100 + i][j];
```

This piece of code consists of three nested loops around a simple addition and assignment.

The loops are arranged to jump over the cache size. Since the arrays are 1000 × 256, the row index starts at 0 and jumps 100 by 100 up to 900, then returns to 1 and jumps 100 by 100 up to 901, then to 2, up to 902, and so on.

This makes inefficient use of the cache, because the cached data must be refreshed on nearly every access.

3.2. Implementation with cache

```c
/* Blocked traversal: each outer-i block touches only 100 rows of 65
   columns (about 26 KB per array), which fits in the 512 KB cache and
   is reused across the j loop.
   Note: the original report indexes rows with i*10 + k, which only
   reaches rows 0..189; i*100 + k covers all 1000 rows as intended. */
for (i = 0; i < 10; i++)
    for (j = 0; j < 65; j++)
        for (k = 0; k < 100; k++)
            a[i*100 + k][j] = d[i*100 + k][j] + e[i*100 + k][j];
```

In this C code there are three loops performing a simple array addition and assignment. Three loops are used so that the iteration counts match the implementation in 3.1; the only difference is the array width (65 columns instead of 256), which lets the data fit in the cache.

```c
float getdiff(struct timeval endtv, struct timeval starttv)
{
    float diff = 0;
    diff = (endtv.tv_sec - starttv.tv_sec) * 1000000 +
           (endtv.tv_usec - starttv.tv_usec);
    return diff;
}
```

This function returns the elapsed time, in microseconds, between the two readings.

3.3. Implementation Environment

I used a Pentium II 200 MHz PC with 32 MB of RAM and a 512 KB cache. The code was written in C under Linux.

4. Conclusion

The cache increases the speed of calculations only when the hit ratio is high. Programmers should consider cache behaviour when performing large numbers of operations. As the cache size increases, the hit ratio increases as well.