Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- /* Two-dimensional array access */
- // Example of good memory access - Array a is accessed along the rows. This approach ensures good performance from the memory system
- for (int i = 0; i < n; i ++)
- for (int j = 0; j < n; j++)
- sum += a[i][j];
- // Example of bad memory access - Array a is accessed columnwise. This approach results in poor utilization of the memory system. The larger array, the worse its performance will be.
- for (int j = 0; j < n; j++)
- for (int i = 0; i < n; i ++)
- sum += a[i][j];
- /* Loop unrolling */
- // A short loop nest - Loop overheads are relatively high when each iteration has a small number of operations.
- for(int i = 1; i < n; i++){
- a[i] = b[i] + 1;
- c[i] = a[i] + a[i - 1] + b[i - 1];
- }
- // A unrolled loop - The loop has been unrolled by a factor of 2 to reduce the loop overheads.
- for(int i = 1; i < n; i+= 2){
- a[i] = b[i] + 1;
- c[i] = a[i] + a[i - 1] + b[i - 1];
- a[i + 1] = b[i + 1] + 1;
- c[i + 1] = a[i + 1] + a[i] + b[i];
- }
- /* Loop fusion */
- // A pair of loops that both access array a - The second loop reuses element a[i], but by the time it is executed, the cache line this element is part of may no longer be in the cache.
- for(int i = 0; i < n; i++)
- a[i] = b[i] * 2;
- for(int i = 0; i < n; i ++){
- x[i] = x[i] * 2;
- c[i] = a[i] + 2;
- }
- // An example of loop fusion - The pair of loops have been combined and the statements reordered. This permits the values of array a to be immediately reused.
- for(int i = 0; i < n; i ++){
- a[i] = b[i] * 2;
- c[i] = a[i] + 2;
- x[i] = x[i] * 2;
- }
- /* Loop fission */
- // A loop with poor cache utilization and bad memory access - if we can split off the updates to array c from the rest of the work, loop interchange can be applied to fix this problem.
- for(int i =0; i < n; i ++){
- c[i] = exp(i / n);
- for(int j = 0; j < m; j++){
- a[j][i] = b[j][i] + d[j] * e[i];
- }
- }
- // Loop fission - The loop nest here has been split into a pair of loops, followed by loop interchange applied to the second loop to improve cache usage.
- for(int i = 0; i < n; i++)
- c[i] = exp(i / n);
- for(int j = 0; j < m; j++)
- for(int i = 0; i < n; i++)
- a[j][i] = b[j][i] + d[j] * e[i];
- /* Loop tiling */
- // A nested loop implementing an array transpose operation - Loop interchange does not improve its use of cache or TLB. A fresh approach is needed.
- for(int i = 0; i < n; i++)
- for(int j = 0; j < n; j++)
- b[i][j] = a[j][i];
- // Loop tiling applied to matrix transpose - Here we have used loop tiling to split the inner loop into a pair of loops. This reduces TLB and cache misses.
- for(int j1 = 0; j1 < n; j1 += nbj)
- for(int i = 0; i < n; i++)
- for(int j2 = 0; j2 < MIN(n - j1, nbj); j2++)
- b[i][j1 + j2] = a[j1 + j2][i];
Advertisement
Add Comment
Please, Sign In to add comment