Affine Transformations

for Optimizing Parallelism and Locality

An Affine Partitioning Theory

It is well known that many loop transformations can be used to improve parallelization as well as the memory subsystem performance for both uniprocessor and multiprocessor systems. Many transformations have been proposed in the past including unimodular transformations (interchange, skew and reversal), fusion, fission, reindexing, scaling, and statement reordering. We have developed a new transformation framework called affine partitioning that unifies all the above transformations.

Affine Partitioning Based Algorithms

Developed an algorithm that finds the optimal affine partitioning that maximizes the degree of parallelism while minimizing the degree of synchronization.
Developed an algorithm that maximizes parallelism and minimizes communication between processors.
Developed an algorithm that uses affine partitioning to improve the data locality of uniprocessor and multiprocessor programs.
Introduced the concept of distributed blocking, a generalization of blocking from perfect to arbitrary loop nestings.
Developed an algorithm that integrates affine partitioning with blocking.
Developed an algorithm that uses affine partitioning to maximize the opportunity of array contraction.

Experimental Results

The affine partitioning algorithm has been prototyped in the SUIF2 Compiler Infrastructure. Data locality optimizations have shown to triple uniprocessor performance on some programs. A combination of affine transformations to improve parallelization and locality has allowed a speed up of 20 times on a 32-processor, which represents over a three-fold improvement over the previously best results.

Publications

Improving Parallelism And Data Locality With Affine Partitioning
A. W. Lim
Ph.D. thesis, Stanford University, Computer Science Department, September 2001.
(Also available in gzip'ed postscript format.)
Blocking and Array Contraction Across Arbitrarily Nested Loops Using Affine Partitioning
A. W. Lim, S.-W. Liao and M. S. Lam
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, June, 2001.
Cache Optimizations With Affine Partitioning
A. W. Lim and M. S. Lam.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, Portsmouth, Virginia, March, 2001.
An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communication
A. W. Lim, G. I. Cheong and M. S. Lam
Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, June, 1999.
Maximizing Parallelism and Minimizing Synchronization with Affine Partitions
A. W. Lim and M. S. Lam
Parallel Computing, Vol. 24, Issue 3-4, May 1998, Pages 445-475. (Also available in PDF format.)
Maximizing Parallelism and Minimizing Synchronization with Affine Transforms
A. W. Lim and M. S. Lam
Conference Record of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January, 1997.
Finding Affine Partitions that Maximize Parallelism and Minimize Synchronization
A. W. Lim and M. S. Lam
Proceedings of the Sixth Workshop on Compilers for Parallel Computers, Aachen, Germany, December, 1996.
Communication-Free Parallelization via Affine Transformations
A. W. Lim and M. S. Lam
Proceedings of the 7th Workshop on Languages and Compilers for Parallel Computing, August, 1994.