Can we do better?
The previous version is synchronous, and its parallelism shrinks at each step.
Pipeline the algorithm.
Run the resulting algorithm on a linear array of processors.
Communication is nearest-neighbor.
This results in O(n) steps of O(n) operations each.
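
The slide does not name the algorithm being pipelined, so the sketch below assumes, purely for illustration, forward elimination (Gaussian elimination without pivoting) on an augmented matrix, with processor i owning row i. The function name `pipelined_elimination`, the message format, and the one-message-per-step timing model are all hypothetical choices, not taken from the slides. It simulates a linear array in which each processor talks only to its right-hand neighbor, and counts synchronous steps; each step costs a processor at most one O(n) row update.

```python
from collections import deque


def pipelined_elimination(a):
    """Simulate forward elimination on a linear array of len(a) processors.

    ASSUMPTION: the pipelined algorithm is Gaussian elimination without
    pivoting; processor i owns row i of the augmented matrix `a`.
    Returns (upper_triangular_rows, number_of_synchronous_steps).
    """
    n = len(a)
    rows = [[float(x) for x in r] for r in a]
    inbox = [deque() for _ in range(n)]   # messages arriving from the left neighbor
    outbox = [deque() for _ in range(n)]  # messages waiting to go to the right neighbor

    # Row 0 needs no elimination, so processor 0 can emit it as a pivot row at once.
    outbox[0].append((0, rows[0][:]))

    steps = 0
    while any(inbox) or any(outbox):
        steps += 1
        in_flight = []  # messages sent during this step, delivered at its end
        for i in range(n):
            if inbox[i]:
                # Receive one pivot row and eliminate column k from the local row: O(n) work.
                k, pivot = inbox[i].popleft()
                factor = rows[i][k] / pivot[k]
                rows[i] = [x - factor * p for x, p in zip(rows[i], pivot)]
                rows[i][k] = 0.0
                outbox[i].append((k, pivot))  # forward the pivot row down the array
                if k == i - 1:
                    # All earlier pivots have been applied: the local row is now a pivot row.
                    outbox[i].append((i, rows[i][:]))
            if outbox[i]:
                # Send at most one message per step, to the right-hand neighbor only.
                msg = outbox[i].popleft()
                if i + 1 < n:
                    in_flight.append((i + 1, msg))
        for dest, msg in in_flight:
            inbox[dest].append(msg)
    return rows, steps


if __name__ == "__main__":
    system = [[4, 1, 2, 12], [1, 5, 1, 14], [2, 1, 6, 22]]
    upper, steps = pipelined_elimination(system)
    print(upper, steps)
```

In this model a processor starts working as soon as the first pivot row reaches it instead of waiting for a global synchronization point, so the number of steps grows linearly in n while every message travels exactly one hop.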