Conclusion
With striped partitioning, it is natural to pipeline Gaussian elimination for best performance: successive elimination steps overlap instead of each step waiting until the previous one has finished on every process.
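The Python sketch below illustrates the idea. It is not the course's reference implementation, and it rests on these assumptions:

- Each process on a linear array owns exactly one matrix row.
- There is no pivoting, so every diagonal pivot must be nonzero.
- The model is synchronous: each process takes one action per step, and a message sent in one step is received in the next.
- The function name `pipelined_gaussian_elimination` is hypothetical.

The key move is that a process forwards each pivot row to its right neighbor before using it. Step k+1 can therefore start at the left end of the array while step k is still travelling to the right.

```python
from collections import deque


def pipelined_gaussian_elimination(A):
    """Simulate striped, pipelined Gaussian elimination without pivoting.

    Hypothetical sketch: process i of a linear array owns row i. Returns the
    unit upper-triangular matrix and the number of synchronous steps used.
    """
    n = len(A)
    rows = [[float(x) for x in r] for r in A]
    inbox = [deque() for _ in range(n)]  # pivot rows received from the left neighbor
    next_k = [0] * n                     # next elimination step each process expects
    done = [False] * n
    steps = 0
    while not all(done):
        steps += 1
        outgoing = []  # messages sent this step are delivered before the next step
        for i in range(n):
            if done[i]:
                continue
            if next_k[i] == i:
                # Own turn: normalize the pivot row and start it down the pipeline.
                pivot = rows[i][i]
                rows[i] = [x / pivot for x in rows[i]]
                if i + 1 < n:
                    outgoing.append((i + 1, (i, rows[i][:])))
                done[i] = True
            elif inbox[i]:
                k, prow = inbox[i].popleft()
                # Forward first so the right neighbor is not kept waiting...
                if i + 1 < n:
                    outgoing.append((i + 1, (k, prow)))
                # ...then eliminate column k from the local row.
                factor = rows[i][k]
                rows[i] = [a - factor * b for a, b in zip(rows[i], prow)]
                next_k[i] = k + 1
        for dest, msg in outgoing:
            inbox[dest].append(msg)
    return rows, steps


U, steps = pipelined_gaussian_elimination([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
print(U)      # [[1.0, 0.5, 0.5], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
print(steps)  # 5
```

In this model, process i finishes at step 2i + 1, so an n × n matrix needs 2n - 1 steps in total. Without pipelining, every step would have to wait for a full broadcast of the pivot row before the next one could begin.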
Pipelined algorithms work best on a linear array of processors.
- Or on any topology onto which a linear array can be mapped, e.g., a ring, mesh, or hypercube
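Here is one example of such a mapping. A hypercube of p = 2^d processes can host a linear array (and even a ring) so that every pipeline link is a direct hypercube link. The mapping uses the binary reflected Gray code.

This is a minimal sketch. The function names `linear_to_hypercube` and `are_neighbors` are hypothetical, and p is assumed to be a power of two.

```python
def linear_to_hypercube(p):
    """Hypercube node label for each linear-array position 0..p-1.

    Uses the binary reflected Gray code; p is assumed to be a power of two.
    """
    return [i ^ (i >> 1) for i in range(p)]


def are_neighbors(a, b):
    """True if hypercube labels a and b differ in exactly one bit."""
    x = a ^ b
    return x != 0 and x & (x - 1) == 0


order = linear_to_hypercube(8)
print(order)  # [0, 1, 3, 2, 6, 7, 5, 4]
# Consecutive pipeline stages sit on directly connected hypercube nodes.
assert all(are_neighbors(order[i], order[i + 1]) for i in range(len(order) - 1))
```

The last label, 4, also differs from the first, 0, in exactly one bit. That closing link is why the same order works for a ring.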
Would block partitioning be better?
- How would it affect the algorithm?