Efficient Technique for Partitioning and Programming Linear Algebra Algorithms on Concurrent VLSI Architectures