Performance Characteristics for Sparse Matrix-Vector Multiplication on GPUs


The massive parallelism provided by the graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. One such application is Sparse Matrix-Vector (SpMV) multiplication, which is an essential building block for numerous scientific and engineering applications. Researchers who propose new storage techniques for sparse matrix-vector multiplication focus mainly on a single evaluation metrics or performance characteristics which is usually the throughput performance of sparse matrix-vector multiplication in FLOPS. However, such an evaluation does not provide a deeper insight nor allow to compare new SpMV techniques with their competitors directly. In this chapter, we explain the notable performance characteristics of the GPU architectures and SpMV computations. We discuss various strategies to improve the performance of SpMV on GPUs. We also discuss a few performance criteria that are usually overlooked by the researchers during the evaluation process. We also analyze various well-known schemes such as COO, CSR, ELL, DIA, HYB, and CSR5 using the discussed performance characteristics.

Smart Infrastructure and Applications: Foundations for Smarter Cities and Societies