Accelerating Sparse LU Factorization

Open Access
- Author:
- Lakshmidevi, Harieasswar
- Area of Honors:
- Computer Science
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Kamesh Madduri, Thesis Supervisor
John Joseph Hannan, Thesis Honors Advisor - Keywords:
- LU Factorization
GPU Acceleration
Sparse linear algebra
Symbolic factorization
Sparse linear solvers - Abstract:
- Many scientific and engineering problems require solving large-scale linear systems. Most matrices arising from real-world applications such as linear programming, chemical engineering, circuit simulation, fluid dynamics modeling, etc., are sparse in nature. Direct methods where the matrix A is decomposed into lower and upper triangular matrices L and U, respectively, where A = LU, is the predominant and the most general strategy due to its robustness and numerical stability. This factorization is also the most expensive step in sparse direct solvers. Sparse LU factorization involves much more complicated algorithms than its dense counterparts. The main complication arises from the new non-zeroes, also known as fill-ins, introduced in the L and U factored matrices during the factorization of a sparse matrix. Sparse LU factorization usually consists of three distinct steps: preprocessing, symbolic factorization, and numerical factorization. The first two steps aim to reduce and detect the fill-ins and handle the allocation of the factored matrix, and the latter computes the numerical values of L and U factors. Numerical factorization is the expensive step in this process, accounting for 80% of the runtime. However, with the recent rise of Graphics Processing Units (GPUs), many efforts have successfully accelerated the numerical factorization kernels and other related steps of factorization steps. This leaves the symbolic phase as the new bottleneck. This thesis extensively evaluates the current sparse LU algorithms designed for different parallel architectures, primarily focusing on the symbolic factorization routine. We implement and analyze the CPU-based symbolic factorization algorithms with a discussion on opportunities for parallelism and GPU adaptation of the symbolic phase.