Author: Saurabh Raje (University of Utah)
Advisor: P. Sadayappan (University of Utah)
Abstract: Sparse tensor contractions (SpTC) are a bottleneck for several algorithms in scientific computing, data science, artificial intelligence and graphics. The SpTC operation is any expression of the form R(l0,l1, r0) = X(l0, l1, c0) * Y(r0, c0) where two tensors are multiplied along several dimensions to form a multidimensional result. Sparse tensor networks are an extension of this problem such that there are more than two inputs. This thesis aims to accelerate both—the SpTC primitive and sparse tensor networks with multiple SpTC terms. We develop kernels and IR optimizations to improve code-generation for sparse tensor networks.
To generate efficient code for a sparse tensor network, several inter-dependent optimizations must be made on the intermediate representation (IR). This includes sparse tensor mode order, and loop fusion to reduce intermediate tensors. Correctness requirements impose constraints on these variables.
We develop CoNST, a code-generator that co-optimizes these variables. An integer constraint system is solved by the Z3 SMT solver and the result lowers to a unique fused loop structure and tensor mode layouts for the entire contraction tree. CoNST outperforms state-of-the-art compilers by orders of magnitude in run-time.
To accelerate the SpTC operation, we perform the first analysis of data-access costs and memory requirements for loop orders. We develop FaSTCC, a hash-based parallel implementation of the SpTC operation that uses the fastest loop order with minimal memory overhead. FaSTCC introduces a new 2D tiled contraction-index-outer scheme and a corresponding tile-aware design. It outperforms previous state-of-the-art by 2-5x on up to 64 CPU threads.
Thesis Canvas: pdf