The International Conference for High Performance Computing, Networking, Storage, and Analysis

Doctoral Showcase Archive

Sketch-Based Algorithmic Frameworks for Genome-Scale Mapping


Author: Tazin Rahman (Washington State University)

Advisor: Ananth Kalyanaraman (Washington State University, Pacific Northwest National Laboratory (PNNL))

Abstract: Sketching is a widely used class of techniques for generating compact representations of long biological sequences. Instead of comparing full sequences, sketching samples a subset of their k-mers and compares those samples, saving both time and memory in the end application. A key metric here is density, the fraction of a sequence's k-mers that the sketch retains. A lower density saves space, but it can also reduce the sensitivity of the mapping process.
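To make density concrete, here is a minimal Python sketch of one common sketching scheme, (w, k)-minimizers: in every window of w consecutive k-mers, only the k-mer with the smallest hash is kept. The abstract does not say which schemes MHSketch supports, so minimizers serve only as an illustration; the function names, the BLAKE2-based hash, and the parameter values are assumptions, not the authors' implementation.

```python
import hashlib
import random


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (hashlib is stable across runs,
    unlike Python's salted built-in hash())."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_positions(seq: str, k: int, w: int) -> set[int]:
    """Start positions of (w, k)-minimizers: in every window of w consecutive
    k-mers, keep the k-mer with the smallest hash (leftmost on ties)."""
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        raise ValueError("sequence too short for one window of w k-mers")
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (hashes[i], i)))
    return selected


def density(seq: str, k: int, w: int) -> float:
    """Fraction of the sequence's k-mers that the sketch retains."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w)) / n_kmers


if __name__ == "__main__":
    rng = random.Random(42)
    seq = "".join(rng.choice("ACGT") for _ in range(10_000))
    # Larger windows keep fewer k-mers, i.e. lower density.
    for w in (1, 5, 10, 20):
        print(f"w={w:2d}  density={density(seq, 15, w):.3f}  2/(w+1)={2 / (w + 1):.3f}")
```

With w = 1 every k-mer is kept (density 1.0). For random sequences under a random hash order, minimizer density is expected to be close to 2/(w+1). Sparser sketches save memory, but selected k-mers can then be up to w positions apart, which is where mapping sensitivity can suffer.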

In this work, we study sketch-based data sparsification combined with high performance computing to improve the scalability of mapping. Our contributions are twofold: 1) we present a scalable parallel algorithmic framework for alignment-free mapping, called JEM-mapper, and 2) we present a sketch library, called MHSketch, that extends JEM-mapper to support different sequence sketching schemes. Experimental evaluation demonstrates that our approach can significantly reduce density and convert that reduction into performance gains. In particular, results show that MHSketch achieves accurate mapping while reducing time-to-solution (speedups from 2.2x to 9.3x) and drastically reducing memory usage (>92% savings) compared to other tools.
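The abstract does not spell out how JEM-mapper compares sketches, so the following is only a generic illustration of alignment-free mapping using a standard technique, MinHash Jaccard estimation: each sequence is reduced to a fixed-length signature of minimum hash values, and a query is assigned to the target whose signature agrees most often. The names (`map_query`, `contig_a`), k = 15, and 64 hash functions are hypothetical choices for this sketch.

```python
import hashlib
import random


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer under hash function number `seed`
    (a salted BLAKE2 digest stands in for a family of independent hashes)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(seq: str, k: int, num_hashes: int) -> list[int]:
    """Keep only the minimum hash value per hash function: a fixed-size sketch."""
    kmers = kmer_set(seq, k)
    if not kmers:
        raise ValueError("sequence shorter than k")
    return [min(seeded_hash(km, s) for km in kmers) for s in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing minima estimates the Jaccard similarity of the k-mer sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def map_query(query: str, targets: dict[str, str], k: int = 15,
              num_hashes: int = 64) -> tuple[str, dict[str, float]]:
    """Assign the query to the target whose sketch is most similar."""
    query_sig = minhash_signature(query, k, num_hashes)
    scores = {name: estimate_jaccard(query_sig, minhash_signature(t, k, num_hashes))
              for name, t in targets.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(7)
    contig_a = "".join(rng.choice("ACGT") for _ in range(2000))
    contig_b = "".join(rng.choice("ACGT") for _ in range(2000))
    query = contig_a[600:1100]  # a 500 bp piece of contig_a
    best, scores = map_query(query, {"contig_a": contig_a, "contig_b": contig_b})
    print(best, scores)
```

Each signature has the same length regardless of sequence length, which is what bounds memory per sequence. Jaccard similarity penalizes size differences between query and target, so a containment-style score may suit short queries against long targets better.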




