Shaden Smith (University of Minnesota) and Yang You (University of California, Berkeley) are the recipients of the 2017 ACM/IEEE-CS George Michael Memorial HPC Fellowships. Smith is being recognized for his work on efficient and parallel large-scale sparse tensor factorization for machine learning applications. You is being recognized for his work on designing accurate, fast, and scalable machine learning algorithms on distributed systems.
Shaden Smith’s research is in the general area of parallel and high-performance computing, with a special focus on developing algorithms for sparse tensor factorization. Sparse tensor factorization facilitates the analysis of unstructured and high-dimensional data.
Smith has made several fundamental contributions that have already advanced the state of the art in sparse tensor factorization algorithms. For example, he developed serial and parallel algorithms for Canonical Polyadic Decomposition (CPD) that are over five times faster than existing open source and commercial approaches. He also developed algorithms for Tucker decompositions that are up to 21 times faster and require 28 times less memory than existing algorithms. Smith’s algorithms operate efficiently on systems ranging from a small number of multi-core/manycore processors to tens of thousands of cores.
Yang You’s research interests include scalable algorithms, parallel computing, distributed systems and machine learning. As computers spend a growing share of their time and energy transferring data (i.e., communicating), inventing or identifying algorithms that reduce communication within systems is becoming essential. In well-received research papers, You has made several fundamental contributions that reduce communication between levels of a memory hierarchy or between processors over a network.
In his most recent work, “Scaling Deep Learning on GPU and Knights Landing Clusters,” You aims to speed up the training of neural networks so that networks that are relatively slow to train can be redesigned for high-performance clusters. This approach has reduced the percentage of communication from 87% to 14% and resulted in a five-fold increase in speed.