ACM Doctoral Dissertation Award
Canada - 2014
Citation
For his dissertation, "An Architecture for Fast and General Data Processing on Large Clusters," nominated by the University of California at Berkeley.
Press Release
Creator of Advanced Data Processing Architecture Wins 2014 Doctoral Dissertation Award
Matei Zaharia won the 2014 Doctoral Dissertation Award for his innovative approach to handling the surge in data processing workloads. His approach accommodates both the speed and sophistication of complex multi-stage applications and the demands of more interactive ad-hoc queries. His work proposed a new architecture for cluster computing systems that achieves best-in-class performance across a variety of workloads, while providing a simple programming model that lets users combine these workloads easily and efficiently.
To address the limited processing capabilities of single machines in an age of growing data volumes and stalling processor speeds, Zaharia developed Resilient Distributed Datasets (RDDs). As described in his dissertation, “An Architecture for Fast and General Data Processing on Large Clusters,” RDDs are a distributed memory abstraction that lets programmers perform computations on large clusters in a fault-tolerant manner. He implemented RDDs in the open source Apache Spark system, which matches or exceeds the performance of specialized systems in many application domains and achieves speedups of up to 100 times for certain applications. Spark also offers stronger fault-tolerance guarantees than these specialized systems and allows their workloads to be combined.
Zaharia, an assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), completed his dissertation at the University of California, Berkeley, which nominated him. He earned a Bachelor of Mathematics (B.Math.) degree from the University of Waterloo. As a student there, he won a gold medal at the ACM International Collegiate Programming Contest (ICPC) in 2005. He is a co-founder and Chief Technology Officer of Databricks, the company that is commercializing Apache Spark.
He will receive the Doctoral Dissertation Award and its $20,000 prize at the annual ACM Awards Banquet on June 20 in San Francisco, CA. Financial sponsorship of the award is provided by Google Inc.
Honorable Mention for the 2014 ACM Doctoral Dissertation Award went to John Criswell of the University of Rochester and John C. Duchi of Stanford University. They will share a $10,000 prize, with financial sponsorship provided by Google Inc.
Criswell’s dissertation, “Secure Virtual Architecture: Security for Commodity Software Systems,” describes a compiler-based infrastructure designed to address the challenges of securing systems built on commodity operating systems such as UNIX or Linux. This Secure Virtual Architecture (SVA) uses compiler instrumentation techniques to protect both operating system and application code. He earned his Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign, which nominated him for this award.
Duchi’s dissertation, “Multiple Optimality Guarantees in Statistical Learning,” explores tradeoffs that occur in modern statistical and machine learning applications. The criteria involved in these tradeoffs (computation, communication, and privacy) must be optimized while maintaining statistical performance. He explores examples from optimization and shows some of the practical benefits that a focus on multiple optimality criteria can bring. He holds an M.A. in Statistics and a Ph.D. in Computer Science from the University of California, Berkeley, and studied as an undergraduate and master’s student at Stanford University. He was nominated by UC Berkeley for this award.
ACM will present these and other awards at the ACM Awards Banquet on June 20, 2015, in San Francisco, CA.