Moses Charikar

Digital Library

ACM Fellows

USA - 2020


For design of efficient algorithmic techniques for big data, hashing, approximation algorithms, and metric embeddings

Press Release

ACM Paris Kanellakis Theory and Practice Award

USA - 2012


With Andrei Broder and Piotr Indyk, for their groundbreaking work on Locality-Sensitive Hashing that has had great impact in many fields of computer science including computer vision, databases, information retrieval, machine learning, and signal processing.

Hashing is one of the fundamental techniques in computer science: generating challenging theoretical questions as well as finding application in almost all fields of computer science. In many of these applications a refined form of hashing is required, where keys that are close in the original space are more likely to be mapped to the same hash value. This is the problem that locality-sensitive hashing addresses. As a result similarity search problems can be solved in the hashed space instead of the original space, leading often to significant improvements either in efficiency or quality.

Motivated by the need to efficiently compute near-duplicate web pages, Andrei Broder introduced specific locality-sensitive hash functions, called minhash functions, and used them to estimate the similarity of sets. Piotr Indyk and the late Rajeev Motwani extended locality-sensitive hash functions to a wider range of distance functions and showed how to use them to design approximate nearest neighbor algorithms with sub-linear query time. Moses Charikar subsequently introduced a highly efficient family of locality-sensitive hash functions, called simhash functions, for angular distances as well as a new variant of the nearest neighbor search algorithm.

Locality-sensitive hashing techniques have been widely used in many fields of computer science including computer vision, databases, information retrieval, data mining, machine learning, and signal processing. Furthermore, they have inspired the development of other hashing-based methods for similarity search such as parameter sensitive hashing, semantic hashing, spectral hashing, and kernelized LSH.

The work has also enabled other significant theoretical developments. In particular, it has led to the design of nearest neighbor algorithms for more complex metric spaces that play an important role in various applications. It has also resulted in new discoveries in, and connections to, other areas such as vector quantization, communication complexity, and streaming algorithms.