EconPapers    
Economics at your fingertips  
 

Local Density Estimation in High Dimensions

Xian Wu (), Moses Charikar () and Vishnu Natchu ()
Additional contact information
Xian Wu: Simons Institute and University of California Berkeley, Berkeley, California 94705
Moses Charikar: Department of Computer Science, Stanford University, Stanford, California 94305
Vishnu Natchu: Laserlike, Inc., Mountain View, California 94035

Mathematics of Operations Research, 2022, vol. 47, issue 4, 2614-2640

Abstract: An important question that arises in the study of high-dimensional vector representations learned from data are, given a set D of vectors and a query q , estimate the number of points within a specified distance threshold of q . We develop two estimators, LSH count and multiprobe count that use locality-sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes and demonstrate their effectiveness in experiments on a standard word embedding data set.

Keywords: Primary: 68P20; secondary: 68P10; locality-sensitive hashing; cosine similarity; spherical range counting; importance sampling (search for similar items in EconPapers)
Date: 2022
References: Add references at CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/moor.2021.1221 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:47:y:2022:i:4:p:2614-2640

Access Statistics for this article

More articles in Mathematics of Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().

 
Page updated 2025-03-19
Handle: RePEc:inm:ormoor:v:47:y:2022:i:4:p:2614-2640