4.4 Article

Efficient and effective similarity search over probabilistic data based on Earth Mover's Distance

Journal

VLDB JOURNAL
Volume 21, Issue 4, Pages 535-559

Publisher

SPRINGER
DOI: 10.1007/s00778-011-0258-2

Keywords

Probabilistic data management; Similarity search; Earth mover's distance; Tree-based indexing

Funding

  1. Singapore NRF [R-252-000-376-279]
  2. National Natural Science Foundation of China [60933001, 61003058]
  3. Fundamental Research Funds for the Central Universities [N100704001]
  4. National Basic Research Program of China (973 Program) [2012CB316201]


Advances in geographical tracking, multimedia processing, information extraction, and sensor networks have created a deluge of probabilistic data. Similarity search is an important tool for manipulating such data, but it poses new challenges for traditional relational databases. The problem stems from the limited effectiveness of the distance metrics employed by existing database systems. At the same time, several more sophisticated distance operators have proven their value by distinguishing objects better in specific probabilistic domains. In this paper, we discuss the similarity search problem with respect to Earth Mover's Distance (EMD). EMD is the most successful distance metric for comparing probability distributions, but it is an expensive operator with cubic time complexity. We present a new database indexing approach to answer EMD-based similarity queries, including range queries and k-nearest neighbor queries, on probabilistic data. Our solution uses primal-dual theory from linear programming and employs a group of B+ trees for effective candidate pruning. We also apply our filtering technique to continuous similarity queries, with particular application to frame copy detection in real-time videos. Extensive experiments show that our proposals dramatically improve the usefulness and scalability of probabilistic data management.
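
The filter-and-refine idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm. It assumes one-dimensional histograms of unit mass with ground distance |i - j|, a single hand-picked dual-feasible solution (phi_i = i, pi_j = -j), and a sorted Python list standing in for a B+ tree. All function names (`emd_1d`, `key`, `build_index`, `range_query`) are hypothetical. By weak duality, any dual-feasible solution gives a lower bound on EMD, so each record can be reduced to a scalar key and pruned with a one-dimensional range scan before the exact distance is computed.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def emd_1d(p, q):
    """Exact EMD for unit-mass histograms on bins 0..n-1 with ground distance |i - j|.

    In this 1D special case the EMD equals the L1 distance between the two CDFs.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def key(p):
    """Scalar key from the dual-feasible solution phi_i = i, pi_j = -j.

    Feasibility: phi_i + pi_j = i - j <= |i - j|. Weak duality then gives
    key(p) - key(q) <= EMD(p, q), and by symmetry |key(p) - key(q)| <= EMD(p, q).
    """
    return sum(i * mass for i, mass in enumerate(p))


def build_index(records):
    """Sorted (key, record id) pairs; a stand-in for a B+ tree on the key."""
    return sorted((key(p), rid) for rid, p in enumerate(records))


def range_query(index, records, q, theta, eps=1e-9):
    """Return (candidates, results) for the query EMD(p, q) <= theta."""
    keys = [k for k, _ in index]
    kq = key(q)
    # Filter step: records whose key lies outside [kq - theta, kq + theta]
    # cannot satisfy the query, because |key(p) - kq| is a lower bound on EMD.
    lo = bisect_left(keys, kq - theta - eps)
    hi = bisect_right(keys, kq + theta + eps)
    candidates = [rid for _, rid in index[lo:hi]]
    # Refine step: compute the exact EMD only for surviving candidates.
    results = [rid for rid in candidates if emd_1d(records[rid], q) <= theta]
    return candidates, results


records = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 0.5],
]
query = [0.0, 1.0, 0.0, 0.0]
index = build_index(records)
candidates, results = range_query(index, records, query, theta=1.0)
print(candidates, results)  # prints [0, 1, 3] [0, 1]
```

In this toy run, record 2 is pruned by the key range alone, while record 3 passes the filter but is rejected by the exact computation. A single lower bound can be loose in this way, which is one plausible reason to combine several bounds, as the abstract's "group of B+ trees" suggests.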

