Article

Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data

Journal

IEEE ACCESS
Volume 5, Issue -, Pages 24105-24119

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2017.2768385

Keywords

Message identification (M-I) divergence; discrete distribution estimation; divergence estimation; big data analysis; outlier detection

Funding

  1. China Major State Basic Research Development Program (973 Program) [2012CB316100(2)]
  2. National Natural Science Foundation of China [61771283]

Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to the Kullback-Leibler (K-L) and Rényi divergences. With a variable parameter, M-I divergence can change how the distinction between two distributions is characterized. Furthermore, by choosing an appropriate parameter of M-I divergence, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between nonadjacent ones. Therefore, M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from the viewpoint of information theory and then present its major properties. In addition, we design an M-I divergence estimation algorithm by means of the ensemble estimator of the proposed weight kernel estimators, which can improve the convergence of the mean squared error from $O(\Gamma^{-j/d})$ to $O(\Gamma^{-1})$ $(j \in (0, d])$. We also discuss decisions based on M-I divergence for clustering or classification, and investigate its performance in a statistical sequence model of big data for the outlier detection problem.
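
The abstract does not state the definition of the parametric M-I divergence, so it is not reproduced here. As a rough point of comparison only, the following Python sketch computes the two measures the abstract names as similar, the K-L divergence and the Rényi divergence of order alpha, for discrete distributions. The function names, the example distributions, and the use of the Rényi order as a stand-in for a tunable parameter are illustrative assumptions, not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes p and q are discrete distributions over the same support
    and that q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Rényi divergence of order alpha (alpha > 0, alpha != 1) in nats.

    Same support assumptions as kl_divergence.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


# Hypothetical example distributions: q_near is close to p, q_far is not.
p = [0.5, 0.3, 0.2]
q_near = [0.45, 0.35, 0.2]
q_far = [0.1, 0.2, 0.7]

if __name__ == "__main__":
    print("K-L near:", kl_divergence(p, q_near))
    print("K-L far: ", kl_divergence(p, q_far))
    for alpha in (0.5, 2.0, 5.0):
        print("Renyi alpha =", alpha,
              "near:", renyi_divergence(p, q_near, alpha),
              "far:", renyi_divergence(p, q_far, alpha))
```

Varying `alpha` changes how strongly the Rényi divergence separates `p` from each alternative. This is only an analogy for the role a tunable parameter can play, and it says nothing about how the M-I parameter itself behaves.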
