Article

A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 29, Issue 6, Pages 1144-1157

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2016.2531684

Keywords

Big sensing data; cloud computing; data chunk; data compression; similarity model; scalability; MapReduce

Funding

  1. Australian Research Council [ARC LP140100816]
  2. STRATUS Project (Security Technologies Returning Accountability, Trust and User-Centric Services in the Cloud)

Big sensing data is prevalent in both industry and scientific research applications, where data is generated with high volume and velocity. Cloud computing is a promising platform for processing and storing big sensing data because it offers a flexible stack of massive computing, storage, and software services in a scalable manner. Current big sensing data processing on Cloud has adopted some data compression techniques. However, given the high volume and velocity of big sensing data, traditional data compression techniques lack sufficient efficiency and scalability for data processing. Based on specific on-Cloud data compression requirements, we propose a novel scalable data compression approach based on calculating similarity among partitioned data chunks. Instead of compressing basic data units, the compression is conducted over partitioned data chunks. To restore the original data sets, restoration functions and predictions are designed. MapReduce is used for algorithm implementation to achieve extra scalability on Cloud. With experiments on real-world meteorological big sensing data on the U-Cloud platform, we demonstrate that the proposed scalable compression approach based on data chunk similarity can significantly improve data compression efficiency with affordable data accuracy loss.
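The chunk-level idea in the abstract can be illustrated with a minimal sketch. This is not the paper's similarity model, restoration functions, or MapReduce implementation; the fixed chunk size, the per-reading tolerance test, and the names `partition`, `similar`, `compress`, and `restore` are all assumptions made for illustration. Each chunk is either stored as a new "standard" chunk or replaced by a reference to an earlier standard chunk that is judged similar.

```python
from typing import List, Tuple

Chunk = Tuple[float, ...]


def partition(series: List[float], size: int) -> List[Chunk]:
    """Split a series into consecutive chunks of at most `size` readings."""
    return [tuple(series[i:i + size]) for i in range(0, len(series), size)]


def similar(a: Chunk, b: Chunk, tol: float) -> bool:
    """Hypothetical similarity test (an assumption, not the paper's model):
    same length and every pair of readings within `tol` of each other."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def compress(series: List[float], size: int, tol: float) -> Tuple[List[Chunk], List[int]]:
    """Return (standard chunks, one reference index per input chunk)."""
    standards: List[Chunk] = []
    refs: List[int] = []
    for chunk in partition(series, size):
        for idx, std in enumerate(standards):
            if similar(chunk, std, tol):
                refs.append(idx)  # reuse an existing standard chunk
                break
        else:
            standards.append(chunk)  # no match: store the chunk itself
            refs.append(len(standards) - 1)
    return standards, refs


def restore(standards: List[Chunk], refs: List[int]) -> List[float]:
    """Approximate restoration: expand each reference to its standard chunk."""
    restored: List[float] = []
    for idx in refs:
        restored.extend(standards[idx])
    return restored


if __name__ == "__main__":
    readings = [20.0, 20.1, 20.2, 20.05, 20.1, 20.25, 35.0, 35.2, 35.1]
    standards, refs = compress(readings, size=3, tol=0.1)
    print(len(standards), refs)
    print(restore(standards, refs))
```

In this sketch the restoration is lossy: every restored reading differs from the original by at most `tol`, which is one simple way to picture the trade-off between compression and accuracy loss that the abstract describes. The chunks are compared sequentially here; distributing the comparison across workers is left out.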
