4.5 Article

Improving MapReduce Performance by Balancing Skewed Loads

Journal

CHINA COMMUNICATIONS
Volume 11, Issue 8, Pages 85-108

Publisher

CHINA INST COMMUNICATIONS
DOI: 10.1109/CC.2014.6911091

Keywords

MapReduce; cloud computing; skewed loads; performance prediction; support vector machines

Funding

  1. National High-Tech Research and Development Plan of China [2011AA01A204, 2012AA01A306]
  2. National Natural Science Foundation of China [61202041, 91330117]

MapReduce has emerged as a popular computing model used in data centers to process large datasets. In the map phase, hash partitioning is employed to distribute data sharing the same key across data center-scale cluster nodes. However, we observe that this approach can lead to uneven data distribution, which can result in skewed loads among reduce tasks and thus hamper the performance of MapReduce systems. Moreover, worker nodes in MapReduce systems may differ in computing capability due to (1) multiple generations of hardware in non-virtualized data centers, or (2) co-location of virtual machines in virtualized data centers. This heterogeneity among cluster nodes exacerbates the negative effects of uneven data distribution. To improve MapReduce performance in heterogeneous clusters, we propose a novel load balancing approach in the reduce phase. This approach consists of two components: (1) performance prediction for reducers that run on heterogeneous nodes, based on support vector machine models, and (2) heterogeneity-aware partitioning (HAP), which balances skewed data for reduce tasks. We implement this approach as a plug-in for a current MapReduce system. Experimental results demonstrate that our proposed approach distributes work evenly among reduce tasks and improves MapReduce performance with little overhead.
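
The contrast between hash partitioning and a heterogeneity-aware assignment can be illustrated with a small Python sketch. This is not the paper's HAP algorithm, which the abstract does not specify; it is a simple greedy heuristic chosen for illustration. The function names, the `speeds` list (standing in for the per-node performance that the paper predicts with support vector machine models), and the sample key counts are all assumptions made up for this example.

```python
import zlib


def hash_partition(key_counts, num_reducers):
    """Hash-style partitioning: reducer index = hash(key) mod R.

    crc32 is used instead of the built-in hash() so the result is
    stable across interpreter runs. Returns the record count per reducer.
    """
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        idx = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[idx] += count
    return loads


def heterogeneity_aware_partition(key_counts, speeds):
    """Illustrative greedy heuristic (an assumption, not the paper's HAP).

    Keys are visited from heaviest to lightest; each key goes to the
    reducer whose estimated finish time (load / speed) would be smallest
    after receiving it. `speeds` is a hypothetical relative processing
    rate per reducer node.
    """
    loads = [0] * len(speeds)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        best = min(
            range(len(speeds)),
            key=lambda i: (loads[i] + count) / speeds[i],
        )
        assignment[key] = best
        loads[best] += count
    return assignment, loads


def makespan(loads, speeds):
    """Estimated finish time of the slowest reducer."""
    return max(load / speed for load, speed in zip(loads, speeds))


if __name__ == "__main__":
    # Hypothetical skewed key distribution and two nodes, one twice as fast.
    key_counts = {"a": 600, "b": 200, "c": 100, "d": 50, "e": 50}
    speeds = [2.0, 1.0]

    hash_loads = hash_partition(key_counts, len(speeds))
    _, aware_loads = heterogeneity_aware_partition(key_counts, speeds)

    print("hash loads: ", hash_loads, "makespan:", makespan(hash_loads, speeds))
    print("aware loads:", aware_loads, "makespan:", makespan(aware_loads, speeds))
```

The greedy rule is only one possible design; it needs the per-key record counts up front, which a real system would have to sample or estimate before the reduce phase begins.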
