Journal
JOURNAL OF SUPERCOMPUTING
Volume 72, Issue 6, Pages 2059-2079
Publisher
SPRINGER
DOI: 10.1007/s11227-014-1335-2
Keywords
Hadoop; Heterogeneous cluster; MapReduce; Scheduling; Workflow
Funding
- Key Program of National Natural Science Foundation of China [61133005, 61432005]
- National Natural Science Foundation of China [61103047, 61370095]
The MapReduce framework is considered an effective solution for large-scale parallel data processing. This paper treats a massive data processing workflow as a directed acyclic graph (DAG) of MapReduce jobs. In a heterogeneous computing environment, the computation speed of the same slot can differ depending on the job it runs. To address this problem, this paper proposes an optimized MapReduce workflow scheduling algorithm that comprises a job prioritizing phase and a task assignment phase. First, jobs are classified as I/O-intensive or computing-intensive, and the priority of each job is computed according to its type. Then, suitable slots are allocated for each block, and the MapReduce tasks in the workflow are scheduled with respect to data locality. The experimental results show that the optimized MapReduce workflow scheduling algorithm can improve the performance of task scheduling and the rationality of resource allocation in heterogeneous computing environments.
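The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the priority formula or the slot selection rule, so the sketch assumes an upward-rank style priority (a job's dominant cost plus its longest downstream chain) and a greedy earliest-finish slot choice with a fixed penalty for non-local data. The names `Job`, `job_priorities`, `assign_blocks`, `io_cost`, `cpu_cost`, and `remote_penalty` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Job:
    """One MapReduce job in the workflow DAG (all fields are illustrative)."""
    io_cost: float                       # assumed estimate of I/O time
    cpu_cost: float                      # assumed estimate of compute time
    successors: List[str] = field(default_factory=list)

    @property
    def kind(self) -> str:
        # Assumed rule: a job is I/O-intensive when its I/O time dominates.
        return "io" if self.io_cost > self.cpu_cost else "cpu"


def job_priorities(jobs: Dict[str, Job]) -> Dict[str, float]:
    """Phase 1 (assumed form): dominant cost plus longest successor chain."""
    memo: Dict[str, float] = {}

    def rank(name: str) -> float:
        if name not in memo:
            job = jobs[name]
            own = job.io_cost if job.kind == "io" else job.cpu_cost
            memo[name] = own + max((rank(s) for s in job.successors), default=0.0)
        return memo[name]

    return {name: rank(name) for name in jobs}


def assign_blocks(
    blocks: Dict[str, Set[str]],             # block id -> nodes holding a replica
    slots: Dict[str, Tuple[str, float]],     # slot id -> (node, relative speed)
    work: float = 1.0,
    remote_penalty: float = 1.0,             # assumed cost of reading a remote block
) -> Dict[str, str]:
    """Phase 2 (assumed form): give each block the slot that finishes it first."""
    ready = {slot_id: 0.0 for slot_id in slots}   # time at which each slot is free
    plan: Dict[str, str] = {}
    for block_id, replica_nodes in blocks.items():
        def finish_time(slot_id: str) -> float:
            node, speed = slots[slot_id]
            cost = work / speed
            if node not in replica_nodes:         # data is not local to this slot
                cost += remote_penalty
            return ready[slot_id] + cost

        best = min(slots, key=finish_time)
        ready[best] = finish_time(best)
        plan[block_id] = best
    return plan


# Hypothetical four-job workflow: extract -> {train, stats} -> report.
workflow = {
    "extract": Job(io_cost=8, cpu_cost=2, successors=["train", "stats"]),
    "train": Job(io_cost=1, cpu_cost=10, successors=["report"]),
    "stats": Job(io_cost=3, cpu_cost=4, successors=["report"]),
    "report": Job(io_cost=2, cpu_cost=1),
}
ranks = job_priorities(workflow)
schedule_order = sorted(ranks, key=ranks.get, reverse=True)
```

With positive costs, sorting by descending priority always places a job before its successors, so `schedule_order` respects the DAG dependencies. The locality penalty lets a fast remote slot win over a busy local one, which is one simple way to trade data locality against heterogeneous slot speed.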