Article

Dependency-Aware Network Adaptive Scheduling of Data-Intensive Parallel Jobs

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2019)

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Volume 30, Issue 3, Pages 515-529

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPDS.2018.2866993

Keywords

Adaptive task scheduler; network adaptive; job dependency; data-parallel clusters

Categories

Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

US National Science Foundation [CNS-1422119]

Abstract

Datacenter clusters often run data-intensive jobs in parallel to improve resource utilization and cost efficiency. The performance of parallel jobs is often constrained by the cluster's hard-to-scale network bisection bandwidth. Various solutions have been proposed to address this issue; however, most of them do not consider inter-job data dependencies and schedule jobs independently of one another. In this work, we find that aggregating and co-locating the data and tasks of dependent jobs offers an extra opportunity for improving data locality, which can greatly enhance job performance. We propose and design Dawn, a dependency-aware network-adaptive scheduler that includes an online plan and an adaptive task scheduler. The online plan, taking job dependencies into consideration, determines where (i.e., on which preferred racks) to place tasks in order to proactively aggregate dependent data. The task scheduler, based on the output of the online plan and the dynamic network status, adaptively schedules tasks to co-locate with the dependent data in order to take advantage of data locality. We implement Dawn on Apache YARN and evaluate it on physical and virtual clusters using various machine learning and query workloads. Results show that Dawn improves cluster throughput by up to 73 and 38 percent compared to Fair Scheduler and ShuffleWatcher, respectively. Dawn not only significantly enhances the performance of jobs with dependencies, but also works well for jobs without dependencies.
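The two components described in the abstract, an online plan that picks preferred racks and a task scheduler that adapts to network status, can be illustrated with a toy sketch. This is not Dawn's actual algorithm. The rack-weighting rule (pick the rack holding the most bytes a job reads, counting parent output), the single uplink-utilization reading per rack, and the 0.7 threshold are all assumptions made for illustration, and every name here (`Job`, `plan_preferred_racks`, `choose_task`) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    # Hypothetical job model, assumed for illustration only.
    name: str
    input_racks: Dict[str, int]                       # bytes of the job's own input stored on each rack
    output_size: int = 0                              # bytes of output read by dependent (child) jobs
    parents: List[str] = field(default_factory=list)  # jobs whose output this job reads


def plan_preferred_racks(jobs: List[Job]) -> Dict[str, str]:
    """Toy 'online plan': assign each job a preferred rack.

    Assumes jobs are listed in dependency (topological) order and that every
    job reads some data. A parent's output is assumed to land on the parent's
    planned rack, so children are pulled toward their parents' racks, which
    aggregates dependent data.
    """
    by_name = {job.name: job for job in jobs}
    plan: Dict[str, str] = {}
    for job in jobs:
        weight = Counter(job.input_racks)
        for parent in job.parents:
            weight[plan[parent]] += by_name[parent].output_size
        # The rack holding the most bytes wins; ties go to the smallest rack name.
        plan[job.name] = max(sorted(weight), key=lambda rack: weight[rack])
    return plan


def choose_task(free_rack: str, pending: List[str], plan: Dict[str, str],
                uplink_util: float, threshold: float = 0.7) -> Optional[str]:
    """Toy network-adaptive choice of which job's task gets a free slot on free_rack.

    Tasks whose preferred rack is this rack are taken first (data-local).
    A remote task is accepted only while the rack uplink is lightly loaded;
    otherwise the slot is left for a local task to claim later.
    """
    for name in pending:
        if plan[name] == free_rack:
            return name
    if pending and uplink_util < threshold:
        return pending[0]
    return None


# Example: B depends on A, so B is planned onto A's rack despite its own input on r2.
jobs = [
    Job("A", {"r1": 100}, output_size=50),
    Job("B", {"r2": 30}, parents=["A"]),
    Job("C", {"r2": 10}),
]
plan = plan_preferred_racks(jobs)  # {'A': 'r1', 'B': 'r1', 'C': 'r2'}
```

With this plan, a free slot on `r1` goes to B's task, while a free slot on `r2` with only A and B pending stays idle when the uplink is busy (utilization 0.9) and accepts A's task when it is idle (utilization 0.3). Leaving a slot idle to wait for locality is a deliberate trade-off in this sketch: it costs some utilization but avoids adding traffic to a congested bisection.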
