
Enhancing Reliability of Workflow Execution Using Task Replication and Spot Instances

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS (2016)

Journal

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS

Volume 10, Issue 4, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/2815624

Keywords

Algorithms; Reliability; Performance; Experimentation; Fault tolerance; workflows; cloud; scheduling; spot instances; task duplication; task retry

Funding

ARC (Australian Research Council)


Abstract

Cloud environments offer low-cost computing resources as a subscription-based service. These resources are elastically scalable and dynamically provisioned. Cloud providers have also pioneered new pricing models, such as spot instances, that are cost-effective. As a result, scientific workflows are increasingly adopting cloud computing. However, spot instances are terminated when the market price exceeds the user's bid price. Moreover, the cloud is not a utopian environment: failures are inevitable in such large, complex distributed systems, and it is well documented that the performance delivered by cloud resources fluctuates. These challenges make fault tolerance an important criterion in workflow scheduling. This article presents an adaptive, just-in-time scheduling algorithm for scientific workflows. The algorithm judiciously uses both spot and on-demand instances to reduce cost and provide fault tolerance, and it consolidates resources to further minimize execution time and cost. Extensive simulations show that the proposed heuristics are fault tolerant and effective, especially under short deadlines, providing robust schedules with minimal makespan and cost.
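The termination rule stated in the abstract (a spot instance is lost once the market price rises above the bid) can be illustrated with a small simulation. This is a minimal sketch, not the authors' scheduling algorithm: the function names `termination_time` and `run_with_fallback`, the discrete price trace, per-step billing at the market price, and the naive fallback of rerunning the whole task on an on-demand instance are all hypothetical assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple


def termination_time(prices: Sequence[float], bid: float) -> Optional[int]:
    """Return the first time step at which the spot price exceeds the bid
    (an out-of-bid event), or None if the instance survives the trace."""
    for t, price in enumerate(prices):
        if price > bid:  # a price equal to the bid does not terminate
            return t
    return None


def run_with_fallback(task_steps: int, prices: Sequence[float], bid: float,
                      on_demand_price: float) -> Tuple[str, float]:
    """Hypothetical policy: run the task on a spot instance; if it is
    terminated before finishing, rerun the whole task on demand.

    Assumes the spot instance is billed the market price for each step it
    actually runs, and the on-demand instance a fixed price per step.
    """
    window = prices[:task_steps]
    if len(window) < task_steps:
        raise ValueError("price trace shorter than the task")
    t = termination_time(window, bid)
    if t is None:
        # The spot run completed: pay only the spot prices.
        return "spot", sum(window)
    # Pay for the spot steps consumed, then the full on-demand rerun.
    wasted = sum(window[:t])
    return "on-demand", wasted + task_steps * on_demand_price


if __name__ == "__main__":
    trace = [0.10, 0.12, 0.30, 0.11]
    print(run_with_fallback(3, trace, bid=0.20, on_demand_price=0.25))
```

With this trace, a three-step task is terminated at step 2 (0.30 exceeds the 0.20 bid), so the sketch pays about 0.22 for the lost spot work plus 0.75 for the on-demand rerun. Replicating tasks or checkpointing, as the keywords suggest, would aim to reduce exactly this wasted spend; the sketch deliberately omits both.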