Article

UAV-enabled computation migration for complex missions: A reinforcement learning approach

Journal

IET COMMUNICATIONS
Volume 14, Issue 15, Pages 2472-2480

Publisher

INST ENGINEERING TECHNOLOGY-IET
DOI: 10.1049/iet-com.2019.1188

Keywords

decision making; learning (artificial intelligence); autonomous aerial vehicles; remotely operated vehicles; Markov processes; UAV-enabled computation migration; complex missions; reinforcement learning approach; computation offloading; remote areas; traditional edge infrastructures; unmanned aerial vehicle-enabled edge; near-users edge computing service; computation migration problem; typical task-flows; proper UAV; UAV-ground communication data rate; UAV location; missions response time; computation migration decision making problem; advantage actor-critic reinforcement; average response time

Funding

  1. National Natural Science Foundation of China [61671295]
  2. Shanghai Key Laboratory of Digital Media Processing
  3. National Fundamental Research Key Project [JCKY2017203B082]
  4. Key Project in the Science and Technology on Communication Network Laboratory [KX172600030]

Abstract

The implementation of computation offloading is a challenging issue in remote areas where traditional edge infrastructures are sparsely deployed. In this study, the authors propose an unmanned aerial vehicle (UAV)-enabled edge computing framework, in which a group of UAVs fly around to provide near-user edge computing services. They study the computation migration problem for complex missions, which can be decomposed into typical task-flows that capture the inter-dependency of tasks. Each time a task appears, it should be allocated to a proper UAV for execution, which is defined as computation migration or task migration. Since the UAV-ground communication data rate is strongly associated with the UAV's location, selecting a proper UAV to execute each task largely determines the missions' response time. They formulate the computation migration decision-making problem as a Markov decision process, in which the state contains observations extracted from the environment. To cope with the dynamics of the environment, they propose an advantage actor-critic reinforcement learning approach to learn a near-optimal policy on the fly. Simulation results show that the proposed approach has a desirable convergence property and can significantly reduce the average response time of missions compared with a benchmark greedy method.
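The task-to-UAV assignment described above can be illustrated with a minimal advantage actor-critic sketch. This is not the paper's implementation: the state features, reward model, network form (linear actor and critic), and all dimensions below are illustrative assumptions only.

```python
# Minimal A2C sketch: assign each arriving task to one of several UAVs.
# NOT the paper's implementation; all features and rewards are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_UAVS = 3       # assumed number of candidate UAVs
STATE_DIM = 4    # assumed size of the extracted observation vector

# Linear actor (softmax policy over UAVs) and linear critic (state value).
W_actor = np.zeros((STATE_DIM, N_UAVS))
w_critic = np.zeros(STATE_DIM)
ALPHA_A, ALPHA_C, GAMMA = 0.05, 0.05, 0.9

def policy(s):
    """Softmax over per-UAV logits."""
    z = s @ W_actor
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def env_step(s, a):
    """Toy environment: reward picking the 'closest' UAV, i.e. the one
    with the smallest distance-like feature in the observation."""
    best = int(np.argmin(s[:N_UAVS]))
    r = 1.0 if a == best else -0.1
    return r, rng.random(STATE_DIM)   # reward, next observation

s = rng.random(STATE_DIM)
for _ in range(2000):
    p = policy(s)
    a = int(rng.choice(N_UAVS, p=p))
    r, s_next = env_step(s, a)
    # Advantage estimated by the TD error: r + gamma * V(s') - V(s).
    td = r + GAMMA * (s_next @ w_critic) - (s @ w_critic)
    w_critic += ALPHA_C * td * s                       # critic update
    grad_logp = -p                                     # d log pi(a|s) / d logits
    grad_logp[a] += 1.0
    W_actor += ALPHA_A * td * np.outer(s, grad_logp)   # actor update
    s = s_next

# Query the learned policy for a state where UAV 0 is clearly closest.
test_s = np.array([0.1, 0.8, 0.9, 0.5])
print("preferred UAV:", int(np.argmax(policy(test_s))))
```

The TD error serves as a one-sample estimate of the advantage, so the critic acts as a baseline that reduces the variance of the policy-gradient update, which is the core idea of the advantage actor-critic method named in the abstract.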
