Article

On-Board Deep Q-Network for UAV-Assisted Online Power Transfer and Data Collection

Journal

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY
Volume 68, Issue 12, Pages 12215-12226

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TVT.2019.2945037

Keywords

Unmanned aerial vehicle; microwave power transfer; online resource allocation; deep reinforcement learning; Markov decision process

Funding

  1. National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology), within the CISTER Research Unit [CEC/04234]
  2. Operational Competitiveness Programme and Internationalization (COMPETE 2020) through the European Regional Development Fund (ERDF)
  3. FCT [POCI-01-0145-FEDER-029074]

Abstract

Unmanned Aerial Vehicles (UAVs) with Microwave Power Transfer (MPT) capability provide a practical means to deploy large numbers of wirelessly powered sensing devices in areas without persistent power supplies. The UAV can charge the sensing devices remotely and harvest their data. A key challenge is performing MPT and data collection online, jointly with on-board control of the UAV (e.g., its patrolling velocity), so as to prevent battery drainage and data-queue overflow at the devices, while up-to-date knowledge of the devices' battery levels and data queues is unavailable at the UAV. In this paper, an on-board deep Q-network is developed to minimize the overall data packet loss of the sensing devices by optimally deciding which device to charge and interrogate for data collection, and the instantaneous patrolling velocity of the UAV. Specifically, we formulate a Markov Decision Process (MDP) whose states comprise the battery levels and data-queue lengths of the devices, the channel conditions, and the waypoints along the UAV's trajectory, and solve it optimally with Q-learning. Furthermore, we propose the on-board deep Q-network, which enlarges the state space of the MDP, and a deep-reinforcement-learning-based scheduling algorithm that asymptotically derives the optimal solution online, even when the UAV has only outdated knowledge of the MDP states. Numerical results demonstrate that our deep reinforcement learning algorithm reduces packet loss by at least 69.2% compared to existing non-learning greedy algorithms.
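To make the formulation concrete, the sketch below shows one plausible shape of such an on-board deep Q-network in PyTorch. It is a minimal illustration, not the authors' implementation: the state vector (per-device battery level, queue length, and channel gain, plus the current waypoint index), the joint (device, velocity) action encoding, the network sizes, the reward, and all constants are assumptions introduced here for clarity.

```python
# Illustrative sketch of a deep Q-network for joint device-scheduling and
# velocity control, as described in the abstract. All dimensions, constants,
# the reward, and the state/action encodings are hypothetical placeholders.
import random
from collections import deque

import torch
import torch.nn as nn

N_DEVICES = 5                         # number of ground sensing devices (assumed)
N_VELOCITIES = 3                      # discrete patrolling-velocity levels (assumed)
STATE_DIM = 3 * N_DEVICES + 1         # battery, queue, channel per device + waypoint
N_ACTIONS = N_DEVICES * N_VELOCITIES  # joint (device, velocity) choice


class QNet(nn.Module):
    """Small MLP approximating Q(s, a) for every joint action."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())   # frozen target for stable updates
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                    # experience-replay buffer
GAMMA, EPSILON = 0.95, 0.1                       # discount and exploration rates (assumed)


def select_action(state: torch.Tensor) -> int:
    """Epsilon-greedy choice over joint (device, velocity) actions."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())


def train_step(batch_size: int = 32) -> None:
    """One DQN update: y = r + gamma * max_a' Q_target(s', a')."""
    if len(replay) < batch_size:
        return
    s, a, r, s2 = zip(*random.sample(replay, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a, dtype=torch.long)
    r = torch.tensor(r, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Toy usage: push random transitions and run one update.
if __name__ == "__main__":
    for _ in range(64):
        s = torch.rand(STATE_DIM)
        a = select_action(s)
        r = -random.random()          # negative packet loss as reward (assumed)
        s2 = torch.rand(STATE_DIM)
        replay.append((s, a, r, s2))
    train_step()
```

With this joint encoding, a single argmax decides both quantities at once: the action index decodes as `device, velocity = divmod(action, N_VELOCITIES)`, matching the two coupled decisions the abstract describes (which device to charge/interrogate, and the instantaneous patrolling velocity).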
