Article

A Reliable Reinforcement Learning for Resource Allocation in Uplink NOMA-URLLC Networks

Journal

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS
Volume 21, Issue 8, Pages 5989-6002

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TWC.2022.3144618

Keywords

Deep SARSA-lambda learning; non-orthogonal multiple access; power allocation; ultra-reliable low-latency communication; user clustering

Funding

  1. U.K. Engineering and Physical Sciences Research Council (EPSRC) [EP/R006466/1]


This paper proposes a deep reinforcement learning approach for optimizing uplink resource allocation in NOMA-URLLC networks. It addresses three challenges (user clustering, an instantaneous feedback system, and optimal resource allocation) and reports lower decoding error probabilities than traditional OMA systems.
In this paper, we propose a deep state-action-reward-state-action (SARSA) lambda learning approach for optimising the uplink resource allocation in non-orthogonal multiple access (NOMA) aided ultra-reliable low-latency communication (URLLC). To reduce the mean decoding error probability in time-varying network environments, this work designs a reliable learning algorithm for providing a long-term resource allocation, where the reward feedback is based on the instantaneous network performance. With the aid of the proposed algorithm, this paper addresses three main challenges of reliable resource sharing in NOMA-URLLC networks: 1) user clustering; 2) the instantaneous feedback system; and 3) optimal resource allocation. All of these designs interact with the considered communication environment. Lastly, we compare the performance of the proposed algorithm with conventional Q-learning and SARSA Q-learning algorithms. The simulation outcomes show that: 1) compared with the traditional Q-learning algorithms, the proposed solution is able to converge within 200 episodes while providing a long-term mean error as low as 10^-2; 2) NOMA-assisted URLLC outperforms traditional OMA systems in terms of decoding error probabilities; and 3) the proposed feedback system is efficient for the long-term learning process.
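
For readers unfamiliar with SARSA(lambda), the following is a minimal tabular sketch of the on-policy update with accumulating eligibility traces. It is not the paper's deep network or its NOMA-URLLC environment: the three-level state and action spaces, the `toy_reward` penalty standing in for instantaneous decoding-error feedback, and all hyperparameter values are assumptions chosen only for illustration.

```python
import numpy as np


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one SARSA(lambda) update in place and return the TD error."""
    # On-policy target: uses the action actually chosen in the next state.
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0               # accumulating trace for the visited pair
    Q += alpha * td_error * E    # credit every recently visited pair
    E *= gamma * lam             # decay all traces
    return td_error


def epsilon_greedy(Q, s, eps, rng):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))


def toy_reward(state, action):
    """Hypothetical feedback: penalise a mismatch between the channel
    level (state) and the chosen power level (action)."""
    return -abs(state - action) / 2.0


def train(episodes=200, steps=20, n_states=3, n_actions=3, eps=0.1, seed=0):
    """Run SARSA(lambda) on the toy problem and return the learned Q-table."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)     # traces are reset at the start of each episode
        s = int(rng.integers(n_states))
        a = epsilon_greedy(Q, s, eps, rng)
        for _ in range(steps):
            r = toy_reward(s, a)
            s_next = int(rng.integers(n_states))  # toy time-varying channel
            a_next = epsilon_greedy(Q, s_next, eps, rng)
            sarsa_lambda_step(Q, E, s, a, r, s_next, a_next)
            s, a = s_next, a_next
    return Q
```

Calling `train()` returns a 3-by-3 table of action values for the toy problem. The trace matrix `E` is what distinguishes SARSA(lambda) from one-step SARSA: each TD error updates all recently visited state-action pairs, weighted by how recently they were visited.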
