Article

Optimal passenger-seeking policies on E-hailing platforms using Markov decision process and imitation learning

TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES (2020)

Journal

TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES

Volume 111, Issue -, Pages 91-113

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.trc.2019.12.005

Keywords

Markov Decision Process (MDP); Imitation learning; E-hailing

Categories

Transportation Science & Technology

Funding

[Michigan/DiDi 17-PAF07456]

Abstract

Vacant taxi drivers' passenger-seeking process generates additional vehicle miles traveled, adding congestion to the road network and pollution to the environment. This paper employs a Markov Decision Process (MDP) to model idle e-hailing drivers' optimal sequential decisions in passenger-seeking. Drivers for transportation network companies (TNCs), or e-hailing platforms (e.g., Didi, Uber), behave differently from traditional taxi drivers because they do not need to actually search for passengers. Instead, they reposition themselves so that the matching platform can match them with a passenger. Accordingly, we incorporate these new features of e-hailing drivers into our MDP model. The reward function used in the MDP model is recovered with an inverse reinforcement learning technique. We then use 44,160 Didi drivers' 3-day trajectories to train the model. To validate the effectiveness of the model, a Monte Carlo simulation is conducted to simulate the performance of drivers under the guidance of the optimal policy, which is then compared with the performance of drivers following one baseline heuristic, namely, the local hotspot strategy. The results show that our model achieves a 17.5% improvement over the local hotspot strategy in terms of the rate of return. The proposed MDP model captures the supply-demand ratio, given that the number of drivers in this study is sufficiently large and the number of unmatched orders is therefore assumed to be negligible. To better incorporate the competition among multiple drivers into the model, we have also devised and calibrated a dynamic adjustment strategy for the order matching probability.
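
To make the idea of an MDP for repositioning concrete, the following is a minimal toy sketch, not the paper's model. It assumes four zones on a line, a made-up matching probability and expected fare per zone, fixed costs for staying or moving, and an episode that ends as soon as the driver is matched. The function name `seek_value_iteration` and all numbers are hypothetical. Value iteration then yields, for each zone, the neighboring zone an idle driver should head to next.

```python
def seek_value_iteration(neighbors, p_match, fare, move_cost, stay_cost,
                         gamma=0.95, tol=1e-10, max_iter=10_000):
    """Toy value iteration for an idle driver choosing the next zone.

    Assumption (not from the paper): once matched, the driver collects
    the zone's expected fare and the episode ends.
    """
    n = len(neighbors)
    values = [0.0] * n
    policy = list(range(n))
    for _ in range(max_iter):
        new_values = [0.0] * n
        for s in range(n):
            best_q, best_next = float("-inf"), s
            for nxt in neighbors[s]:
                cost = stay_cost if nxt == s else move_cost
                # Matched with prob. p_match[nxt]; otherwise keep seeking.
                q = (-cost
                     + p_match[nxt] * fare[nxt]
                     + (1.0 - p_match[nxt]) * gamma * values[nxt])
                if q > best_q:
                    best_q, best_next = q, nxt
            new_values[s] = best_q
            policy[s] = best_next
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values, policy


# Hypothetical 4-zone line network: each zone can stay or move to a neighbor.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
p_match = [0.1, 0.2, 0.6, 0.3]    # made-up matching probabilities
fare = [10.0, 10.0, 12.0, 8.0]    # made-up expected fares
values, policy = seek_value_iteration(
    neighbors, p_match, fare, move_cost=1.0, stay_cost=0.5
)
print(policy)
```

In this toy setting the resulting policy steers the driver toward zone 2, where the matching probability and fare are highest. In the paper, by contrast, the reward function is learned from trajectory data rather than specified by hand.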

Where Reinforcement Learning Meets Process Control: Review and Guidelines

Ruan de Rezende Faria, Bruno Didier Olivier Capron, Argimiro Resende Secchi, Mauricio B. de Souza Jr

Summary: This paper provides a literature review on the application of reinforcement learning in process control and optimization. It introduces new perspectives on simulation-based training, transfer learning, and online process control, and presents a framework for hyperparameter optimization to achieve feasible algorithms and deep neural networks. The study also demonstrates an experiment in batch process control using the deep-deterministic-policy-gradient algorithm modified with adversarial imitation learning.

PROCESSES (2022)