Journal
OPERATIONS RESEARCH
Volume 67, Issue 1, Pages 72-89Publisher
INFORMS
DOI: 10.1287/opre.2018.1773
Keywords
bilevel programming; attacker-defender; interdiction; learning; incomplete information; online optimization; robust optimization
Funding
- Air Force Office of Scientific Research [FA2386-12-1-3032]
- Defense Thread Defense Agency [HDTRA1-14-1-0065]
- National Science Foundation [CMMI-1634835]
- Complex Engineering Systems Institute, ISCI [ICM-FIC: P05-004-F, CONICYT: FB0816]
Ask authors/readers for more resources
We present a framework for a class of sequential decision-making problems in the context of general interdiction problems, in which a leader and a follower repeatedly interact. At each period, the leader allocates resources to disrupt the performance of the follower (e.g., as in defender-attacker or network interdiction problems), who, in turn, minimizes some cost function over a set of activities that depends on the leader's decision. Although the follower has complete knowledge of the follower's problem, the leader has only partial information and needs to learn about the cost parameters, available resources, and the follower's activities from the feedback generated by the follower's actions. We measure policy performance in terms of its time-stability, defined as the number of periods it takes for the leader to match the actions of an oracle with complete information. In particular, we propose a class of greedy and robust policies and show that these policies are weakly optimal, eventually match the oracle's actions, and provide a real-time certificate of optimality. We also study a lower bound on any policy performance based on the notion of a semioracle. Our numerical experiments demonstrate that the proposed policies consistently outperform a reasonable benchmark and perform fairly close to the semioracle.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available