Journal
STRUCTURAL AND MULTIDISCIPLINARY OPTIMIZATION
Volume 59, Issue 5, Pages 1521-1542
Publisher
SPRINGER
DOI: 10.1007/s00158-018-2145-6
Keywords
Reinforcement learning; Tradespace; Decision making under uncertainty; Sequential decision process; Design; Multi-fidelity
Funding
- National Science Foundation (NSF) [CMMI-1455444]
- College of Engineering at Pennsylvania State University
Abstract
In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are used in sequence to systematically contract sets of design alternatives. The key idea behind SDP is to sequence models of increasing fidelity so that they provide successively tighter bounds on the decision criteria, thereby removing inefficient designs from the tradespace with the guarantee that the antecedent model removes only design solutions that are dominated when analyzed using the more detailed, high-fidelity model. In general, efficiency in the SDP is achieved by using less expensive (low-fidelity) models early in the design process and high-fidelity models later on. However, the set of multi-fidelity models and the discrete decision states together give rise to a combinatorial number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the onset of the SDP because the computational costs of executing all models on all designs and the discriminatory power of the resulting bounds are unknown. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology that learns efficient sequencing of models from sample estimates of the computational cost and discriminatory power of different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two different design examples, RL-D is shown to (1) effectively identify the approximately optimal modeling policy and (2) efficiently converge upon a choice set.
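To make the idea of learning a model sequence concrete, the following is a minimal sketch of tabular Q-learning on a toy model selection problem. It is not the paper's RL-D implementation. The design values, the three models (their costs and bound half-widths), the state encoding (the tightest model run so far), and every function name are hypothetical choices made for illustration only.

```python
import random

# Hypothetical toy tradespace: true objective values, lower is better.
TRUE_VALUES = [1.0, 1.2, 1.5, 2.0, 2.6, 3.3, 4.1, 5.0, 6.0, 7.5]

# Hypothetical models ordered by increasing fidelity:
# (cost per design evaluated, half-width of the bound on the objective).
MODELS = [(1.0, 1.5), (5.0, 0.5), (25.0, 0.05)]
START, TERMINAL = -1, len(MODELS) - 1  # state = tightest model run so far


def remaining(state):
    """Designs not yet dominated, given the tightest bounds seen so far."""
    if state == START:
        return list(TRUE_VALUES)
    half_width = MODELS[state][1]
    best_upper = min(TRUE_VALUES) + half_width
    # A design survives if its lower bound does not exceed the best upper bound.
    return [v for v in TRUE_VALUES if v - half_width <= best_upper]


def step(state, action):
    """Run model `action` on all remaining designs; reward is negative cost."""
    cost = MODELS[action][0] * len(remaining(state))
    return max(state, action), -cost


def q_learning(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over (state, model) pairs."""
    rng = random.Random(seed)
    actions = range(len(MODELS))
    q = {(s, a): 0.0 for s in range(START, TERMINAL) for a in actions}
    for _ in range(episodes):
        state = START
        for _ in range(20):  # cap steps so wasteful loops cannot run forever
            if rng.random() < epsilon:
                action = rng.randrange(len(MODELS))
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            future = 0.0 if nxt == TERMINAL else max(q[(nxt, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
            if state == TERMINAL:
                break
    return q


def greedy_sequence(q):
    """Follow the learned policy and return (model sequence, total cost)."""
    state, sequence, total = START, [], 0.0
    while state != TERMINAL and len(sequence) < 10:
        action = max(range(len(MODELS)), key=lambda a: q[(state, a)])
        state, reward = step(state, action)
        sequence.append(action)
        total -= reward
    return sequence, total


q_table = q_learning()
sequence, total_cost = greedy_sequence(q_table)
print(sequence, total_cost)
```

In this toy setup the trade-off can be checked by hand: running the low-, medium-, and high-fidelity models in order costs 10 + 30 + 100 = 140, while applying the high-fidelity model to all ten designs directly costs 250. Unlike the setting described in the abstract, the costs and bounds here are deterministic and known to the simulator; the agent still only observes them through sampled transitions, which is the part the sketch is meant to illustrate.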