
A Multi-Objective Reinforcement Learning Based Hyper-Heuristic

Massimo Paolucci; Roberto Ronco
2019-01-01

Abstract

The efficacy of Hyper-Heuristics in tackling NP-hard Combinatorial Optimization problems has been widely shown in the extensive literature on the topic [1] [2]. Moreover, recent successful results in Deep Reinforcement Learning research (see [3] for a thorough overview) have led to the idea of applying such methodologies in an online optimization setting. In this work, an optimization problem arising in a Cloud Computing setting is presented and discussed. Then, a selection Hyper-Heuristic that uses different conflicting policies to choose among low-level heuristics is detailed. Heuristics are selected according to one of the conflicting policies, following a distribution whose parameters are defined by a Multi-Objective Simulated Annealing [4] procedure; by varying these parameters, the procedure explores the Pareto front and obtains a well-sampled approximation of it. In other words, the hyper-heuristic learns to optimize while optimizing the learning. To test the effectiveness of the method, an experimental campaign on a case of practical interest is presented and discussed.

References:
[1] Ke-Lin Du, M.N.S. Swamy, Search and Optimization by Metaheuristics: Techniques and Algorithms Inspired by Nature, Birkhäuser, 2016.
[2] Michel Gendreau, Jean-Yves Potvin (Eds.), Handbook of Metaheuristics, Springer, 2010.
[3] Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, Second edition, MIT Press, Cambridge, MA, 2018.
[4] Mahmoud H. Alrefaei, Ali H. Diabat, A simulated annealing technique for multi-objective simulation optimization, Journal of Cleaner Production, Volume 215, Issue 8, 15 December 2009, Pages 3029-3035.
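The selection mechanism described above can be illustrated with a minimal sketch. The toy bi-objective problem, the two low-level heuristics, and all parameter values below are hypothetical stand-ins (the paper's actual Cloud Computing problem and policies are not specified here): a softmax distribution over low-level heuristics is perturbed by a simulated-annealing walk in parameter space, and non-dominated solutions are archived to approximate the Pareto front.

```python
import math
import random

random.seed(0)

# Hypothetical toy problem (NOT the paper's cloud problem): assign jobs to
# 2 machines; minimise f1 = max machine load, f2 = jobs on machine 0.
JOBS = [3, 7, 2, 9, 4, 6, 5]

def objectives(assign):
    loads = [0, 0]
    for job, machine in zip(JOBS, assign):
        loads[machine] += job
    return (max(loads), assign.count(0))

def dominates(a, b):
    """True if objective vector a Pareto-dominates b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# Two illustrative low-level heuristics.
def move_random_job(assign):
    s = assign[:]
    i = random.randrange(len(s))
    s[i] = 1 - s[i]
    return s

def move_heaviest_job(assign):
    s = assign[:]
    i = max(range(len(s)), key=lambda j: JOBS[j])
    s[i] = 1 - s[i]
    return s

HEURISTICS = [move_random_job, move_heaviest_job]

def softmax(w):
    m = max(w)
    e = [math.exp(x - m) for x in w]
    total = sum(e)
    return [x / total for x in e]

def mosa_hyper_heuristic(iters=500, t0=10.0, cooling=0.99):
    sol = [random.randrange(2) for _ in JOBS]
    weights = [0.0] * len(HEURISTICS)  # distribution parameters tuned by SA
    archive = {tuple(sol): objectives(sol)}  # non-dominated set
    temp = t0
    for _ in range(iters):
        # SA move in parameter space: perturb the selection distribution...
        cand_w = [w + random.gauss(0, 0.5) for w in weights]
        # ...then select a low-level heuristic according to it.
        heuristic = random.choices(HEURISTICS, softmax(cand_w))[0]
        cand = heuristic(sol)
        f_old, f_new = objectives(sol), objectives(cand)
        # Accept dominating moves, or worse ones with SA probability
        # on a simple scalarised delta (one of many possible rules).
        delta = sum(f_new) - sum(f_old)
        if dominates(f_new, f_old) or random.random() < math.exp(-max(delta, 0) / temp):
            sol, weights = cand, cand_w
            if not any(dominates(v, f_new) for v in archive.values()):
                archive = {k: v for k, v in archive.items()
                           if not dominates(f_new, v)}
                archive[tuple(cand)] = f_new
        temp *= cooling
    return list(archive.values())

front = mosa_hyper_heuristic()
```

Each accepted move simultaneously improves the solution and reinforces the distribution parameters that produced it, which is the sense in which such a method "learns to optimize while optimizing the learning".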


Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1066348