Adapting Autonomous Agents for Automotive Driving Games

Bellotti F.; Berta R.; Capello A.; Cossu M.; De Gloria A.; Lazzaroni L.
2021-01-01

Abstract

This article investigates the feasibility of implementing a reinforcement learning agent able to plan the trajectory of a simple 2D automated vehicle model in a motorway simulation. The goal is to use it to implement non-player vehicles in serious games for driving. The agent extends a Deep Q-Learning agent developed by Eduard Leurent in Stable Baselines by adding rewards that encourage better compliance with traffic laws. The motorway environment was enhanced as well, in order to increase realism. A multilayer perceptron model, processing kinematic inputs from the ego and the other vehicles, was tested in different traffic conditions and outperformed the original model and other policies, such as a heuristic one and a minimal-reward one. Our experience stresses the importance of defining episode metrics to assess agent behavior, taking into account factors related to safety (e.g., keeping a safe time to collision) and consumption (e.g., limiting accelerations and decelerations). This is key to defining rewards and penalties able to properly train the model to comply with traffic laws while maintaining high-speed performance.
Year: 2021
ISBN: 978-3-030-92181-1; 978-3-030-92182-8
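
The setup described in the abstract (a Stable Baselines Deep Q-Learning agent with an MLP policy, trained on Eduard Leurent's motorway environment with kinematic observations and extra reward terms for traffic-law compliance) can be approximated with the publicly available highway-env and stable-baselines3 packages. The sketch below is an assumption-laden illustration, not the authors' code: the package versions, config keys, hyperparameters, and numeric reward values are illustrative, and the article's additional penalties (e.g., for short time to collision or frequent accelerations and decelerations) would still have to be layered on top of the baseline rewards shown here.

    import gymnasium as gym
    import highway_env  # noqa: F401  # importing registers "highway-v0" and related environments
    from stable_baselines3 import DQN

    env = gym.make("highway-v0")
    env.unwrapped.configure({
        "observation": {"type": "Kinematics"},     # positions/velocities of ego and nearby vehicles
        "action": {"type": "DiscreteMetaAction"},  # discrete lane-change and speed-change actions
        "lanes_count": 4,
        "vehicles_count": 50,                      # traffic density, varied across test conditions
        "duration": 40,                            # episode length [s]
        # Baseline reward terms exposed by highway-env (illustrative values only);
        # the article adds further rewards/penalties to better meet traffic laws.
        "collision_reward": -1.0,
        "high_speed_reward": 0.4,
        "lane_change_reward": -0.05,
    })
    env.reset()

    # Deep Q-Learning with an MLP policy over the flattened kinematic features.
    model = DQN(
        "MlpPolicy",
        env,
        learning_rate=5e-4,
        buffer_size=15_000,
        batch_size=32,
        gamma=0.8,
        train_freq=1,
        target_update_interval=50,
        verbose=1,
    )
    model.learn(total_timesteps=20_000)
    model.save("dqn_highway")

An evaluation loop running greedy actions (model.predict(obs, deterministic=True)) over full episodes could then be used to compute the kind of episode metrics the abstract emphasizes, such as average speed, time to collision, and acceleration profiles.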

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1094517
Citations
  • Scopus: 6