Adapting Autonomous Agents for Automotive Driving Games
Bellotti F.;Berta R.;Capello A.;Cossu M.;De Gloria A.;Lazzaroni L.;
2021-01-01
Abstract
This article investigates the feasibility of implementing a reinforcement learning agent able to plan the trajectory of a simple 2D automated vehicle model in a motorway simulation. The goal is to use it to implement a non-player vehicle in serious games for driving. The agent extends a Deep Q Learning agent developed by Eduard Leurent in Stable Baselines by adding rewards that encourage compliance with traffic laws. The motorway environment was enhanced as well, in order to increase realism. A multilayer perceptron model, processing kinematic inputs from the ego and other vehicles, was tested in different traffic conditions and outperformed the original model and other policies, such as a heuristic one and a minimal-reward one. Our experience stresses the importance of defining episode metrics to assess agent behavior, taking into account factors related to safety (e.g., keeping a safe time to collision) and consumption (e.g., limiting accelerations and decelerations). This is key to defining rewards and penalties able to properly train the model to comply with traffic laws while maintaining high-speed performance.
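The abstract describes extending a Deep Q Learning reward with traffic-law terms and assessing episodes via safety (time to collision) and consumption (acceleration) metrics. The sketch below is a hypothetical illustration of that kind of reward shaping, not the paper's actual formulation: all function names, coefficients, and thresholds (`ttc_safe_s`, `w_speed`, etc.) are assumptions for illustration only.

```python
import numpy as np

def time_to_collision(gap_m, closing_speed_mps):
    """Time to collision with the leading vehicle; infinite if the gap is opening."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return gap_m / closing_speed_mps

def shaped_reward(speed_mps, collided, gap_m, closing_speed_mps, accel_mps2,
                  v_min=20.0, v_max=30.0, ttc_safe_s=2.0,
                  w_speed=0.4, w_ttc=0.3, w_accel=0.1):
    """Illustrative shaped reward: speed term plus safety and comfort penalties."""
    # Collisions dominate everything else.
    if collided:
        return -1.0
    # Base term: reward driving near the top of the assumed legal speed range.
    speed_term = np.clip((speed_mps - v_min) / (v_max - v_min), 0.0, 1.0)
    reward = w_speed * float(speed_term)
    # Safety: penalise an unsafe time to collision with the leader.
    ttc = time_to_collision(gap_m, closing_speed_mps)
    if ttc < ttc_safe_s:
        reward -= w_ttc * (1.0 - ttc / ttc_safe_s)
    # Consumption/comfort: penalise strong accelerations and decelerations.
    reward -= w_accel * min(abs(accel_mps2) / 3.0, 1.0)
    return reward
```

The same `time_to_collision` and acceleration quantities could also be logged per episode as the assessment metrics the abstract mentions, so that the shaping terms and the evaluation criteria stay aligned.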
File | Size | Format
---|---|---
Leurent caric - correzione.pdf (post-print, closed access) | 551.59 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.