Adapting Autonomous Agents for Automotive Driving Games
Bellotti F.;Berta R.;Capello A.;Cossu M.;De Gloria A.;Lazzaroni L.;
2021-01-01
Abstract
This article investigates the feasibility of implementing a reinforcement learning agent able to plan the trajectory of a simple 2D automated vehicle model in a motorway simulation. The goal is to use it to implement a non-player vehicle in serious games for driving. The agent extends a Deep Q Learning agent developed by Eduard Leurent in Stable Baselines by adding rewards that encourage compliance with traffic laws. The motorway environment was enhanced as well, in order to increase realism. A multilayer perceptron model, processing kinematic inputs from the ego and other vehicles, was tested in different traffic conditions and outperformed the original model and other policies, such as a heuristic one and a minimal-reward one. Our experience stresses the importance of defining episode metrics to assess agent behavior, taking into account factors related to safety (e.g., keeping a safe time to collision) and consumption (e.g., limiting accelerations and decelerations). This is key to defining rewards and penalties able to properly train the model to comply with traffic laws while maintaining high-speed performance.
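The abstract describes extending a Deep Q Learning reward with traffic-law terms and assessing episodes via safety (time to collision) and consumption (acceleration) metrics. The sketch below is a hypothetical illustration of that kind of reward shaping, not the paper's actual formulation: all function names, coefficients, and thresholds (`ttc_safe_s`, `w_speed`, etc.) are assumptions for illustration only.

```python
import numpy as np

def time_to_collision(gap_m, closing_speed_mps):
    """Time to collision with the leading vehicle; infinite if the gap is opening."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return gap_m / closing_speed_mps

def shaped_reward(speed_mps, collided, gap_m, closing_speed_mps, accel_mps2,
                  v_min=20.0, v_max=30.0, ttc_safe_s=2.0,
                  w_speed=0.4, w_ttc=0.3, w_accel=0.1):
    """Illustrative shaped reward: speed term plus safety and comfort penalties."""
    # Collisions dominate everything else.
    if collided:
        return -1.0
    # Base term: reward driving near the top of the assumed legal speed range.
    speed_term = np.clip((speed_mps - v_min) / (v_max - v_min), 0.0, 1.0)
    reward = w_speed * float(speed_term)
    # Safety: penalise an unsafe time to collision with the leader.
    ttc = time_to_collision(gap_m, closing_speed_mps)
    if ttc < ttc_safe_s:
        reward -= w_ttc * (1.0 - ttc / ttc_safe_s)
    # Consumption/comfort: penalise strong accelerations and decelerations.
    reward -= w_accel * min(abs(accel_mps2) / 3.0, 1.0)
    return reward
```

The same `time_to_collision` and acceleration quantities could also be logged per episode as the assessment metrics the abstract mentions, so that the shaping terms and the evaluation criteria stay aligned.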
File | Size | Format
---|---|---
Leurent caric - correzione.pdf (post-print, closed access) | 551.59 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.