
Imitation learning in multi-sensor self-aware autonomous systems

ALEMAW, ABRHAM SHIFERAW
2025-04-03

Abstract

Fusing different sources of sensory information in an integrated way within an Artificial Intelligence (AI) model is one of the most actively advancing research directions in building self-aware agents. Single data modalities fall short of representing and exploiting real-world scenarios. Information fusion therefore plays a vital role in reaching a holistic understanding of a situation, in addition to enabling sensor fail-safe applications. Training an AI model today typically involves modeling experience from expert-generated data; these datasets are vast, highly non-linear, and high-dimensional, and integrating such complex, dynamic experiences robustly is challenging. Understanding the structures present in higher-dimensional sensory input, and fusing them with lower-dimensional information, has been a primary objective in the development of powerful generative models. Recent advances in sensory information integration for explainable AI combine probabilistic, self-expressive Bayesian models with Deep Learning approaches. Dynamic Bayesian Networks (DBNs) provide a unified probabilistic representation and inference mechanism for time-series domains and can introduce explainability into an AI model through their cause-effect relations. Integrating DBNs with Variational Autoencoders (VAEs) to represent higher-dimensional data makes them powerful and scalable, enabling self-supervised learning of higher-dimensional data distributions. Autonomous driving is an area where research is advancing toward explainable AI models to facilitate the realization of full autonomy: freeing up time, increasing the mobility of people with disabilities, improving environmental health, reducing human driving errors, and lowering economic costs through car sharing and carpooling all push current AI research toward generalized AI models for self-driving cars.

This thesis introduces a novel Multi-Agent Self-Awareness Architecture (MASAA) that integrates a generative World Model (WM), Experience Models (Multi-Modal Perception, MMP), an Active First-Person Model, and a Cost Model (multi-level anomalies) through a Short-Term Memory module based on Multi-Agent Coupled Markov Jump Particle Filters (MAC-MJPFs). In the MASAA framework, a set of modules is introduced that enables an agent to localize itself while simultaneously interacting with neighboring agents. One of the main goals of MASAA is to enhance the generalizability and interpretability of an AI model. To this end, the thesis introduces a novel methodology that integrates multi-sensorial data from an agent's proprioceptive and exteroceptive sensors, coupled in a hierarchical DBN model within an Active Inference framework. A lower-dimensional unsupervised learning stage over the odometry and action modalities, based on Null Force Filtering (NFF) and a modified Growing Neural Gas (GNG) clustering algorithm, produces lower-dimensional knowledge. A self-supervised higher-dimensional video-modality learning stage using VAEs, guided by the learned lower-dimensional vocabularies, creates integrated multi-sensorial inference knowledge. An online, model-based active learning method over continuous and discrete state and action spaces is developed for decision-making within the Active Inference framework. These representation and decision-making models are evaluated on localization and interaction benchmarks.
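
To make the vocabulary-guided VAE stage more concrete, the following is a minimal sketch (in PyTorch, not the thesis implementation): a conditional VAE whose encoder and decoder receive a one-hot "vocabulary" label alongside the high-dimensional input, illustrating one way low-dimensional cluster assignments (e.g. from the GNG stage) could guide self-supervised learning of a video modality. All class names, layer sizes, and the one-hot conditioning scheme are illustrative assumptions, not the architecture described in the thesis.

# Minimal sketch (assumed PyTorch, illustrative sizes): a VAE conditioned on a
# discrete vocabulary label, showing how low-dimensional cluster assignments
# could guide learning of a high-dimensional modality such as video frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, x_dim=64 * 64, vocab_size=16, z_dim=32, h_dim=256):
        super().__init__()
        # Encoder q(z | x, c): flattened frame concatenated with one-hot label.
        self.enc = nn.Sequential(nn.Linear(x_dim + vocab_size, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # Decoder p(x | z, c): reconstructs the frame from latent plus label.
        self.dec = nn.Sequential(
            nn.Linear(z_dim + vocab_size, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim),
        )

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_hat = self.dec(torch.cat([z, c], dim=-1))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    rec = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Usage sketch: frames flattened to vectors; c is the one-hot vocabulary label
# assumed to come from the low-dimensional (odometry/action) clustering stage.
frames = torch.rand(8, 64 * 64)
labels = F.one_hot(torch.randint(0, 16, (8,)), num_classes=16).float()
x_hat, mu, logvar = ConditionalVAE()(frames, labels)
loss = vae_loss(frames, x_hat, mu, logvar)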
File attached to this record:
phdunige_4705923.pdf (open access)
Type: Doctoral thesis
Size: 19.94 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1242616