MarvelHideDroid: Reliable on-the-fly data anonymization based on Android virtualization

Pagano F.; Verderame L.; Russo E.; Merlo A.

2025-01-01

Abstract

Modern mobile applications collect many user-generated events during execution through dedicated libraries called analytic libraries. Collecting these events gives app developers useful information for further improving their apps. The same events are also an essential source of information for analytic library providers (e.g., Google and Meta) to understand users' preferences. However, users are not involved in this process. To counteract this problem, proposals have emerged from both legal (e.g., the General Data Protection Regulation (GDPR)) and research perspectives. On the research side, several efforts have produced solutions for the Android ecosystem that either limit the data gathered before the analytic libraries collect it or give users control over the process. Along this line, HideDroid was the first proposal to let users define a different privacy level for each app installed on the device, leveraging k-anonymity and differential privacy techniques. Subsequently, VirtualHideDroid extended the same approach to virtualized Android environments, in which an application (the plugin) runs within another application (the container). In this scenario, VirtualHideDroid runs as the container app and anonymizes the user event data. However, according to standard threat models for virtualized Android environments, assuming that the container app is fully trusted is too optimistic for real deployments. For this reason, in this paper, we extend the original VirtualHideDroid work by assuming that the container itself may be untrusted, i.e., controlled by an external attacker who has access to the container app and thereby full access to the user data. To solve this problem, we define a new approach, named MarvelHideDroid, which reliably anonymizes event data in the plugin app even when the container is malicious or compromised.
Moreover, unlike VirtualHideDroid, MarvelHideDroid relies on a Large Language Model (LLM) to automatically build the generalizations required by k-anonymity, resulting in an anonymization strategy that is more robust to changes in the data structure of the events captured by the analytic libraries. We empirically demonstrate the viability and reliability of the proposal by testing an implementation of MarvelHideDroid on a set of real Android apps in a virtualized environment.
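k-anonymity through generalization replaces quasi-identifying event fields with progressively coarser values until every combination of those values is shared by at least k events. The snippet below is a minimal sketch of that idea on toy analytics events, not the authors' implementation: the event fields (`age`, `city`), the hand-written hierarchies in `HIERARCHIES`, and the greedy `anonymize` helper are all hypothetical, whereas MarvelHideDroid derives its generalizations with an LLM.

```python
from collections import Counter
from typing import Any, Callable

Event = dict[str, Any]

# Hypothetical generalization hierarchies: level 0 keeps the raw value,
# higher levels are coarser, and the last level suppresses the value.
# MarvelHideDroid builds such generalizations with an LLM; these are
# hand-written purely for illustration.
HIERARCHIES: dict[str, list[Callable[[Any], Any]]] = {
    "age": [
        lambda v: v,
        lambda v: f"{(v // 10) * 10}-{(v // 10) * 10 + 9}",  # decade range
        lambda v: "*",
    ],
    "city": [
        lambda v: v,
        lambda v: "*",
    ],
}


def generalize(events: list[Event], levels: dict[str, int]) -> list[Event]:
    """Return copies of the events with each quasi-identifier generalized."""
    out = []
    for e in events:
        g = dict(e)  # copy so the input events are not mutated
        for attr, lvl in levels.items():
            g[attr] = HIERARCHIES[attr][lvl](e[attr])
        out.append(g)
    return out


def is_k_anonymous(events: list[Event], qids: list[str], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(e[q] for q in qids) for e in events)
    return all(c >= k for c in counts.values())


def anonymize(events: list[Event], k: int) -> list[Event]:
    """Greedily raise generalization levels until the events are k-anonymous."""
    qids = list(HIERARCHIES)
    levels = {q: 0 for q in qids}
    while True:
        candidate = generalize(events, levels)
        if is_k_anonymous(candidate, qids, k):
            return candidate
        # Attributes that can still be coarsened.
        open_qids = [q for q in qids if levels[q] < len(HIERARCHIES[q]) - 1]
        if not open_qids:
            # Every attribute is at its coarsest level; with fewer than k
            # events, k-anonymity cannot be reached.
            return candidate
        # Coarsen the attribute with the most distinct values first.
        q = max(open_qids, key=lambda a: len({e[a] for e in candidate}))
        levels[q] += 1


events = [
    {"event": "screen_view", "age": 23, "city": "Genoa"},
    {"event": "screen_view", "age": 27, "city": "Genoa"},
    {"event": "purchase", "age": 34, "city": "Milan"},
    {"event": "purchase", "age": 38, "city": "Milan"},
]

for e in anonymize(events, k=2):
    print(e)
```

The greedy rule (coarsen the most diverse attribute first) is only one simple heuristic. With k=2 on the toy data it generalizes ages to decade ranges and leaves cities untouched; with k=3 it ends up suppressing both fields.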

Year

2025

Appears in collections:

01.01 - Journal article

Files in this item:

File	Size	Format
1-s2.0-S0045790624008085-main.pdf (open access; type: publisher's version)	1.85 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1224404
