Many solutions for coarse geolocating of users at the time they post a message exist. However, for many important applications, like traffic monitoring and event detection, finer geolocation at the level of city neighborhoods, i.e., at a sub-city level, is needed. Data-driven approaches often do not guarantee good accuracy and efficiency due to the higher number of sub-city level positions to be estimated and the low availability of balanced and large training sets. We claim that external information sources overcome limitations of data-driven approaches in achieving good accuracy for sub-city level geolocation and we present a knowledge-driven approach achieving good results once the reference area of a message is known. Our algorithm, called Sherloc, exploits toponyms in the message, extracts their semantic from a geographic gazetteer, and embeds them into a metric space that captures the semantic distance among them. We identify the semantically closest toponyms to a message and then cluster them with respect to their spatial locations. Sherloc requires no prior training, it can infer the location at sub-city level with high accuracy, and it is not limited to geolocating on a fixed spatial grid.
Sherloc: a knowledge-driven algorithm for geolocating microblog messages at sub-city level
Di Rocco L.;Dassereto F.;Catania B.;Guerrini G.
2021-01-01
Abstract
Many solutions for coarse geolocating of users at the time they post a message exist. However, for many important applications, like traffic monitoring and event detection, finer geolocation at the level of city neighborhoods, i.e., at a sub-city level, is needed. Data-driven approaches often do not guarantee good accuracy and efficiency due to the higher number of sub-city level positions to be estimated and the low availability of balanced and large training sets. We claim that external information sources overcome limitations of data-driven approaches in achieving good accuracy for sub-city level geolocation and we present a knowledge-driven approach achieving good results once the reference area of a message is known. Our algorithm, called Sherloc, exploits toponyms in the message, extracts their semantic from a geographic gazetteer, and embeds them into a metric space that captures the semantic distance among them. We identify the semantically closest toponyms to a message and then cluster them with respect to their spatial locations. Sherloc requires no prior training, it can infer the location at sub-city level with high accuracy, and it is not limited to geolocating on a fixed spatial grid.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.