Sherloc: a knowledge-driven algorithm for geolocating microblog messages at sub-city level

Di Rocco L.; Dassereto F.; Catania B.; Guerrini G.
2021-01-01

Abstract

Many solutions exist for coarse-grained geolocation of users at the time they post a message. However, many important applications, such as traffic monitoring and event detection, need finer geolocation at the level of city neighborhoods, i.e., at the sub-city level. Data-driven approaches often fail to guarantee good accuracy and efficiency at this granularity, because the number of sub-city positions to be estimated is higher and large, balanced training sets are rarely available. We claim that external information sources overcome these limitations of data-driven approaches, and we present a knowledge-driven approach that achieves good results once the reference area of a message is known. Our algorithm, called Sherloc, exploits the toponyms in a message, extracts their semantics from a geographic gazetteer, and embeds them into a metric space that captures the semantic distance among them. We identify the toponyms semantically closest to a message and then cluster them with respect to their spatial locations. Sherloc requires no prior training, can infer the location at the sub-city level with high accuracy, and is not limited to geolocating on a fixed spatial grid.
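
The pipeline outlined in the abstract (find toponyms, compare them in a semantic space, then cluster the closest ones spatially) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy gazetteer, its two-dimensional vectors, the function names, the parameter `k`, and the greedy radius-based clustering are all assumptions made purely for illustration, and plain Euclidean distance on latitude/longitude degrees stands in for a proper geographic distance.

```python
import math

# Hypothetical toy gazetteer: toponym -> (semantic vector, (lat, lon)).
# Both the vectors and the coordinates are invented for this sketch.
GAZETTEER = {
    "harbor": ((1.0, 0.0), (44.410, 8.920)),
    "aquarium": ((0.9, 0.1), (44.411, 8.921)),
    "lighthouse": ((0.8, 0.2), (44.405, 8.905)),
    "stadium": ((0.0, 1.0), (44.416, 8.952)),
}


def cosine_distance(u, v):
    """Semantic distance between two non-zero vectors (1 - cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def geolocate(message, k=3, radius_deg=0.005):
    """Return an estimated (lat, lon), or None if no toponym is found."""
    # Step 1: look up toponyms mentioned in the message (naive word match).
    words = message.lower().split()
    mentioned = [GAZETTEER[w][0] for w in words if w in GAZETTEER]
    if not mentioned:
        return None

    # Step 2: represent the message as the mean of its toponym vectors.
    dim = len(mentioned[0])
    query = tuple(sum(v[i] for v in mentioned) / len(mentioned) for i in range(dim))

    # Step 3: keep the k gazetteer entries semantically closest to the message.
    nearest = sorted(
        GAZETTEER, key=lambda name: cosine_distance(query, GAZETTEER[name][0])
    )[:k]

    # Step 4: greedy spatial clustering (a simple stand-in technique):
    # a point joins the first cluster whose seed lies within radius_deg.
    clusters = []
    for name in nearest:
        point = GAZETTEER[name][1]
        for cluster in clusters:
            if math.dist(point, cluster[0]) <= radius_deg:
                cluster.append(point)
                break
        else:
            clusters.append([point])

    # Step 5: report the centroid of the largest cluster.
    largest = max(clusters, key=len)
    lat = sum(p[0] for p in largest) / len(largest)
    lon = sum(p[1] for p in largest) / len(largest)
    return (lat, lon)
```

Calling `geolocate("Sunset at the harbor")` on this toy data returns the centroid of the two nearby entries `harbor` and `aquarium`. Because the estimate is a centroid of clustered points rather than a cell identifier, the output is not tied to a fixed spatial grid, which mirrors the property claimed in the abstract.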

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1017984