Geolocation of microblog messages has been widely investigated in the literature. Many solutions have been proposed that achieve good results at the city level. Existing approaches are mainly data-driven (i.e., they rely on a training phase). However, developing algorithms for geolocation at sub-city level is still an open problem, partly because good training datasets are lacking. In this thesis, we investigate the role that external geographic knowledge can play in geolocation approaches. We show how different geographical data sources can be combined with a semantic layer to achieve reasonably accurate sub-city level geolocation. Moreover, we propose a knowledge-based method, called Sherloc, that geolocates messages at sub-city level by exploiting the toponyms in a message that may refer to specific places in the target geographical area. Sherloc takes the semantics associated with the toponyms contained in gazetteers and embeds them into a metric space that captures the semantic distance among them. Toponyms can thus be represented as points and indexed by a spatial access method, which allows us to identify the terms that are semantically closest to a microblog message and that also form a cluster with respect to their spatial locations. In contrast to state-of-the-art methods, Sherloc requires no prior training and is not limited to geolocating on a fixed spatial grid; in our experiments it inferred locations at sub-city level with higher accuracy.
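The two-step idea described in the abstract (retrieve the toponyms semantically closest to a message, then keep those that also cluster spatially) can be illustrated with a minimal sketch. It is not Sherloc's implementation: the gazetteer entries, the 2-D embedding vectors, the coordinates, and the function names `nearest_toponyms`, `densest_cluster`, and `estimate_location` are all hypothetical, a linear scan stands in for the spatial access method, and distances between coordinates are measured crudely in degrees.

```python
import math

# Hypothetical toy gazetteer: toponym -> (embedding vector, (lat, lon)).
# Vectors and coordinates are made-up values for illustration only.
GAZETTEER = {
    "cathedral":   ((0.90, 0.10), (45.4641, 9.1919)),
    "arcade":      ((0.80, 0.20), (45.4659, 9.1900)),
    "opera house": ((0.85, 0.15), (45.4674, 9.1895)),
    "stadium":     ((0.10, 0.90), (45.4781, 9.1240)),
    "canal":       ((0.50, 0.50), (45.4520, 9.1770)),
}


def nearest_toponyms(query_vec, k):
    """Return the k toponyms whose embeddings are closest to the query.

    A linear scan is used here; a spatial index would replace it at scale.
    """
    ranked = sorted(
        GAZETTEER,
        key=lambda name: math.dist(query_vec, GAZETTEER[name][0]),
    )
    return ranked[:k]


def densest_cluster(names, radius_deg):
    """Return the largest group of toponyms lying within radius_deg of one of them."""
    best = []
    for centre in names:
        centre_pos = GAZETTEER[centre][1]
        members = [
            n for n in names
            if math.dist(centre_pos, GAZETTEER[n][1]) <= radius_deg
        ]
        if len(members) > len(best):
            best = members
    return best


def estimate_location(query_vec, k=3, radius_deg=0.01):
    """Estimate a location as the centroid of the densest spatial cluster
    among the k semantically nearest toponyms."""
    cluster = densest_cluster(nearest_toponyms(query_vec, k), radius_deg)
    lat = sum(GAZETTEER[n][1][0] for n in cluster) / len(cluster)
    lon = sum(GAZETTEER[n][1][1] for n in cluster) / len(cluster)
    return lat, lon, cluster


# Assumed embedding of a message; in practice it would come from the semantic layer.
lat, lon, cluster = estimate_location((0.88, 0.12))
print(cluster, round(lat, 4), round(lon, 4))
```

Because the estimate is a centroid of gazetteer points rather than a grid cell, the sketch also shows why such an approach is not tied to a fixed spatial grid and needs no training data.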
|Thesis title:||The role of geographic knowledge in sub-city level geolocation algorithms|
|Defence date:||14-Mar-2019|
|Appears in document types:||Doctoral thesis|