Typosquatting consists of registering Internet domain names that closely resemble legitimate, reputable, and well-known ones (e.g., Farebook instead of Facebook). This cyber-attack aims to distribute malware or to phish the victims users (i.e., stealing their credentials) by mimicking the aspect of the legitimate webpage of the targeted organisation. The majority of the detection approaches proposed so far generate possible typo-variants of a legitimate domain, creating thus blacklists which can be used to prevent users from accessing typo-squatted domains. Only few studies have addressed the problem of Typosquatting detection by leveraging a passive Domain Name System (DNS) traffic analysis. In this work, we follow this approach, and additionally exploit machine learning to learn a similarity measure between domain names capable of detecting typo-squatted ones from the analyzed DNS traffic. We validate our approach on a large-scale dataset consisting of 4 months of traffic collected from a major Italian Internet Service Provider.

Deepsquatting: Learning-based typosquatting detection at deeper domain levels

Biggio, Battista;Roli, Fabio
2017-01-01

Abstract

Typosquatting consists of registering Internet domain names that closely resemble legitimate, reputable, and well-known ones (e.g., Farebook instead of Facebook). This cyber-attack aims to distribute malware or to phish the victims users (i.e., stealing their credentials) by mimicking the aspect of the legitimate webpage of the targeted organisation. The majority of the detection approaches proposed so far generate possible typo-variants of a legitimate domain, creating thus blacklists which can be used to prevent users from accessing typo-squatted domains. Only few studies have addressed the problem of Typosquatting detection by leveraging a passive Domain Name System (DNS) traffic analysis. In this work, we follow this approach, and additionally exploit machine learning to learn a similarity measure between domain names capable of detecting typo-squatted ones from the analyzed DNS traffic. We validate our approach on a large-scale dataset consisting of 4 months of traffic collected from a major Italian Internet Service Provider.
2017
9783319701684
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/1091986
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 5
social impact