The World-Wide-Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties that characterize the network, i.e. being scale-free. A directed network is scale-free if its in-degree and out-degree distributions have an approximate and asymptotic power-law behavior. If we consider the Web as a whole, it presents empirical evidence of such property. On the other hand, when we restrict the study of the degree distributions to specific sub-categories of websites, there is no longer strong evidence for it. For this reason, many works questioned the almost universal ubiquity of the scale-free property. Moreover, existing statistical methods to test whether an empirical degree distribution follows a power law suffer when dealing with large sample size and/or noisy data. In this paper, we propose an extension of a state-of-the-art method that overcomes such problems by applying a subsampling procedure on the graphs performing Random Walks (RW). We show on synthetic experiments that even small variations of true power-law distributed data causes the state-of-the-art method to reject the hypothesis, while the proposed method is more sound and stable under such variations. Lastly, we perform a study on 3 websites showing that indeed, depending on the sub-categories of website we consider, some accept and some refuse the hypothesis of being power-law. We argue that our method could be used to further explore sub-categories of websites in order to better characterize their topological properties deriving from different generative principles: central or peripheral.

A Robust Method for Statistical Testing of Empirical Power-Law Distributions

Garbarino, Davide;Tozzo, Veronica;Vian, Andrea;Barla, Annalisa
2020-01-01

Abstract

The World-Wide-Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties that characterize the network, i.e. being scale-free. A directed network is scale-free if its in-degree and out-degree distributions have an approximate and asymptotic power-law behavior. If we consider the Web as a whole, it presents empirical evidence of such property. On the other hand, when we restrict the study of the degree distributions to specific sub-categories of websites, there is no longer strong evidence for it. For this reason, many works questioned the almost universal ubiquity of the scale-free property. Moreover, existing statistical methods to test whether an empirical degree distribution follows a power law suffer when dealing with large sample size and/or noisy data. In this paper, we propose an extension of a state-of-the-art method that overcomes such problems by applying a subsampling procedure on the graphs performing Random Walks (RW). We show on synthetic experiments that even small variations of true power-law distributed data causes the state-of-the-art method to reject the hypothesis, while the proposed method is more sound and stable under such variations. Lastly, we perform a study on 3 websites showing that indeed, depending on the sub-categories of website we consider, some accept and some refuse the hypothesis of being power-law. We argue that our method could be used to further explore sub-categories of websites in order to better characterize their topological properties deriving from different generative principles: central or peripheral.
File in questo prodotto:
File Dimensione Formato  
WAW2020-A Robust Method for Statistical Testing of Empirical Power-Law Distributions.pdf

accesso aperto

Descrizione: Contributo in volume
Tipologia: Documento in Post-print
Dimensione 378.59 kB
Formato Adobe PDF
378.59 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/1015043
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact