The World-Wide-Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties that characterize the network, i.e. being scale-free. A directed network is scale-free if its in-degree and out-degree distributions have an approximate and asymptotic power-law behavior. If we consider the Web as a whole, it presents empirical evidence of such property. On the other hand, when we restrict the study of the degree distributions to specific sub-categories of websites, there is no longer strong evidence for it. For this reason, many works questioned the almost universal ubiquity of the scale-free property. Moreover, existing statistical methods to test whether an empirical degree distribution follows a power law suffer when dealing with large sample size and/or noisy data. In this paper, we propose an extension of a state-of-the-art method that overcomes such problems by applying a subsampling procedure on the graphs performing Random Walks (RW). We show on synthetic experiments that even small variations of true power-law distributed data causes the state-of-the-art method to reject the hypothesis, while the proposed method is more sound and stable under such variations. Lastly, we perform a study on 3 websites showing that indeed, depending on the sub-categories of website we consider, some accept and some refuse the hypothesis of being power-law. We argue that our method could be used to further explore sub-categories of websites in order to better characterize their topological properties deriving from different generative principles: central or peripheral.
A Robust Method for Statistical Testing of Empirical Power-Law Distributions
Garbarino, Davide;Tozzo, Veronica;Vian, Andrea;Barla, Annalisa
2020-01-01
Abstract
The World-Wide-Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties that characterize the network, i.e. being scale-free. A directed network is scale-free if its in-degree and out-degree distributions have an approximate and asymptotic power-law behavior. If we consider the Web as a whole, it presents empirical evidence of such property. On the other hand, when we restrict the study of the degree distributions to specific sub-categories of websites, there is no longer strong evidence for it. For this reason, many works questioned the almost universal ubiquity of the scale-free property. Moreover, existing statistical methods to test whether an empirical degree distribution follows a power law suffer when dealing with large sample size and/or noisy data. In this paper, we propose an extension of a state-of-the-art method that overcomes such problems by applying a subsampling procedure on the graphs performing Random Walks (RW). We show on synthetic experiments that even small variations of true power-law distributed data causes the state-of-the-art method to reject the hypothesis, while the proposed method is more sound and stable under such variations. Lastly, we perform a study on 3 websites showing that indeed, depending on the sub-categories of website we consider, some accept and some refuse the hypothesis of being power-law. We argue that our method could be used to further explore sub-categories of websites in order to better characterize their topological properties deriving from different generative principles: central or peripheral.File | Dimensione | Formato | |
---|---|---|---|
WAW2020-A Robust Method for Statistical Testing of Empirical Power-Law Distributions.pdf
accesso aperto
Descrizione: Contributo in volume
Tipologia:
Documento in Post-print
Dimensione
378.59 kB
Formato
Adobe PDF
|
378.59 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.