Processing Large-scale Internet Topology Data to Model Autonomous System Networks
Loading...
Authors
Bakhshaliyev, Khalid
Issue Date
2020
Type
Dissertation
Language
Keywords
Internet Measurement , Network Science , Parallel Programming , Topology Modeling
Alternative Title
Abstract
The Internet is one of the most complex man-made systems in the world that also has profoundly impacted society. As a dynamically growing system, understanding Internet topology has been an ongoing effort. In order to better understand and model Internet topology, in this dissertation, we focus on the underlying network characteristics of the Autonomous Systems (AS), which provide physical connectivity among devices and their users. Autonomous systems provide regional or global connectivity to end-users or other autonomous systems. They are operated independently with different technical aptitudes. Hence, autonomous systems provide have varying topological characteristics. To map an autonomous system one needs to obtain the IP-level connectivity of routers through the traceroute tool. We utilize available data sets from measurement platforms such as Ant Census, Caida, Ripe as well as collecting data through our deployed platform, Autonomous System Mapper (ASM). Overall, for this dissertation, we mapped topologies of over 10,000 autonomous systems (out of around 70,000). We analyzed their network characteristics including degree distribution, assortativity, clustering, modularity, and communities.Collected Internet topology measurements need to be processed to infer the underlying network connectivity. In particular, unresponsive routers, IP aliases, and underlying subnetworks need to be resolved to map the genuine link-level connectivity from the traceroute measurements. These processes often involve complex analyses of different scenarios to determine the actual topology based on the measurements obtained from vantage points. For instance, finding the optimal topology with a set of unresponsive routers has shown to be NP-complete and the best approach to unresponsive router resolution requires finding certain graph structures such as stars, complete-bipartite, and triangles in the router-level connectivity graph. Processing big graph data in a fast and efficient manner is one of the challenges of network analysis. Analysis of various graph structures in large topologies requires a large number of lookups and processing of such queries can be optimized via indexing. Graph indexing techniques are developed to enhance various graph mining algorithms and Structural Graph Indexing eases sub-structure lookups for unresponsive router resolution. Additionally, the processing time for solving complex analysis is reduced by either data parallelization or functional parallelization. In this dissertation, we extend the structural graph indexing to operate on parallel computing nodes in a High Performance Computing environment. Finally, we introduce a SubNetwork Generator (SubNetG) that replicates the link-level characteristics of Autonomous Systems. SubNetG produces 2-mode bipartite network topologies that reflect the interaction between subnetworks at layer 2 and routers at layer 3 of the network stack. In our analysis of top-ranked backbone autonomous systems, we observed many have power-law distributions at the router interfaces and subnetwork size in addition to commonly reported degree distribution. In particular, generated topologies capture the relation between the layer-2 (i.e., the subnetwork and interface distributions) and the layer-3 (i.e., the degree distribution) that is missing from the current network generators that produce 1-mode graphs.