Cluster Analysis of Internet Users Based on Hourly Traffic Utilization

Maria Rosario de OLIVEIRA  Rui VALADAS  Antonio PACHECO  Paulo SALVADOR  

Publication
IEICE TRANSACTIONS on Communications   Vol.E90-B   No.7   pp.1594-1607
Publication Date: 2007/07/01
Online ISSN: 1745-1345
DOI: 10.1093/ietcom/e90-b.7.1594
Print ISSN: 0916-8516
Type of Manuscript: PAPER
Category: Fundamental Theories for Communications
Keyword: 
access networks, cluster analysis, discriminant analysis, principal component analysis, Internet traffic characterization, traffic measurements

Summary: 
Internet access traffic follows hourly patterns that depend on various factors, such as the periods users stay on-line at the access point (e.g., at home or in the office) or their preferences for applications. Clustering Internet users may provide important information for traffic engineering and billing. For example, it can be used to set up service differentiation according to hourly behavior, to optimize resources based on multi-hour routing, and to define tariffs that promote Internet access in off-peak hours. In this work, we propose a methodology for clustering Internet users according to their hourly traffic utilization patterns. The methodology relies on three multivariate statistical analysis techniques: cluster analysis, principal component analysis and discriminant analysis. The methodology is illustrated through measured data from two distinct ISPs offering distinct traffic contracts, one using a CATV access network and the other an ADSL one. Principal component analysis is used as an exploratory tool. Cluster analysis is used to identify the relevant Internet usage profiles, with partitioning around medoids and Ward's method being the preferred clustering methods. For both data sets, these methods lead to the choice of 3 clusters with different hourly traffic utilization profiles. The cluster structure is validated through discriminant analysis. It is also evaluated against several characteristics of the user traffic not used in the cluster analysis, such as the type of applications, the amount of downloaded traffic, the activity duration and the transfer rate, with coherent outcomes.
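
As a rough illustration of the clustering step, the sketch below runs a plain partitioning around medoids (a greedy BUILD phase followed by a SWAP phase) on synthetic hourly profiles. It is not the authors' implementation: the data, the Euclidean distance, the normalization of each profile to daily traffic shares, and all function names (make_profile, distance, total_cost, pam) are assumptions made for this example, and the choice of 3 clusters is simply taken from the summary.

```python
# Sketch only: synthetic data and hypothetical helper names, not the paper's code.

def make_profile(active_hours, scale):
    """Return 24 values: the share of a user's daily traffic in each hour."""
    raw = [scale if h in active_hours else 1.0 for h in range(24)]
    total = sum(raw)
    return [v / total for v in raw]

def distance(a, b):
    """Euclidean distance between two hourly profiles (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(dist, medoids):
    """Sum over users of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)

def pam(profiles, k):
    """Partitioning around medoids: greedy BUILD, then SWAP until no gain."""
    n = len(profiles)
    dist = [[distance(a, b) for b in profiles] for a in profiles]

    # BUILD: repeatedly add the user that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda i: total_cost(dist, medoids + [i]))
        medoids.append(best)

    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if total_cost(dist, trial) < total_cost(dist, medoids):
                    medoids, improved = trial, True

    # Assign each user to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels

# Three synthetic user groups: daytime, evening and night users (assumed).
day, evening, night = set(range(9, 18)), set(range(18, 24)), set(range(0, 6))
profiles = [make_profile(hours, scale)
            for hours in (day, evening, night)
            for scale in (4.0, 5.0, 6.0, 7.0)]

medoids, labels = pam(profiles, k=3)
for j, m in enumerate(medoids):
    peak = max(range(24), key=lambda h: profiles[m][h])
    size = labels.count(j)
    print(f"cluster {j}: medoid user {m}, {size} users, first peak hour {peak}")
```

Each medoid is an actual user, so its 24-value vector can be read directly as a representative hourly utilization profile for its cluster. With measured data, the number of clusters would need to be chosen and validated rather than fixed in advance.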