Purpose

Clustering web users is an important mining task, since it contributes to identifying usage patterns, which benefits a wide range of applications that rely on the web. The purpose of this paper is to examine the use of the Kullback‐Leibler (KL) divergence, an information-theoretic distance, as an alternative option for measuring distances in web users clustering.
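As a minimal sketch of the idea (not the paper's actual implementation), each user can be represented as a probability distribution over pages, obtained by normalizing page-visit counts, and the KL divergence then measures how one user's distribution differs from another's. The `eps` smoothing term below is an assumption added to guard against unvisited pages:

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence D(p || q) between two discrete
    distributions; eps guards against log(0) for unvisited pages."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def visits_to_distribution(counts):
    """Normalize raw page-visit counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]

# Two hypothetical users' visit counts over the same five pages
u1 = visits_to_distribution([10, 5, 0, 3, 2])
u2 = visits_to_distribution([8, 6, 1, 3, 2])
print(kl_divergence(u1, u2))
```

Note that D(p || q) is zero when the two distributions coincide and grows as their visit patterns diverge.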

Design/methodology/approach

KL‐divergence is compared with other well‐known distance measures, and the clustering results are evaluated using a criterion function, validity indices, and graphical representations. Furthermore, the impact of noise (i.e. occasional or mistaken page visits) is evaluated, since it is imperative to assess whether a clustering process remains tolerant in noisy environments such as the web.
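One practical point when using KL divergence inside a clustering algorithm is that it is asymmetric (D(p || q) ≠ D(q || p)). A common remedy, sketched below under the assumption that the paper uses some symmetrization (its exact choice is not stated in the abstract), is to average the two directions to obtain a symmetric dissimilarity; the example also shows the measure applied to a "noisy" distribution with a small mass of occasional page visits:

```python
import math

def kl(p, q, eps=1e-10):
    # Plain (asymmetric) KL divergence with smoothing for zero entries.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrized KL: averaging both directions yields a symmetric
    dissimilarity usable as a clustering distance (one common choice)."""
    return 0.5 * (kl(p, q) + kl(q, p))

# Hypothetical session distributions over four pages; the second one
# shifts a small probability mass to an occasional/mistaken page visit.
clean = [0.50, 0.30, 0.20, 0.00]
noisy = [0.48, 0.28, 0.19, 0.05]
print(symmetric_kl(clean, noisy))
```

Because the noise only perturbs the distribution slightly, the symmetrized divergence stays small, which is the intuition behind the noise-tolerance claim examined in the paper.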

Findings

The proposed KL clustering approach performs comparably to the other distance measures under both synthetic and real data workloads. Moreover, when extra noise is imposed on the real data, the approach shows less deterioration than most of the other conventional distance measures.

Practical implications

The experimental results show that a probabilistic measure such as KL‐divergence proves quite efficient in noisy environments and thus constitutes a good alternative for the web users clustering problem.

Originality/value

This work is inspired by the use of KL‐divergence in clustering biological data, and the authors introduce it to the area of web clustering. According to the experimental results presented in this paper, KL‐divergence can be considered a good alternative for measuring distances in noisy environments such as the web.
