An investigation of K‐means clustering to high and multi‐dimensional biological data

Baridam, Barileé B.; Montaz Ali, M.

doi:10.1108/K-02-2013-0028

Research Article | April 19, 2013

Barileé B. Baridam

Department of Computer Science, University of Pretoria, Pretoria, South Africa

M. Montaz Ali

School of Computational and Applied Mathematics, Witwatersrand University, Johannesburg, South Africa

M. Montaz Ali can be contacted at: montaz.ali@wits.ac.za

Publisher: Emerald Publishing

Online ISSN: 1758-7883

Print ISSN: 0368-492X

Kybernetes (2013) 42 (4): 614–627.

https://doi.org/10.1108/K-02-2013-0028

Purpose

The K‐means clustering algorithm has been intensely researched owing to its simplicity of implementation and its usefulness in clustering tasks. However, its performance has also been criticised, in particular for requiring the value of K before the actual clustering task. Previous research indicates that providing the number of clusters a priori does not in any way assist in the production of good quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K‐means clustering for high and multi‐dimensional data by applying it to biological sequence data.

Design/methodology/approach

The authors suggest a scheme which maps the high dimensional data into low dimensions, then show that the K‐means algorithm with a pre‐processor produces good quality, compact and well‐separated clusters of the biological data mapped into low dimensions. For the purpose of clustering, a character‐to‐numeric conversion was conducted to transform the nucleic/amino acid symbols into numeric values.
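
The pipeline described above (character‐to‐numeric conversion, mapping to low dimensions, then K‐means) can be illustrated with a minimal sketch. The abstract does not give the actual numeric codes or the mapping scheme, so everything specific here is an assumption made for illustration: the code table `NUCLEOTIDE_CODES`, the segment‐mean reduction in `reduce_dims`, and the plain Lloyd's‐algorithm `kmeans` are hypothetical stand‐ins, not the authors' method.

```python
import random

# Hypothetical character-to-numeric codes (assumption: the abstract
# does not state which numeric values the authors used).
NUCLEOTIDE_CODES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(sequence):
    """Convert a nucleotide string into a list of numeric codes."""
    return [NUCLEOTIDE_CODES[ch] for ch in sequence.upper()]


def reduce_dims(values, n_segments=4):
    """Map a numeric sequence to n_segments features (mean code per segment).

    Illustrative stand-in for the paper's low-dimensional mapping.
    """
    n = len(values)
    if n < n_segments:
        raise ValueError("sequence is shorter than the number of segments")
    features = []
    for i in range(n_segments):
        segment = values[i * n // n_segments:(i + 1) * n // n_segments]
        features.append(sum(segment) / len(segment))
    return features


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy input (made up for this sketch): three A-rich and three T-rich sequences.
sequences = [
    "AAAAAAAACA", "AAACAAAAAA", "AAAAAAAAAA",
    "TTTTTTTTTT", "TTTGTTTTTT", "TTTTTTGTTT",
]
points = [reduce_dims(encode(s)) for s in sequences]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Note that K still has to be supplied by the caller here, which is the limitation the Purpose section raises; the sketch only shows where a pre‐processing step sits relative to the clustering step.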

Findings

A preprocessing technique has been suggested.

Originality/value

Conceptually this is a new paper with new results.
