A Single-step Clustering Algorithm Based on a New Information-theoretic Sample Association Metric Definition

Turgay Temel

Abstract


A single-step information-theoretic algorithm that can identify possible clusters in a dataset is presented. The proposed algorithm represents data scatter in terms of similarity-based sample entropy and probability descriptions. From these quantities, an information-theoretic association metric between samples, called mutual ambiguity, is defined and then used to determine particular samples called cluster identifiers. A cluster relevance rule is defined to form the individual clusters corresponding to the cluster identifiers. Since cluster identifiers and their associated cluster member samples can be identified without recursive or iterative search, the algorithm is single-step. The algorithm is tested and validated in experiments on synthetic and anonymous real datasets. Simulation results indicate that the proposed algorithm also performs more reliably, in a statistical sense, than major existing clustering algorithms.
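As a rough illustration of what "similarity-based sample entropy and probability descriptions" could look like, the sketch below computes one probability and one entropy value per sample in a single pass over a pairwise similarity matrix. The Gaussian kernel, the `sigma` parameter, the normalizations, and the function names are all assumptions made for illustration; the abstract does not give the paper's definitions. The mutual ambiguity metric and the cluster relevance rule are not attempted here because the abstract does not specify them.

```python
import math

def gaussian_similarity(a, b, sigma=1.0):
    # Assumed similarity measure (not taken from the paper):
    # Gaussian kernel on squared Euclidean distance.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def sample_descriptions(points, sigma=1.0):
    # Hypothetical per-sample descriptions:
    #   probability: share of the total similarity mass held by the sample's row
    #   entropy: Shannon entropy of the sample's row-normalized similarities
    n = len(points)
    sim = [[gaussian_similarity(points[i], points[j], sigma) for j in range(n)]
           for i in range(n)]
    total = sum(sum(row) for row in sim)
    probs, entropies = [], []
    for row in sim:
        row_sum = sum(row)
        probs.append(row_sum / total)
        q = [s / row_sum for s in row]
        # Skip zero terms so that underflowed similarities do not break log().
        entropies.append(-sum(v * math.log(v) for v in q if v > 0.0))
    return probs, entropies

# Toy data: a group of three nearby points and a group of two.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
probs, entropies = sample_descriptions(points)
for p, pr, h in zip(points, probs, entropies):
    print(p, round(pr, 4), round(h, 4))
```

Both quantities are obtained from one pass over the similarity matrix, with no iterative refinement, which is the sense in which a procedure built on them could be single-step. Under these assumed definitions, samples in the denser group receive a larger probability and a larger entropy than samples in the smaller group.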

Keywords


Clustering; Machine learning; Data mining; Information theory


DOI: http://dx.doi.org/10.14311/NNW.2017.%25x
