New biased sampling techniques for the k-means and DBSCAN cluster analysis algorithms

Publisher

Universidade Federal do Espírito Santo

Abstract

Cluster analysis is a set of techniques designed to identify groups of similar elements in a dataset. Such techniques are used in many different applications, such as image segmentation, signal processing, data compression, unsupervised learning, feature selection, and sampling, among others. Although they are important in a wide range of applications, applying these techniques to large datasets is a problem because several traditional algorithms scale poorly. One way to circumvent this problem is sampling: reducing the cardinality of a dataset greatly reduces the computational effort the methods require. This thesis presents three new sampling methods specifically designed to be used in conjunction with the cluster analysis algorithms k-means and DBSCAN. The experimental results show that the methods designed for DBSCAN obtained better results than their competitors. However, the proposed sampling approach for k-means returned lower-quality results than DENDIS, a recently proposed method.

Keywords

Sampling, Unsupervised learning, Cluster analysis