Article

Choosing the number of clusters

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY (2011)

Journal

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Volume 1, Issue 3, Pages 252-260

Publisher

WILEY PERIODICALS, INC

DOI: 10.1002/widm.15

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Funding

Laboratory for Decision Choice and Analysis, Higher School of Economics, Moscow, Russian Federation

Abstract

The issue of determining 'the right number of clusters' is attracting ever-growing interest. This paper reviews published work on the issue with respect to mixtures of distributions, partitions (especially in k-means clustering), and hierarchical cluster structures. Some prospective directions for further development are outlined. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 252-260. DOI: 10.1002/widm.15

Recommended

Article Chemistry, Analytical

An Algorithm for Choosing the Optimal Number of Muscle Synergies during Walking

Riccardo Ballarini, Marco Ghislieri, Marco Knaflitz, Valentina Agostini

Summary: This study describes and validates an algorithm for choosing the optimal number of muscle synergies (ChoOSyn) in motor control studies, which can overcome the limitations of VAF-based methods. The algorithm outperformed traditional approaches in terms of correct classifications, mean error, and root mean square error, demonstrating its potential to standardize the selection of muscle synergies across different research laboratories.

SENSORS (2021)

Article Chemistry, Multidisciplinary

The dodeca-coordinated La©B8C4+/0/- molecular wheels: conflicting aromaticity versus double aromaticity

Ying-Jin Wang, Jia-Xin Zhao, Miao Yan, Lin-Yan Feng, Chang-Qing Miao, Cheng-Qi Liu

Summary: This study reports a theoretical investigation of La©B8C4^q (q = +1, 0, -1) clusters with a dodeca-coordinated La atom. The clusters exhibit molecular wheel structures, with the La atom enclosed by a B8C4 monocyclic ring. The neutral La©B8C4 cluster and the anionic La©B8C4- show 10 sigma and 9 pi/10 pi double aromaticity, while the cationic La©B8C4+ displays conflicting aromaticity with 10 sigma and 8 pi bonds.

RSC ADVANCES (2023)

Article Computer Science, Information Systems

Estimating the number of clusters in a ranking data context

Wilson Calmon, Mariana Albi

Summary: This study introduces two methods for estimating the number of clusters in a population, based on the Plackett-Luce model, which perform well in a large-scale simulation study. The results show that these methods have higher accuracy and smaller errors compared to other established and recently proposed methodologies.

INFORMATION SCIENCES (2021)

Article Multidisciplinary Sciences

Inferring transcriptomic cell states and transitions only from time series transcriptome data

Kyuri Jo, Inyoung Sung, Dohoon Lee, Hyuksoon Jang, Sun Kim

Summary: A novel time series clustering framework called TRACS is introduced in this study to infer transcriptomic cellular states solely from time series transcriptome data. By integrating Gaussian process regression, shape-based distance, and ranked pairs algorithm, TRACS can determine patterns corresponding to hidden cellular states through clustering gene expression data.

SCIENTIFIC REPORTS (2021)

Article Computer Science, Artificial Intelligence

Automatic detection of outliers and the number of clusters in k-means clustering via Chebyshev-type inequalities

Peter Olukanmi, Fulufhelo Nelwamondo, Tshilidzi Marwala, Bhekisipho Twala

Summary: This paper addresses two key challenges of k-means clustering. In the first part, it provides estimates for the range of k required for clustering and improves the efficiency of the k-means algorithm through automation. In the second part, it incorporates automatic outlier detection into the k-means process, solving a previous problem regarding complete automation.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Computer Science, Artificial Intelligence

Agreeing to Disagree: Choosing Among Eight Topic-Modeling Methods

Qiang Fu, Yufan Zhuang, Jiaxin Gu, Yushu Zhu, Xin Guo

Summary: Topic modeling is a key research area in natural language processing that has inspired innovative studies in various social-science disciplines. However, its application in computational social science is hindered by the lack of systematic comparison between different methods and the challenge of choosing the optimal number of topics. This study reviews and compares eight traditional, generative, and neural methods for topic modeling based on Canadian newspaper articles, using three measures to evaluate their performance and guide the selection of topic-modeling methods in social science research.

BIG DATA RESEARCH (2021)

Article Chemistry, Physical

Improving the analysis of biological ensembles through extended similarity measures

Liwei Chang, Alberto Perez, Ramon Alain Miranda-Quintana

Summary: The paper introduces new algorithms for classifying structural ensembles of macromolecules using extended similarity measures which reduce computational complexity. The approach captures larger ensembles and transitions between states, developing efficient techniques and a novel clustering algorithm utilizing the extended similarity indices. The new metrics are applied to analyze biological systems' ensembles, showing excellent performance and faster processing, with an efficient cost-function for merging clusters.

PHYSICAL CHEMISTRY CHEMICAL PHYSICS (2021)

Article Biochemical Research Methods

REBET: a method to determine the number of cell clusters based on batch effect removal

Zhao-Yu Fang, Cui-Xiang Lin, Yun-Pei Xu, Hong-Dong Li, Qing-Song Xu

Summary: This study introduced a new method, REBET, for determining the number of cell clusters in single-cell RNA-seq data by removing batch effects. The method showed improved accuracy and robustness compared to existing methods, as demonstrated in comparisons on simulated and published datasets.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Multidisciplinary Sciences

DNA databases of an important tropical timber tree species Shorea leprosula (Dipterocarpaceae) for forensic timber identification

Chin Hong Ng, Kevin Kit Siong Ng, Soon Leong Lee, Nurul-Farhanah Zakaria, Chai Ting Lee, Lee Hong Tnah

Summary: International timber trade communities are demanding sustainably sourced timber. This study developed chloroplast DNA and simple sequence repeat databases as tracking tools to trace the origin of timber.

SCIENTIFIC REPORTS (2022)

Article Engineering, Environmental

Acid-Base Clusters during Atmospheric New Particle Formation in Urban Beijing

Rujing Yin, Chao Yan, Runlong Cai, Xiaoxiao Li, Jiewen Shen, Yiqun Lu, Siegfried Schobesberger, Yueyun Fu, Chenjuan Deng, Lin Wang, Yongchun Liu, Jun Zheng, Hongbin Xie, Federico Bianchi, Douglas R. Worsnop, Markku Kulmala, Jingkun Jiang

Summary: Molecular clustering is the initial step of atmospheric new particle formation, generating numerous secondary particles. Ion clusters were found to contain mixed clusters of sulfuric acid and amine, while only sulfuric acid clusters and sulfuric acid-amine clusters were observed in the neutral form. Oxygenated organic molecule clusters charged by nitrate and bisulfate ions were observed in ion clusters and were not correlated with the occurrence of sub-3 nm particles.

ENVIRONMENTAL SCIENCE & TECHNOLOGY (2021)

Article Multidisciplinary Sciences

Genetic diversity and structure of Musa balbisiana populations in Vietnam and its implications for the conservation of banana crop wild relatives

Arne Mertens, Yves Bawin, Samuel Vanden Abeele, Simon Kallow, Dang Toan Vu, Loan Thi Le, Tuong Dang Vu, Rony Swennen, Filip Vandelook, Bart Panis, Steven B. Janssens

Summary: The study found relatively high genetic diversity in populations of Musa balbisiana in China, central Vietnam, and northern Vietnam. Populations in northern Vietnam formed a distinct genetic cluster, possibly due to geographical features such as mountain ranges and river systems. Populations in central Vietnam and on the western side of the Hoang Lien Son mountain range in northern Vietnam are considered native and should be prioritized for conservation.

PLOS ONE (2021)

Article Automation & Control Systems

Machine learning algorithm for cluster analysis of mixed dataset based on instance-cluster closeness metric

K. Balaji, K. Lavanya

Summary: The study proposes an intelligent method for clustering categorical and numerical datasets, addressing issues in existing clustering algorithms and achieving good results.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2021)

Article Multidisciplinary Sciences

Exploring the legacy of Central European historical winter wheat landraces

Andras Cseh, Peter Poczai, Tibor Kiss, Krisztina Balla, Zita Berki, Adam Horvath, Csaba Kuti, Ildiko Karsai

Summary: Historical wheat landraces are valuable resources for genetic diversity, with distinct differences in polymorphisms compared to modern wheat cultivars. This study identified novel rare alleles and polymorphic markers related to important agronomic traits in a Central European bread wheat landrace collection. The research also revealed uneven distribution of polymorphisms along the wheat genome.

SCIENTIFIC REPORTS (2021)

Editorial Material Medicine, General & Internal

Choosing the Right Path toward Polio Eradication

Konstantin Chumakov, Christian Brechot, Robert C. Gallo, Stanley Plotkin

Summary: Choosing the right path towards polio eradication requires reevaluation of the current epidemic strategy, with long-term immunization policies that protect vaccinees and minimize the silent circulation of polioviruses.

NEW ENGLAND JOURNAL OF MEDICINE (2023)

Letter Medicine, General & Internal

Choosing the Right Path toward Polio Eradication

T. Jacob John, Norbert Hirschhorn, Dhanya Dharmapalan, Konstantin Chumakov, Stanley Plotkin

Summary: The authors agree with Chumakov et al.'s dissatisfaction with the use of oral polio vaccine (OPV) and propose using inactivated polio vaccine (IPV) to eradicate polio. They provide examples of countries that have successfully eliminated polio using this method.

NEW ENGLAND JOURNAL OF MEDICINE (2023)

Article Mathematics, Interdisciplinary Applications

Core Clustering as a Tool for Tackling Noise in Cluster Labels

Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin

JOURNAL OF CLASSIFICATION (2020)

Article Mathematics, Interdisciplinary Applications

Distance and Consensus for Preference Relations Corresponding to Ordered Partitions

Boris Mirkin, Trevor I. Fenner

JOURNAL OF CLASSIFICATION (2019)

Article Computer Science, Information Systems

Parsimonious generalization of fuzzy thematic sets in taxonomies applied to the analysis of tendencies of research in data science

Dmitry Frolov, Susana Nascimento, Trevor Fenner, Boris Mirkin

INFORMATION SCIENCES (2020)

Article Multidisciplinary Sciences

Least-squares community extraction in feature-rich networks using similarity data

Soroosh Shalileh, Boris Mirkin

Summary: In this study, a doubly-greedy approach was used for community detection in feature-rich networks. By converting feature-space data into similarity matrix format, four different algorithms were developed for automatic determination of the number of communities. Experimental results on real-world and synthetic datasets demonstrated the effectiveness and competitiveness of these algorithms.

PLOS ONE (2021)

Article Computer Science, Information Systems

Summable and nonsummable data-driven models for community detection in feature-rich networks

Soroosh Shalileh, Boris Mirkin

Summary: The study introduces a data-driven model for partitioning the nodes of a network so as to approximate both the network link data and the feature data, using summary quantitative characteristics of each. The experiments show that the nonsummable version has its own niche and is faster than the summable version.

SOCIAL NETWORK ANALYSIS AND MINING (2021)

Article Physics, Multidisciplinary

Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method

Soroosh Shalileh, Boris Mirkin

Summary: This paper proposes a meaningful and effective extension of the K-means algorithm to detect communities in feature-rich networks. The method uses least-squares approximation to the inter-node links and feature values, resulting in a straightforward extension of the conventional K-means clustering method. The metric used is a weighted sum of squared Euclidean distances in both the feature and network spaces.

ENTROPY (2022)

Article Mathematics, Interdisciplinary Applications

Community Detection in Feature-Rich Networks Using Data Recovery Approach

Boris Mirkin, Soroosh Shalileh

Summary: The problem of community detection in a network with node features is addressed by applying the data recovery approach. The proposed method combines the least-squares recovery criteria for both the graph structure and node features, resulting in a new clustering criterion and algorithm. Experimental results on real-world and synthetic data demonstrate the effectiveness of the proposed method.

JOURNAL OF CLASSIFICATION (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Novel Cluster Modeling for the Spatiotemporal Analysis of Coastal Upwelling

Susana Nascimento, Alexandre Martins, Paulo Relvas, Joaquim F. Luis, Boris Mirkin

Summary: This work proposes a spatiotemporal clustering approach using Core-Shell clustering algorithm to model coastal upwelling from satellite SST grid maps, enabling automated derivation of key parameters. Experiments show that the core-shell clustering accurately recognizes upwelling regions and presents consistent regularities across different upwelling seasons.

PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2022 (2022)

Proceedings Paper Computer Science, Interdisciplinary Applications

An Extension of K-Means for Least-Squares Community Detection in Feature-Rich Networks

Soroosh Shalileh, Boris Mirkin

Summary: We propose an extension of the K-means algorithm for community detection in feature-rich networks. By replacing the squared Euclidean distance with cosine distance, we effectively tackle the curse of dimensionality. Our experimental results show that the cosine distance-based version performs the best, especially on larger datasets.

COMPLEX NETWORKS & THEIR APPLICATIONS X, VOL 1 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Data Recovery Method for Community Detection in Feature-Rich Networks

Soroosh Shalileh, Boris Mirkin

2020 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Ordinal Equivalence Classes for Parallel Coordinates

Alexey Myachin, Boris Mirkin

INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2019, PT I (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences

Dmitry Frolov, Susana Nascimento, Trevor Fenner, Zina Taran, Boris Mirkin

INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2019), PT II (2019)

Proceedings Paper Computer Science, Information Systems

Deriving Corporate Social Responsibility Patterns in the MSCI Data

Zina Taran, Boris Mirkin

BUSINESS INFORMATION SYSTEMS, PT I (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Method for Generalization of Fuzzy Sets

Dmitry Frolov, Boris Mirkin, Susana Nascimento, Trevor Fenner

ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Braverman's Spectrum and Matrix Diagonalization Versus iK-Means: A Unified Framework for Clustering

Boris Mirkin

BRAVERMAN READINGS IN MACHINE LEARNING: KEY IDEAS FROM INCEPTION TO CURRENT STATE (2018)
