Article
Computer Science, Artificial Intelligence
Carlo Baldassi
Summary: We introduce an evolutionary algorithm called recombinator-k-means for optimizing the highly nonconvex k-means problem. Its defining feature is that its crossover step involves all the members of the current generation, stochastically recombining them with a repurposed variant of the k-means++ seeding algorithm. The recombination also uses a reweighting mechanism that realizes a progressively sharper stochastic selection policy and ensures that the population eventually coalesces into a single solution. We compare this scheme with a state-of-the-art alternative, a more standard genetic algorithm with deterministic pairwise-nearest-neighbor crossover and an elitist selection policy, of which we also provide an augmented and efficient implementation. Extensive tests on large and challenging datasets (both synthetic and real-world) show that for fixed population sizes recombinator-k-means is generally superior in terms of the optimization objective, at the cost of a more expensive crossover step. When adjusting the population sizes of the two algorithms to match their running times, we find that for short times the (augmented) pairwise-nearest-neighbor method is always superior, while at longer times recombinator-k-means will match it and, on the most difficult examples, take over. We conclude that the reweighted whole-population recombination is more costly but generally better at escaping local minima. Moreover, it is algorithmically simpler and more general (it could be applied even to k-medians or k-medoids, for example).
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
(2022)
Article
Computer Science, Artificial Intelligence
Yi-Cheng Chen, Yen-Liang Chen, Jyun-Yun Lu
Summary: The K-Means algorithm is one of the most famous and popular clustering algorithms in the world, known for its simple structure, easy implementation, high efficiency, and fast convergence speed. This article introduces an improvement to past variants of K-Means used in evolutionary clustering, considering both past and future clustering results, and extending K-Means to multiple cycles, resulting in more consistent, stable, and smooth clustering results.
EXPERT SYSTEMS WITH APPLICATIONS
(2021)
Article
Automation & Control Systems
Uri Stemmer
Summary: This research presents a new algorithm operating in the local model of differential privacy for solving the Euclidean k-means problem, significantly reducing additive error while maintaining multiplicative error. The study shows that the obtained additive error in handling the k-means objective is almost optimal in terms of its dependency on the database size.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Computer Science, Interdisciplinary Applications
Ahmed Fahim
Summary: The k-means method divides N objects into k clusters based on mean values, with linear time complexity and dependence on knowing the number of clusters and initial centers. This research introduces a method able to detect near-optimal values for k and initial centers without prior knowledge, resulting in improved final result quality. The proposed method combines DBSCAN and k-means to converge to global minima and has a time complexity of O(n log n).
JOURNAL OF COMPUTATIONAL SCIENCE
(2021)
Article
Computer Science, Information Systems
Jing Liu, Fuyuan Cao, Jiye Liang
Summary: In this paper, a centroids-guided deep multi-view k-means clustering method is proposed, which incorporates deep representation learning into the multi-view k-means objective. The method produces more k-means-friendly representations by reducing the loss between each representation and its assigned cluster centroid.
INFORMATION SCIENCES
(2022)
Article
Computer Science, Artificial Intelligence
Hongfu Liu, Junxiang Chen, Jennifer Dy, Yun Fu
Summary: K-means is a widely used clustering algorithm known for its simplicity and efficiency. This review paper focuses on generalizing K-means to solve challenging and complex problems. It unifies the available approaches in terms of data representation, distance measure, label assignment, and centroid updating. Concrete applications of modified K-means formulations are reviewed, including iterative subspace projection and clustering, consensus clustering, constrained clustering, domain adaptation, and outlier detection.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Avgoustinos Vouros, Stephen Langdell, Mike Croucher, Eleni Vasilaki
Summary: K-Means is a widely used algorithm for data clustering, but it has limitations such as only finding local minima and being sensitive to initial centroid positions. Various K-Means variations and initialization techniques have been proposed, with more sophisticated techniques reducing the need for complex clustering methods. Deterministic methods generally outperform stochastic methods, but there is a trade-off where simpler stochastic methods run multiple times can result in better clustering.
Article
Computer Science, Artificial Intelligence
Luc Giffon, Valentin Emiya, Hachem Kadri, Liva Ralaivola
Summary: K-means algorithm and Lloyd's algorithm have expanded beyond their original clustering purposes to play pivotal roles in various machine learning and data analysis techniques. QuicK-means is an efficient extension of K-means that reduces computational complexity through sparse matrix products, demonstrating benefits through experimental results.
Article
Computer Science, Artificial Intelligence
Peter Olukanmi, Fulufhelo Nelwamondo, Tshilidzi Marwala
Summary: A key drawback of the k-means algorithm is its susceptibility to local minima. The authors propose a technique for comparing initializations directly and selecting the best one based on the maximum minimum inter-center distance. The experiments and mathematical analysis show significant efficiency gains and improved accuracy compared to repeated k-means.
NEURAL COMPUTING & APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Marco Capo, Aritz Perez, Jose A. Antonio
Summary: The K-means algorithm is a popular clustering method, but its performance depends heavily on the initialization phase. Researchers have developed various initialization techniques to address this issue. This article introduces a cost-effective Split-Merge step that can restart the K-means algorithm after reaching a fixed point, reducing error and computing fewer distances.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2022)
Article
Computer Science, Artificial Intelligence
Miaomiao Li, Yi Zhang, Suyuan Liu, Zhe Liu, Xinzhong Zhu
Summary: Multiple kernel clustering (MKC) aims to determine the optimal kernel from several pre-computed basic kernels. A new algorithm called simple multiple kernel k-means with kernel weight regularization (SMKKM-KWR) is proposed to overcome the issue of sparse or over-selected kernel weight coefficients. Experimental results show that SMKKM-KWR achieves effective and efficient clustering performance.
INFORMATION FUSION
(2023)
Article
Computer Science, Information Systems
Simon Harris, Renato Cordeiro De Amorim
Summary: This paper compares the performance of 17 different algorithms on 6,000 synthetic and 28 real-world data sets to investigate the sensitivity of k-means to its initial centroids. The results show that different algorithms may excel in different clustering scenarios, providing valuable insights for those considering k-means for complex clustering tasks.
Article
Computer Science, Interdisciplinary Applications
Rasim M. Alguliyev, Ramiz M. Aliguliyev, Lyudmila Sukhostat
Summary: This article introduces a new parallel batch clustering algorithm based on the k-means algorithm, which reduces computation complexity by splitting the dataset into multiple partitions and proposes a method to determine the optimal batch size. Experimental results show the practical applicability of this method for handling Big Data.
COMPUTERS & INDUSTRIAL ENGINEERING
(2021)
Article
Computer Science, Artificial Intelligence
Chenhui Gao, Wenzhi Chen, Feiping Nie, Weizhong Yu, Feihu Yan
Summary: In this paper, we propose two algorithms, FDKM and IFDKM, for clustering high-dimensional data in a low-dimensional subspace. These algorithms have higher efficiency and lower time complexity compared to traditional methods, and their superior performance is demonstrated in multiple experiments.
KNOWLEDGE-BASED SYSTEMS
(2022)
Review
Computer Science, Information Systems
Abiodun M. Ikotun, Absalom E. Ezugwu, Laith Abualigah, Belal Abuhaija, Jia Heming
Summary: Advances in data collection techniques have enabled the accumulation of large quantities of data. The K-means algorithm, while popular, has challenges such as determining the number of clusters and detecting non-Euclidean shapes. Research efforts have been made to improve its performance and robustness.
INFORMATION SCIENCES
(2023)
Article
Ophthalmology
Sumit Randhir Singh, Mohammed Abdul Rasheed, Nishad Parveen, Abhilash Goud, Samatha Ankireddy, Niroj Kumar Sahoo, Kiran Kumar Vupparaboina, Soumya Jana, Jay Chhablani
Summary: The study reported a significant variation in en-face choroidal vascularity index at different levels of choroidal vessels in healthy eyes. Additionally, the choroidal vascularity index was found to increase with subfoveal choroidal thickness. Age, refraction, and gender did not show significant correlation with en-face choroidal vascularity index.
EUROPEAN JOURNAL OF OPHTHALMOLOGY
(2021)
Article
Engineering, Manufacturing
Nagajyothi Virivinti, Budhaditya Hazra, Kishalay Mitra
Summary: The study highlights the importance of considering parameter dependency and correlations when dealing with uncertain parameters in optimization problems. By using joint Chance-Constrained Programming with parameter dependency information, misleading results can be avoided and a more accurate analysis can be conducted.
MATERIALS AND MANUFACTURING PROCESSES
(2021)
Article
Engineering, Chemical
Surbhi Sharma, Priyanka Devi Pantula, Srinivas Soumitri Miriyala, Kishalay Mitra
Summary: This study focuses on multi-objective optimization of an integrated grinding circuit considering various sources of uncertainty, using chance-constrained programming (CCP). A novel data-based intelligent sampling strategy for CCP is proposed, combining machine learning techniques with a Fuzzy C-means algorithm to address a sparse uncertain parameter space. The proposed technique demonstrates significant improvements over conventional sampling techniques in optimizing conflicting objectives.
Article
Green & Sustainable Science & Technology
Kapil Gumte, Priyanka Devi Pantula, Srinivas Soumitri Miriyala, Kishalay Mitra
Summary: This study proposes a nationwide supply chain network design based on bio-energy to address the dual crisis of fossil fuels. Through mathematical modeling and sensitivity analysis, the research indicates the need for sufficient biomass feed supply to operate the biofuel supply chain sector.
JOURNAL OF CLEANER PRODUCTION
(2021)
Article
Engineering, Environmental
Ravi Kiran Inapakurthi, Srinivas Soumitri Miriyala, Kishalay Mitra
Summary: The study proposes a method utilizing neural networks to capture the dynamic trends of environmental parameters, achieving a modeling accuracy of 98.97% through balancing accuracy and complexity with an evolutionary algorithm.
CHEMICAL ENGINEERING JOURNAL
(2021)
Article
Engineering, Chemical
Kapil M. Gumte, Priyanka Devi Pantula, Srinivas Soumitri Miriyala, Kishalay Mitra
Summary: This paper proposes a methodology that combines machine learning and data analytics with RO to address uncertainty in supply chain planning. By exploring uncertain spaces and implementing accurate sampling techniques, the proposed method effectively tackles uncertainty issues in supply chain models.
CHEMICAL ENGINEERING SCIENCE
(2021)
Article
Engineering, Manufacturing
Ravi Kiran Inapakurthi, Kishalay Mitra
Summary: Researchers propose a data-driven modeling method using Support Vector Regression (SVR) for transient state modeling of industrial grinding processes. By optimizing the hyper-parameter combination and comparing with traditional methods, the results indicate the superiority of this approach in terms of accuracy and effectiveness.
MATERIALS AND MANUFACTURING PROCESSES
(2022)
Proceedings Paper
Computer Science, Theory & Methods
Priyanka D. Pantula, Srinivas S. Miriyala, Kishalay Mitra
Summary: The authors propose a recurrent neural network (RNN) based clustering algorithm optimization in the context of deep unsupervised learning, which extracts features from dynamic data and efficiently clusters them using an evolutionary clustering algorithm, with results showing an accuracy of 98-100%.
2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC)
(2021)
Proceedings Paper
Computer Science, Theory & Methods
Keerthi NagaSree Pujari, Vivek Srivastava, Srinivas Soumitri Miriyala, Kishalay Mitra
Summary: The control settings of turbines play a crucial role in increasing energy production in a wind farm, with Reinforcement Learning emerging as a promising method for optimization. This study utilizes yaw misalignment to enhance power production and compares the efficiency of various methods for optimization.
2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC)
(2021)
Proceedings Paper
Computer Science, Theory & Methods
Surbhi Sharma, Keerthi Nagasree Pujari, Srinivas Soumitri Miriyala, Lopamudra Giri, Kishalay Mitra
Summary: This study proposes a framework integrating systems biology and artificial intelligence for controlling and optimizing protein/vaccine production in a Baculovirus expression system. Experimental research and mathematical modeling are used to generate large-scale data for building an AI-based RNN model to handle numerical stability issues during optimal control of the biological system. This work serves as a proof of concept for applying experimental studies, mathematical modeling, and AI techniques to optimize protein production in a recombinant expression system in an industrial setting.
2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC)
(2021)
Proceedings Paper
Computer Science, Theory & Methods
Ravi Kiran Inapakurthi, Kishalay Mitra
Summary: The paper proposes an algorithm to estimate the hyper-parameters of SVR for efficient model generation in the continuous casting process, considering RMSE and sample size as conflicting objectives. Different kernel parameters are used for different inputs during model development, and multiple kernels are explored to understand the unknown nature of the process. Simulation results demonstrate the effectiveness of the proposed algorithm in developing temperature and bulging models for optimizing the casting process.
2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC)
(2021)
Proceedings Paper
Computer Science, Theory & Methods
Kalpathy Jayanth Krishnan, Kishalay Mitra
Summary: The increasing usage of sensors has led to a rise in time series data, which can be clustered using Kohonen Maps with DTW distance measure. This method outperforms traditional Hierarchical clustering in terms of both cluster quality and speed.
2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC)
(2021)
Proceedings Paper
Computer Science, Theory & Methods
Arun Ramamurthy, Priyanka Pantula, Mangesh Gharote, Kishalay Mitra, Sachin Lodha
Summary: This paper presents a multi-objective optimization study for scientific workflows in a cloud environment, aiming to minimize execution time and purchasing cost simultaneously while satisfying customer demand requirements. Uncertainties are handled using Chance Constrained Programming, and the model is solved using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The study shows that solutions obtained considering uncertainties vary from the deterministic case.
CLOSER: PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE
(2021)
Proceedings Paper
Engineering, Biomedical
Surbhi Sharma, Abha Saxena, Soumita Chel, Kishalay Mitra, Lopamudra Giri
Summary: The study proposes a computational framework to study infection dynamics of SARS-CoV-2 without conducting animal experiments, utilizing a system of non-linear ODEs model considering the roles of T cells and Macrophages. This method contributes to testing drug efficacy with various mechanisms and analyzing the impact of drug administration timing on virus clearance.
2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC)
(2021)
Article
Biochemistry & Molecular Biology
Dinesh Kankanamge, Sithurandi Ubeysinghe, Mithila Tennakoon, Priyanka Devi Pantula, Kishalay Mitra, Lopamudra Giri, Ajith Karunarathne
Summary: Phospholipase C beta (PLC beta) is activated by the Gq family of heterotrimeric G proteins and plays crucial roles in cellular processes and disease. Signaling crosstalk between Gq and Gi/o pathways influences PIP2 metabolism, with the dissociation of G beta gamma leading to partial recovery of PIP2 levels.
JOURNAL OF BIOLOGICAL CHEMISTRY
(2021)