4.5 Article

Enhancing density-based clustering: Parameter reduction and outlier detection

Journal

INFORMATION SYSTEMS
Volume 38, Issue 3, Pages 317-330

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.is.2012.09.001

Keywords

Density based clustering; Outlier detection; Spatial dataset; Stratification; High dimensional embedding

Clustering is a widely used unsupervised data mining technique. It identifies structures in collections of objects by grouping them into classes, called clusters, so that the similarity of objects within any cluster is maximized and the similarity of objects belonging to different clusters is minimized. In density-based clustering, a cluster is defined as a connected dense component and grows in the direction driven by the density. The basic structure of density-based clustering has some common drawbacks: (i) parameters have to be set; (ii) the behavior of the algorithm is sensitive to the density of the starting object; and (iii) adjacent clusters of different densities may not be properly identified. In this paper, we address all of the above problems. Our method, based on the concept of space stratification, efficiently identifies the different densities in the dataset and ranks the objects of the original space accordingly. Next, it exploits this knowledge by projecting the original data into a space with one more dimension. It then performs a density-based clustering that takes into account the reverse nearest neighbors of the objects. Our method also reduces the number of input parameters by giving a guideline for setting them suitably. Experimental results indicate that our algorithm is able to deal with clusters of different densities and outperforms the most popular algorithms, DBSCAN and OPTICS, on all the standard benchmark datasets. (C) 2012 Elsevier Ltd. All rights reserved.
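
The abstract mentions two ingredients without giving details: reverse nearest neighbors and a projection into a space with one more dimension. The sketch below is only an illustration of those two ideas, not the paper's algorithm. It assumes brute-force k-nearest-neighbor search, uses the reverse-k-NN count as a stand-in density score, and appends that score as the extra coordinate. The function names, the value of k, and the toy points are all hypothetical.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def reverse_knn_counts(points, k):
    """Count how many other points list each point among their k nearest neighbours."""
    counts = [0] * len(points)
    for neighbours in knn_indices(points, k):
        for j in neighbours:
            counts[j] += 1
    return counts


def lift_with_score(points, scores):
    """Append a per-point score as one extra coordinate (an assumed, illustrative embedding)."""
    return [tuple(p) + (float(s),) for p, s in zip(points, scores)]


# Toy data: four points in a tight square plus one isolated point (index 4).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
counts = reverse_knn_counts(points, k=2)
lifted = lift_with_score(points, counts)

# No point in the dense square has the isolated point among its 2 nearest
# neighbours, so its reverse-k-NN count is zero: a simple hint that it may be an outlier.
print(counts[4])
```

In this toy setting, a point that nobody selects as a neighbor ends up with a zero score, which is one plausible way a reverse-nearest-neighbor signal could feed outlier detection. How the paper actually ranks objects and builds its embedding is not stated in the abstract.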

Recommended

Article Environmental Sciences

TUBE Project: Transport-Derived Ultrafines and the Brain Effects

Maria-Viola Martikainen, Paeivi Aakko-Saksa, Lenie van den Broek, Flemming R. Cassee, Roxana O. Carare, Sweelin Chew, Andras Dinnyes, Rosalba Giugno, Katja M. Kanninen, Tarja Malm, Ala Muala, Maiken Nedergaard, Anna Oudin, Pedro Oyola, Tobias V. Pfeiffer, Topi Ronkko, Sanna Saarikoski, Thomas Sandstrom, Roel P. F. Schins, Jan Topinka, Mo Yang, Xiaowen Zeng, Remco H. S. Westerink, Pasi I. Jalava

Summary: The adverse effects of air pollutants on the respiratory and cardiovascular systems are well-known, but recent studies have found that they also have negative effects on the neurological system and cognitive function. Ultrafine particles (UFPs) play a key role in these effects, but there is still limited understanding about the smallest fraction and semivolatile compounds. The TUBE project aims to increase knowledge about harmful UFPs and semivolatile compounds, provide information for better emission legislation, and assess the impact of air pollution on the brain and its removal.

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2022)

Article Biochemical Research Methods

GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs

Manuel Tognon, Vincenzo Bonnici, Erik Garrison, Rosalba Giugno, Luca Pinello

Summary: GRAFIMO is a command-line tool for scanning known TF DNA motifs in VGs, extending the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG, and recovering additional potential binding sites compared with scanning only the reference genome.

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Cell Biology

Single-Cell RNA-Seq Analysis of Olfactory Mucosal Cells of Alzheimer's Disease Patients

Riikka Lampinen, Mohammad Feroze Fazaludeen, Simone Avesani, Tiit Ord, Elina Penttila, Juha-Matti Lehtola, Toni Saari, Sanna Hannonen, Liudmila Saveleva, Emma Kaartinen, Francisco Fernandez Acosta, Marcela Cruz-Haces, Heikki Lopponen, Alan Mackay-Sim, Minna U. Kaikkonen, Anne M. Koivisto, Tarja Malm, Anthony R. White, Rosalba Giugno, Sweelin Chew, Katja M. Kanninen

Summary: This study evaluated the differences in olfactory mucosa between cognitively healthy individuals and Alzheimer's disease patients. The findings showed increased secretion of amyloid-beta in Alzheimer's disease olfactory mucosal cells and identified 240 differentially expressed disease-associated genes and five distinct cell populations. The study also revealed alterations in RNA and protein metabolism, inflammatory processes, and signal transduction in multiple cell populations, suggesting their involvement in Alzheimer's disease-related olfactory mucosa pathophysiology. Additionally, the study proposed alterations in gene expression of mitochondrially located genes in AD OM cells, which were verified by functional assays, demonstrating altered mitochondrial respiration and a reduction of ATP production. The results highlight the changes in olfactory mucosal cells in Alzheimer's disease and demonstrate the significance of single-cell RNA sequencing data in investigating the molecular and cellular mechanisms associated with the disease.

Article Biochemical Research Methods

PANPROVA: PANgenomic PROkaryotic eVolution of full Assemblies

Vincenzo Bonnici, Rosalba Giugno

Summary: PANPROVA is a benchmark tool that simulates prokaryotic pangenomic evolution by evolving the complete genomic sequence of an ancestral isolate. It enables operation in the pre-assembly phase and includes evolutionary features such as gene set variations, sequence variations, and horizontal acquisition from a pool of external genomes.

BIOINFORMATICS (2022)

Article Biochemistry & Molecular Biology

Biometal Dyshomeostasis in Olfactory Mucosa of Alzheimer's Disease Patients

Riikka Lampinen, Veronika Gorova, Simone Avesani, Jeffrey R. Liddell, Elina Penttila, Tana Zavodna, Zdenek Krejcik, Juha-Matti Lehtola, Toni Saari, Juho Kalapudas, Sanna Hannonen, Heikki Lopponen, Jan Topinka, Anne M. Koivisto, Anthony R. White, Rosalba Giugno, Katja M. Kanninen

Summary: The biometal homeostasis in the olfactory mucosa cells of Alzheimer's disease (AD) patients is disturbed and correlated with the alterations in the brain. This provides new clues for the early diagnosis and treatment of AD.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Oncology

Combined Large Cell Neuroendocrine Carcinomas of the Lung: Integrative Molecular Analysis Identifies Subtypes with Potential Therapeutic Implications

Michele Simbolo, Giovanni Centonze, Luca Giudice, Federica Grillo, Patrick Maisonneuve, Anastasios Gkountakos, Chiara Ciaparrone, Laura Cattaneo, Giovanna Sabella, Rosalba Giugno, Paola Bossi, Paola Spaggiari, Alessandro Del Gobbo, Stefano Ferrero, Luca Mastracci, Alessandra Fabbri, Martina Filugelli, Giovanna Garzone, Natalie Prinzi, Sara Pusceddu, Adele Testi, Valentina Monti, Luigi Rolli, Alessandro Mangogna, Luisa Bercich, Mauro Roberto Benvenuti, Emilio Bria, Sara Pilotto, Alfredo Berruti, Ugo Pastorino, Carlo Capella, Maurizio Infante, Michele Milella, Aldo Scarpa, Massimo Milione

Summary: This study provides an integrated molecular analysis of 44 combined large cell neuroendocrine carcinomas (CoLCNECs), revealing that CoLCNECs are an independent histologic category with specific genomic and transcriptomic features that are different from other lung cancers. The findings of this study contribute to a better understanding of these rare tumors and may lead to the development of new diagnostic approaches for personalized treatments in CoLCNECs.

CANCERS (2022)

Article Biochemistry & Molecular Biology

Esearch3D: propagating gene expression in chromatin networks to illuminate active enhancers

Maninder Heer, Luca Giudice, Claudia Mengoni, Rosalba Giugno, Daniel Rico

Summary: Researchers have developed a new method called Esearch3D to identify active enhancers using network theory approaches. This method calculates the likelihood of enhancer activity in intergenic regions by analyzing the folding of chromatin in the three-dimensional nuclear space, and regions predicted to have high enhancer activity are shown to be enriched in annotations indicative of enhancer activity.

NUCLEIC ACIDS RESEARCH (2023)

Article Multidisciplinary Sciences

Application of the PHENotype SIMulator for rapid identification of potential candidates in effective COVID-19 drug repurposing

Naomi I. Maria, Rosaria Valentina Rapicavoli, Salvatore Alaimo, Evelyne Bischof, Alessia Stasuzzo, Jantine A. C. Broek, Alfredo Pulvirenti, Bud Mishra, Ashley J. Duits, Alfredo Ferro, RxCOVEA Framework

Summary: The current pandemic has created an urgent need for identifying potential drugs for COVID-19. However, our understanding of the host-immune response to SARS-CoV-2 is limited, and there are only a few approved drugs available. To address this, a systems biology tool called PHENotype SIMulator has been introduced. This tool uses transcriptomic and proteomic databases to simulate SARS-CoV-2 infection in host cells, allowing for the identification of viral effects on host-immune response with high sensitivity and specificity (>96%).

HELIYON (2023)

Article Mathematical & Computational Biology

APDB: a database on air pollutant characterization and similarity prediction

Eva Viesi, Davide Stefano Sardina, Ugo Perricone, Rosalba Giugno

Summary: The World Health Organization estimates that 9 out of 10 people worldwide breathe air containing high levels of pollutants, which can have detrimental effects on vital organs. In order to investigate the link between pollutant exposure and human health effects, the development of an online resource collecting and characterizing pollutant molecules could be beneficial. The APDB database was created to collect air-pollutant-related data from various online resources, including molecules, targets, bioassays, and computed properties. The database provides a web interface for browsing, querying, and visualizing the data.

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2023)

Article Computer Science, Artificial Intelligence

MultiplexSAGE: A Multiplex Embedding Algorithm for Inter-Layer Link Prediction

Luca Gallo, Vito Latora, Alfredo Pulvirenti

Summary: Research on graph representation learning has been highly focused on single-layer graphs, and there is limited research on representation learning of multilayer structures without known inter-layer links. This study proposes MultiplexSAGE, a generalized algorithm capable of embedding multiplex networks and reconstructing intra-layer and inter-layer connectivity. Experimental analysis reveals that the quality of embedding is strongly influenced by the density and randomness of the graph's links in both simple and multiplex networks.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Biology

Stardust: improving spatial transcriptomics data analysis through space-aware modularity optimization-based clustering

Simone Avesani, Eva Viesi, Luca Alessandri, Giovanni Motterle, Vincenzo Bonnici, Marco Beccuti, Raffaele Calogero, Rosalba Giugno

Summary: In this study, a new clustering method called Stardust was proposed, which can easily utilize spatial and transcriptomic information to improve clustering analysis. By analyzing ST datasets, the method showed excellent performance in clustering.

GIGASCIENCE (2022)

Article Psychiatry

Psychopathological outcomes and defence mechanisms in clinically healed adults with a paediatric cancer history: an exploratory study

Antonino Petralia, Emanuele Bisso, Ilaria Concas, Antonino Maglitto, Nunzio Bucolo, Salvatore Alaimo, Andrea Di Cataldo, Maria Salvina Signorelli, Alfredo Pulvirenti, Eugenio Aguglia

Summary: This study investigated the relationship between defence styles and predisposition to psychiatric diseases in adults with a history of paediatric cancer, finding that survivors exhibited lower scores in certain defence styles and lower psychopathological indices compared to healthy controls. The results of mediation analysis indicated that specific defence styles had mediation effects on certain psychopathological outcomes, suggesting an indirect relationship between oncological pathology and psychopathology mediated by defence styles such as TAS and TAO. However, other defence styles did not show significant mediation effects on psychopathological outcomes.

GENERAL PSYCHIATRY (2021)

Article Biochemistry & Molecular Biology

Exon-Intron Differential Analysis Reveals the Role of Competing Endogenous RNAs in Post-Transcriptional Regulation of Translation

Nicolas Munz, Luciano Cascione, Luca Parmigiani, Chiara Tarantelli, Andrea Rinaldi, Natasa Cmiljanovic, Vladimir Cmiljanovic, Rosalba Giugno, Francesco Bertoni, Sara Napoli

Summary: Under stressful conditions, cells activate a rescue program modulated by mTOR and rely on microRNAs and lncRNAs for translation regulation. Upregulation of lncRNA lncTNK2-2:1 may be associated with the stabilization of translation and DNA damage regulation in response to treatment with bimiralisib.

NON-CODING RNA (2021)

Article Geochemistry & Geophysics

Seismic evidence of the COVID-19 lockdown measures: a case study from eastern Sicily (Italy)

Andrea Cannata, Flavio Cannavo, Giuseppe Di Grazia, Marco Aliotta, Carmelo Cassisi, Raphael S. M. De Plaen, Stefano Gresta, Thomas Lecocq, Placido Montalto, Mariangela Sciotto

Summary: During the COVID-19 pandemic, countries implemented social interventions to restrict human mobility, leading to a decrease in anthropogenic seismic noise. Research found similarities in temporal patterns between the decrease in seismic noise and human mobility.

SOLID EARTH (2021)

Proceedings Paper Biochemical Research Methods

Centrality Speeds the Subgraph Isomorphism Search Up in Target Aware Contexts

Vincenzo Bonnici, Simone Caligola, Antonino Aparo, Rosalba Giugno

COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS, CIBB 2018 (2020)

Article Computer Science, Information Systems

Measuring rule-based LTLf process specifications: A probabilistic data-driven approach

Alessio Cecconi, Luca Barbaro, Claudio Di Ciccio, Arik Senderovich

Summary: This paper introduces a framework for designing probabilistic measures for declarative process specifications, which can assess the degree of compliance between process data and specifications. Through experiments, the applicability of the approach for various process mining tasks is demonstrated.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

A Value Co-Creation Perspective on Data Labeling in Hybrid Intelligence Systems: A Design Study

Mahei Manhai Li, Philipp Reinhard, Christoph Peters, Sarah Oeste-Reiss, Jan Marco Leimeister

Summary: This article introduces a novel human-in-the-loop (HIL) design for ITSM support ticket recommendations by incorporating a value co-creation perspective. The design incentivizes ITSM agents to provide labels during their everyday ticket-handling procedures, and the evaluation shows that recommendations after label improvement have increased user ratings.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

A survey of approaches for event sequence analysis and visualization

Anton Yeshchenko, Jan Mendling

Summary: This paper presents the development of event sequence data analysis techniques in different fields and proposes an integrated framework to facilitate collaboration and research synergy across various domains.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

Adoption of IT solutions: A data-driven analysis approach

Iris Reinhartz-Berger, Alan Hartman, Doron Kliger

Summary: Many IT departments provide solutions that partially meet the needs of business units. This research aims to develop a data-driven analysis method to support the selection of solutions with higher prospects of adoption and identify design gaps and barriers.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

Discovery, simulation, and optimization of business processes with differentiated resources

Orlenys Lopez-Pintado, Marlon Dumas, Jonas Berx

Summary: Business process simulation is a versatile technique that predicts the impact of changes on process performance. However, previous approaches have limitations due to their treatment of resources as undifferentiated entities. This article addresses this issue by proposing a new simulation approach that treats each resource as an individual entity with its own performance and availability. The article also presents methods for discovering simulation models with differentiated resources and optimizing resource availability calendars. Empirical evaluation demonstrates that differentiated resource models better replicate cycle time distributions and work rhythm, and iterative optimization of resource allocations and calendars leads to improved cost-time tradeoffs.

INFORMATION SYSTEMS (2024)