Article

Convex non-negative matrix factorization for massive datasets

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 29, Issue 2, Pages 457-478

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-010-0352-6

Keywords

Matrix factorization; Low-rank approximation; Data mining; Information retrieval; Large-scale data analysis

Funding

  1. Fraunhofer ATTRACT Fellowship STREAM

Non-negative matrix factorization (NMF) has become a standard tool in data mining, information retrieval, and signal processing. It is used to factorize a non-negative data matrix into two non-negative matrix factors that contain basis elements and linear coefficients, respectively. Often, the columns of the first resulting factor are interpreted as cluster centroids of the input data, and the columns of the second factor are understood to contain cluster membership indicators. When analyzing data such as collections of gene expressions, documents, or images, it is often beneficial to ensure that the resulting cluster centroids are meaningful, for instance, by restricting them to be convex combinations of data points. However, known approaches to convex-NMF suffer from high computational costs and therefore hardly apply to large-scale data analysis problems. This paper presents a new framework for convex-NMF that allows for an efficient factorization of data matrices of millions of data points. Triggered by the simple observation that each data point can be expressed as a convex combination of vertices of the data convex hull, we require the basic factors to be vertices of the data convex hull. The benefits of convex-hull NMF are twofold. First, for a growing number of data points, the expected size of the convex hull, i.e. the number of its vertices, grows much more slowly than the dataset. Second, distance-preserving low-dimensional embeddings allow us to efficiently sample the convex hull and hence to quickly determine candidate vertices. Our extensive experimental evaluation on large datasets shows that convex-hull NMF compares favorably to convex-NMF in terms of both speed and reconstruction quality. We demonstrate that our method can easily be applied to large-scale, real-world datasets, in our case consisting of 750,000 DBLP entries, 4,000,000 digital images, and 150,000,000 votes on World of Warcraft® guilds, respectively.
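The two ideas in the abstract (basis factors restricted to convex-hull vertices, and data points rebuilt as convex combinations of them) can be illustrated with a small NumPy sketch. This is not the paper's algorithm: the abstract mentions distance-preserving low-dimensional embeddings, whereas this sketch swaps in a simpler stand-in, namely that the maximizer of a random linear functional over a finite point set is a hull vertex. The function names (`candidate_hull_vertices`, `project_to_simplex`, `convex_coefficients`), the number of directions, and the projected-gradient solver are all assumptions made for illustration.

```python
import numpy as np


def candidate_hull_vertices(X, n_directions=200, seed=0):
    """Return sorted column indices of X that are extreme along random directions.

    X has shape (d, n), one data point per column. A point that uniquely
    maximizes a linear functional over the dataset is a convex-hull vertex.
    """
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_directions, X.shape[0]))
    scores = directions @ X  # shape (n_directions, n)
    return np.unique(np.argmax(scores, axis=1))


def project_to_simplex(v):
    """Euclidean projection of v onto {h : h >= 0, sum(h) = 1}."""
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def convex_coefficients(W, x, n_iter=500):
    """Approximately minimize ||x - W h||^2 over the simplex (projected gradient)."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])           # start at the barycenter
    step = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1e-12)    # 1 / Lipschitz constant
    for _ in range(n_iter):
        h = project_to_simplex(h - step * (W.T @ (W @ h - x)))
    return h


# Toy data: the four corners of the unit square plus its center (column 4).
X = np.array([[0.0, 1.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0, 1.0, 0.5]])
idx = candidate_hull_vertices(X)   # candidate vertices: corners only
W = X[:, idx]                      # basis factor built from hull vertices
h = convex_coefficients(W, X[:, 4])
print(idx, h, W @ h)
```

In this toy case the interior point is never selected as a basis element, and its coefficient vector is non-negative and sums to one, which is the convexity constraint the abstract refers to. How well random directions cover the hull in high dimensions is a separate question that this sketch does not address.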

Recommended

Article Computer Science, Artificial Intelligence

Propagation kernels: efficient graph kernels from propagated information

Marion Neumann, Roman Garnett, Christian Bauckhage, Kristian Kersting

MACHINE LEARNING (2016)

Article Computer Science, Information Systems

Fast moving pedestrian detection based on motion segmentation and new motion features

Shanshan Zhang, Dominik A. Klein, Christian Bauckhage, Armin B. Cremers

MULTIMEDIA TOOLS AND APPLICATIONS (2016)

Article Neurosciences

Prediction of successful memory encoding based on single-trial rhinal and hippocampal phase information

Marlene Hoehne, Amirhossein Jahanbekam, Christian Bauckhage, Nikolai Axmacher, Juergen Fell

NEUROIMAGE (2016)

Article Chemistry, Analytical

Simplex Volume Maximization (SiVM): A matrix factorization algorithm with non-negative constrains and low computing demands for the interpretation of full spectral X-ray fluorescence imaging data

Matthias Alfeld, Mirwaes Wahabzada, Christian Bauckhage, Kristian Kersting, Geert van der Snickt, Petria Noble, Koen Janssens, Gerd Wellenreuther, Gerald Falkenberg

MICROCHEMICAL JOURNAL (2017)

Article Neurosciences

Prediction of memory formation based on absolute electroencephalographic phases in rhinal cortex and hippocampus outperforms prediction based on stimulus-related phase shifts

Marlene Derner, Amirhossein Jahanbekam, Christian Bauckhage, Nikolai Axmacher, Juergen Fell

EUROPEAN JOURNAL OF NEUROSCIENCE (2018)

Article Biology

Agricultural plant cataloging and establishment of a data framework from UAV-based crop images by computer vision

Maurice Guender, Facundo R. Ispizua Yamati, Jana Kierdorf, Ribana Roscher, Anne-Katrin Mahlein, Christian Bauckhage

Summary: In this work, a hands-on workflow for the automatized temporal and spatial identification and individualization of crop images from UAVs is presented, improving the analysis and interpretation of UAV data in agriculture significantly. The results show that the approach has similar accuracy to more complex deep learning-based recognition techniques and can automate the processing of large datasets.

GIGASCIENCE (2022)

Article Computer Science, Information Systems

Towards Intelligent Food Waste Prevention: An Approach Using Scalable and Flexible Harvest Schedule Optimization With Evolutionary Algorithms

Maurice Gunder, Nico Piatkowski, Laura Von Rueden, Rafet Sifa, Christian Bauckhage

Summary: Efficient and economical usage of agricultural land is increasingly important in the face of climate change and resource scarcity. Intercropping of various plant species is recommended to avoid the disadvantages of monocropping, but it poses challenges due to the need for balanced planting schedules. The proposed flexible optimization method aims to address these challenges by combining evolutionary algorithms with a hierarchical loss function and adaptive mutation rate, leading to faster and better solutions for a sustainable crop harvesting season.

IEEE ACCESS (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Hopfield Networks for Vector Quantization

C. Bauckhage, R. Ramamurthy, R. Sifa

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II (2020)

Article Computer Science, Artificial Intelligence

Matrix- and Tensor Factorization for Game Content Recommendation

Rafet Sifa, Raheel Yawar, Rajkumar Ramamurthy, Christian Bauckhage, Kristian Kersting

KUNSTLICHE INTELLIGENZ (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Prototypes Within Minimum Enclosing Balls

Christian Bauckhage, Rafet Sifa, Tiansi Dong

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: WORKSHOP AND SPECIAL SESSIONS (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Policy Learning Using SPSA

R. Ramamurthy, C. Bauckhage, R. Sifa, S. Wrobel

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT III (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Simple Recurrent Neural Networks for Support Vector Machine Training

Rafet Sifa, Daniel Paurat, Daniel Trabold, Christian Bauckhage

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT III (2018)

Proceedings Paper Computer Science, Artificial Intelligence

SPSA for Layer-Wise Training of Deep Networks

Benjamin Wulff, Jannis Schuecker, Christian Bauckhage

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT III (2018)

Proceedings Paper Computer Science, Artificial Intelligence

A Neural Network Implementation of Frank-Wolfe Optimization

Christian Bauckhage

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2017, PT I (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Using Echo State Networks for Cryptography

Rajkumar Ramamurthy, Christian Bauckhage, Krisztian Buza, Stefan Wrobel

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II (2017)
