Article

Automatic instance selection via locality constrained sparse representation for missing value estimation

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 85, Pages 210-223

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2015.05.007

Keywords

Sparse representation; Locality constrained regularization; Instance selection; Missing value estimation

Funding

  1. National Natural Science Foundation of China (NSFC) [71271027]
  2. China Scholarship Council (CSC)
  3. Fundamental Research Funds for the Central Universities of China [FRF-TP-10-006B]
  4. Research Fund for the Doctoral Program of Higher Education [20120006110037]

Abstract

Missing values in real applications can significantly distort the results of knowledge discovery, so it is vital to estimate the unknown data accurately. This paper applies sparse representation to improve the quality of missing value estimation. First, a novel sparse representation scheme called locality constrained sparse representation (LCSR) is presented, which introduces locality l1-norm and l2-norm regularization. By exploiting sparsity, smoothness, and locality structure, LCSR can select instances automatically and avoid overfitting. Then, LCSR-based missing value estimation (LCSR-MVE) is proposed: it estimates the unobserved values as a linear combination of atoms from a dictionary, where the atoms are selected automatically owing to the sparsity of the reconstruction coefficient vector. Three dictionary construction schemes are also developed. The proposed LCSR-MVE is evaluated on six datasets from the UCI and gene expression databases and compared with other instance-based missing value estimation methods. Results show that LCSR-MVE outperforms other state-of-the-art methods in terms of normalized root mean squared error (NRMSE) and is not very sensitive to the dictionary size or the regularization parameters. (C) 2015 Elsevier B.V. All rights reserved.
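
The abstract does not give the exact objective, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes an elastic-net style objective in which the l1 penalty on each dictionary atom is weighted by that atom's Euclidean distance to the target on the observed features (one plausible reading of "locality"), and it solves the problem with plain proximal gradient descent (ISTA). The names `lcsr_estimate` and `nrmse`, the parameters `lam1` and `lam2`, and the NRMSE definition (RMSE divided by the standard deviation of the true values) are all illustrative assumptions.

```python
import numpy as np


def lcsr_estimate(x, dictionary, lam1=1e-3, lam2=1e-3, n_iter=5000):
    """Fill NaN entries of x with a sparse combination of dictionary rows.

    Assumed objective (hypothetical, not taken from the paper):
        0.5 * ||A w - b||^2 + lam1 * sum_i dist_i * |w_i| + 0.5 * lam2 * ||w||^2
    where A and b are restricted to the observed features of x.
    """
    x = np.asarray(x, dtype=float)
    D = np.asarray(dictionary, dtype=float)   # one complete instance (atom) per row
    miss = np.isnan(x)
    obs = ~miss

    A = D[:, obs].T                           # shape: (n_observed, n_atoms)
    b = x[obs]

    # Locality weights: atoms far from the target pay a larger l1 penalty.
    dist = np.linalg.norm(D[:, obs] - b, axis=1)

    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam2)

    w = np.zeros(D.shape[0])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b) + lam2 * w   # gradient of the smooth terms
        z = w - step * grad
        # Weighted soft-thresholding handles the locality-weighted l1 term.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam1 * dist, 0.0)

    filled = x.copy()
    filled[miss] = D[:, miss].T @ w           # reconstruct only the missing entries
    return filled, w


def nrmse(true_values, estimates):
    """One common NRMSE definition: RMSE divided by the std of the true values."""
    t = np.asarray(true_values, dtype=float)
    e = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((e - t) ** 2) / np.var(t)))


# Toy usage: features are (a, b, a + b, a - b); the third feature is missing.
grid = [0.0, 0.5, 1.0]
atoms = np.array([[a, b, a + b, a - b] for a in grid for b in grid])
target = np.array([0.25, 0.75, np.nan, -0.5])
filled, weights = lcsr_estimate(target, atoms)
print(filled, np.count_nonzero(weights))
```

In this sketch the nonzero entries of `weights` play the role of the automatically selected instances, and the l2 term keeps the weights from growing too large when atoms are strongly correlated.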

Recommended

Article Management

Modeling the dynamics of online review life cycle: Role of social and economic moderations

Guoyin Jiang, Jennifer Shang, Wenping Liu, Xiaodong Feng, Junli Lei

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH (2020)

Article Computer Science, Information Systems

Understanding user-to-User interaction on government microblogs: An exponential random graph model with the homophily and emotional effect

Jie Xiong, Xiaodong Feng, Zhiwei Tang

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Information Systems

Prediction of information cascades via content and structure proximity preserved graph level embedding

Xiaodong Feng, Qihang Zhao, Zhen Liu

Summary: The research focuses on understanding the mechanisms of dynamic popularity gain and learning low-dimensional representations of entire cascade graphs for prediction, reducing prediction errors significantly and improving training efficiency compared to baselines.

INFORMATION SCIENCES (2021)

Article Computer Science, Information Systems

Robust sparse coding via self-paced learning for data representation

Xiaodong Feng, Sen Wu

Summary: The study introduces the Self-Paced Sparse Coding (SPSC) framework, which enhances learning robustness by gradually incorporating data from easy to complex into the learning process of SC. The framework implements soft instance selection and generalizes the self-paced learning schema to different levels of dynamic selection. An optimization algorithm and theoretical explanation are provided to analyze the effectiveness of the method.

INFORMATION SCIENCES (2021)

Article Computer Science, Artificial Intelligence

Self-paced learning enhanced neural matrix factorization for noise-aware recommendation

Zhen Liu, Xiaodong Feng, Yecheng Wang, Wenbo Zuo

Summary: An enhanced neural matrix factorization model with a self-paced learning schema has been proposed, which can automatically distinguish noisy instances and learn the model mostly based on clean instances. The effectiveness of this method on collaborative filtering is demonstrated through extensive experiments on three widely used datasets.

KNOWLEDGE-BASED SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Learning Spatiotemporal Latent Factors of Traffic via Regularized Tensor Factorization: Imputing Missing Values and Forecasting

Abdelkader Baggag, Sofiane Abbar, Ankit Sharma, Tahar Zanouda, Abdulaziz Al-Homaid, Abhiraj Mohan, Jaideep Srivastava

Summary: Intelligent transportation systems play a crucial role in smart cities by estimating and predicting the spatiotemporal traffic state to improve operational efficiency and livability. However, challenges such as data sparsity, incompleteness, and noise still hinder traffic analytics. By utilizing tensor representation and regularized factorization method, missing data and noisy information can be effectively addressed for accurate traffic state prediction in road networks.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)

Article Computer Science, Information Systems

Effect of data environment and cognitive ability on participants' attitude towards data governance

Guoyin Jiang, Xingshun Cai, Xiaodong Feng, Wenping Liu

Summary: This study analyzes the attitude towards public participation in data governance within the context of a tourism platform. The findings indicate that factors such as data quality, website design, and platform interaction have a positive impact on users' attitude towards data governance, while data literacy self-efficacy acts as a suppressor or mediator in this relationship. Furthermore, the study provides theoretical and practical implications for government policy implementation and platform management.

JOURNAL OF INFORMATION SCIENCE (2023)

Article Computer Science, Information Systems

Understanding how the semantic features of contents influence the diffusion of government microblogs: Moderating role of content topics

Xiaodong Feng, Kangxin Hui, Xin Deng, Guoyin Jiang

Summary: This study examines the behavior mechanism of information diffusion on government microblogs and the effects of extensive textual features on different topics. A model test with real data from Sina Weibo reveals that positive words, city names, adjectives/adverbs, and dissimilar contents promote diffusion, while negative words hinder it. The study also discusses the varying influences of these features on political news and living information.

INFORMATION & MANAGEMENT (2021)

Article Business

Influence of Consumers' Temporary Affect on Ad Engagement: A Computational Research Approach

Xinyu Lu, Debarati Das, Jisu Huh, Jaideep Srivastava

Summary: Consumers' temporary affective states during ad exposure have a significant impact on their engagement with different types of ads. Consumers in a positive affective state are more likely to engage with high semantic-affinity ads, while those in a negative affective state are more likely to engage with more positively valenced ads. This study provides theoretical contributions and practical implications for ad targeting and placement strategies based on consumers' temporary affect.

JOURNAL OF ADVERTISING (2022)

Article Computer Science, Artificial Intelligence

AECasN: An information cascade predictor by learning the structural representation of the whole cascade network with autoencoder

Xiaodong Feng, Qihang Zhao, Yunkai Li

Summary: Research on the importance of predicting information cascade size has led to the development of a new deep learning framework - AECasN, which significantly improves the accuracy of information cascade prediction.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Computer Science, Artificial Intelligence

Social recommendation via deep neural network-based multi-task learning

Xiaodong Feng, Zhen Liu, Wenbing Wu, Wenbo Zuo

Summary: The rapid development of social recommendation in recent years has greatly improved the performance of recommender systems, especially for the cold start problem. However, existing techniques based on matrix factorization do not effectively capture the complex nonlinear relationships between users and items, as well as between users themselves. To address this, deep learning is employed to model the social network-enhanced collaborative filtering problem. By simultaneously modeling the social and item domain interactions, the proposed SoNeuMF framework shows significant improvements in recommendation accuracy compared to state-of-the-art methods, as demonstrated by comprehensive experiments on real-world datasets.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Computer Science, Information Systems

Understanding how the expression of online citizen petitions influences the government responses in China: An empirical study with automatic text analytics

Xiaodong Feng, Chaorui Wang, Juan Wang

Summary: Online participation is crucial for citizens to express their demands, and it is important for government agencies to respond to public petitions in a timely and effective manner. This study investigates the textual characteristics of citizen petitions and their impact on government response efficiency and outcomes. A theoretical model is constructed based on discourse theory and government pressure theory, and text analysis techniques are used to extract textual features from an online petition platform in a Chinese provincial government. The results show that negative sentiments, public interest, and detailed contents in petitions hinder response timeliness and lead to longer responses due to their complexity. These findings contribute to a better understanding of the dynamics between governments and citizens and have both theoretical and practical implications.

INFORMATION PROCESSING & MANAGEMENT (2023)

Article Computer Science, Interdisciplinary Applications

Towards popularity prediction of information cascades via degree distribution and deep neural networks

Xiaodong Feng, Qihang Zhao, RuiJie Zhu

Summary: Understanding and predicting paper citation dynamics is of interest, and modeling citation dynamics as an information cascade has attracted attention. However, existing deep learning-based prediction models focus on individual nodes, limiting robustness. To address this, we propose CasDENN, a sequential deep neural network that learns the dynamic structural representation of the entire cascade graph using degree distribution vectors at different timestamps as input. Experiments on academic paper citations and social media posts show significant improvement in prediction accuracy and reduced running time compared to baselines.

JOURNAL OF INFORMETRICS (2023)

Article Computer Science, Information Systems

RGSE: Robust Graph Structure Embedding for Anomalous Link Detection

Zhen Liu, Wenbo Zuo, Dongning Zhang, Xiaodong Feng

Summary: Anomalous links such as noisy links or adversarial edges are common in real-world networks, which can undermine the credibility of network studies, such as community detection in social networks. To address this issue, a robust graph structure embedding framework called RGSE is proposed, which utilizes link-level feature representations generated from both global embedding view and local stable view for anomalous link detection on contaminated graphs. Experimental results on various datasets show that the new model and its variants achieve up to an average 5.2% improvement in accuracy compared to traditional graph representation models. Further analysis provides interpretable evidence supporting the superiority of the model.

IEEE TRANSACTIONS ON BIG DATA (2023)

Article Communication

Relationship between Citizen-Eyewitness Images and Audience Engagement with News

Jisu Kim, Jisu Huh, Bhavtosh Rath, Aadesh Salecha, Jaideep Srivastava

Summary: The study found that U.S. newspapers tended to incorporate a small number of citizen-eyewitness images in their news reports, and this was positively related to audience engagement with the news.

JOURNALISM PRACTICE (2022)

Article Computer Science, Artificial Intelligence

Confidence-based and sample-reweighted test-time adaptation

Hao Yang, Min Wang, Zhengfei Yu, Hang Zhang, Jinshen Jiang, Yun Zhou

Summary: In this paper, a novel method called CSTTA is proposed for test time adaptation (TTA), which utilizes confidence-based optimization and sample reweighting to better utilize sample information. Extensive experiments demonstrate the effectiveness of the proposed method.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

A novel method for generating a canonical basis for decision implications based on object-induced three-way operators

Jin Liu, Ju-Sheng Mi, Dong-Yun Niu

Summary: This article focuses on a novel method for generating a canonical basis for decision implications based on object-induced operators (OE operators). The logic of decision implication based on OE operators is described, and a method for obtaining the canonical basis for decision implications is given. The completeness, nonredundancy, and optimality of the canonical basis are proven. Additionally, a method for generating true premises based on OE operators is proposed.

KNOWLEDGE-BASED SYSTEMS (2024)

Review Computer Science, Artificial Intelligence

Efficient utilization of pre-trained models: A review of sentiment analysis via prompt learning

Kun Bu, Yuanchao Liu, Xiaolong Ju

Summary: This paper discusses the importance of sentiment analysis and pre-trained models in natural language processing, and explores the application of prompt learning. The research shows that prompt learning is more suitable for sentiment analysis tasks and can achieve good performance.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

M-EDEM: A MNN-based Empirical Decomposition Ensemble Method for improved time series forecasting

Xiangjun Cai, Dagang Li

Summary: This paper presents a new decomposition mechanism based on learned decomposition mapping. By using a neural network to learn the relationship between original time series and decomposed results, the repetitive computation overhead during rolling decomposition is relieved. Additionally, extended mapping and partial decomposition methods are proposed to alleviate boundary effects on prediction performance. Comparative studies demonstrate that the proposed method outperforms existing RDEMs in terms of operation speed and prediction accuracy.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Privacy-preserving trust management method based on blockchain for cross-domain industrial IoT

Xu Wu, Yang Liu, Jie Tian, Yuanpeng Li

Summary: This paper proposes a blockchain-based privacy-preserving trust management architecture, which adopts federated learning to train task-specific trust models and utilizes differential privacy to protect device privacy. In addition, a game theory-based incentive mechanism and a parallel consensus protocol are proposed to improve the accuracy of trust computing and the efficiency of consensus.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

MV-ReID: 3D Multi-view Transformation Network for Occluded Person Re-Identification

Zaiyang Yu, Prayag Tiwari, Luyang Hou, Lusi Li, Weijun Li, Limin Jiang, Xin Ning

Summary: This study introduces a 3D view-based approach that effectively handles occlusions and leverages the geometric information of 3D objects. The proposed method achieves state-of-the-art results on occluded ReID tasks and exhibits competitive performance on holistic ReID tasks.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

City-scale continual neural semantic mapping with three-layer sampling and panoptic representation

Yongliang Shi, Runyi Yang, Zirui Wu, Pengfei Li, Caiyun Liu, Hao Zhao, Guyue Zhou

Summary: Neural implicit representations have gained attention due to their expressive, continuous, and compact properties. However, there is still a lack of research on city-scale continual implicit dense mapping based on sparse LiDAR input. In this study, a city-scale continual neural mapping system with a panoptic representation is developed, incorporating environment-level and instance-level modeling. A tailored three-layer sampling strategy and category-specific prior are proposed to address the challenges of representing geometric information in city-scale space and achieving high fidelity mapping of instances under incomplete observation.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

MDSSN: An end-to-end deep network on triangle mesh parameterization

Ruihan Hu, Zhi-Ri Tang, Rui Yang, Zhongjie Wang

Summary: Mesh data is crucial for 3D computer vision applications worldwide, but traditional deep learning frameworks have struggled with handling meshes. This paper proposes MDSSN, a simple mesh computation framework that models triangle meshes and represents their shape using face-based and edge-based Riemannian graphs. The framework incorporates end-to-end operators inspired by traditional deep learning frameworks, and includes dedicated modules for addressing challenges in mesh classification and segmentation tasks. Experimental results demonstrate that MDSSN outperforms other state-of-the-art approaches.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Semi-supervised learning with missing values imputation

Buliao Huang, Yunhui Zhu, Muhammad Usman, Huanhuan Chen

Summary: This paper proposes a novel semi-supervised conditional normalizing flow (SSCFlow) algorithm that combines unsupervised imputation and supervised classification. By estimating the conditional distribution of incomplete instances, SSCFlow facilitates imputation and classification simultaneously, addressing the issue of separated tasks ignoring data distribution and label information in traditional methods.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Emotion-and-knowledge grounded response generation in an open-domain dialogue setting

Deeksha Varshney, Asif Ekbal, Erik Cambria

Summary: This paper focuses on the neural-based interactive dialogue system that aims to engage and retain humans in long-lasting conversations. It proposes a new neural generative model that combines step-wise co-attention, self-attention-based transformer network, and an emotion classifier to control emotion and knowledge transfer during response generation. The results from quantitative, qualitative, and human evaluation show that the proposed models can generate natural and coherent sentences, capturing essential facts with significant improvement over emotional content.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

MvTS-library: An open library for deep multivariate time series forecasting

Junchen Ye, Weimiao Li, Zhixin Zhang, Tongyu Zhu, Leilei Sun, Bowen Du

Summary: Modeling multivariate time series has long been a topic of interest for scholars in various fields. This paper introduces MvTS, an open library based on PyTorch, which provides a unified framework for implementing and evaluating these models. Extensive experiments on public datasets demonstrate the effectiveness and universality of the models reproduced by MvTS.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

An adaptive hybrid mutated differential evolution feature selection method for low and high-dimensional medical datasets

Reham R. Mostafa, Ahmed M. Khedr, Zaher Al Aghbari, Imad Afyouni, Ibrahim Kamel, Naveed Ahmed

Summary: Feature selection is crucial in classification procedures, but it faces challenges in high-dimensional datasets. To overcome these challenges, this study proposes an Adaptive Hybrid-Mutated Differential Evolution method that incorporates the mechanics of the Spider Wasp Optimization algorithm and the concept of Enhanced Solution Quality. Experimental results demonstrate the effectiveness of the method in terms of accuracy and convergence speed, and it outperforms contemporary cutting-edge algorithms.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

TCM Model for improving track sequence classification in real scenarios with Multi-Feature Fusion and Transformer Block

Ti Xiang, Pin Lv, Liguo Sun, Yipu Yang, Jiuwu Hao

Summary: This paper introduces a Track Classification Model (TCM) based on marine radar, which can effectively recognize and classify shipping tracks. By using a feature extraction network with multi-feature fusion and a dataset production method to address missing labels, the classification accuracy is improved, resulting in successful engineering application in real scenarios.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Language model as an Annotator: Unsupervised context-aware quality phrase generation

Zhihao Zhang, Yuan Zuo, Chenghua Lin, Junjie Wu

Summary: This paper proposes a novel unsupervised context-aware quality phrase mining framework called LMPhrase, which is built upon large pre-trained language models. The framework mines quality phrases as silver labels using a parameter-free probing technique on the pre-trained language model BERT, and formalizes the phrase tagging task as a sequence generation problem by fine-tuning the sequence-to-sequence pre-trained language model BART. The results of extensive experiments show that LMPhrase consistently outperforms existing competitors in two phrase mining tasks of different granularity.

KNOWLEDGE-BASED SYSTEMS (2024)

Article Computer Science, Artificial Intelligence

Stochastic Gradient Descent for matrix completion: Hybrid parallelization on shared- and distributed-memory systems

Kemal Buyukkaya, M. Ozan Karsavuran, Cevdet Aykanat

Summary: The study aims to investigate the hybrid parallelization of the Stochastic Gradient Descent (SGD) algorithm for solving the matrix completion problem on a high-performance computing platform. A hybrid parallel decentralized SGD framework with asynchronous inter-process communication and a novel flexible partitioning scheme is proposed to achieve scalability up to hundreds of processors. Experimental results on real-world benchmark datasets show that the proposed algorithm achieves 6x higher throughput on sparse datasets compared to the state-of-the-art, while achieving comparable throughput on relatively dense datasets.

KNOWLEDGE-BASED SYSTEMS (2024)