Article
Computer Science, Artificial Intelligence
Yubo Zhu, Wentian Zhao, Rui Hua, Xinxiao Wu
Summary: Video summarization is the task of generating a concise and compact summary to represent the original video. Existing methods focus on extracting objective summaries that accurately summarize the video content. However, videos often contain diverse content with multiple topics, and people may have different interests in the visual contents of the same video. In this paper, we propose a novel topic-aware video summarization task that generates multiple video summaries with different topics. We build a benchmark dataset and propose a multimodal Transformer model to address this task, achieving effective results.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Information Systems
Xueming Qian, Yuxia Wu, Mingdi Li, Yayun Ren, Shuhui Jiang, Zhetao Li
Summary: This paper introduces a City-POI-LOI (CPL) summarization method to automatically mine POIs from city-level landmark images, and proposes a Location-Appearance-Semantic-Temporal (LAST) clustering method to mine popular viewpoints termed Location-Of-Interest (LOI) in each POI. Experimental results demonstrate the effectiveness of the proposed POI summarization approach.
IEEE TRANSACTIONS ON MULTIMEDIA
(2021)
Article
Computer Science, Artificial Intelligence
JunHo Yoon, GyuHo Choi, Chang Choi
Summary: Recently, research has focused on multimodal learning for detecting disinformation in multimedia using all modality information. Existing methods for multimodal learning include score-level fusion and feature-level fusion. However, late-level fusion methods have limitations: the recognition performance of a single modality determines the overall performance, and there are constraints in matching data across modalities. In this study, a classification system called RoBERTaMFT is proposed, which addresses these limitations by using a co-learning method to improve recognition performance and balance data among modalities. Experimental results show that RoBERTaMFT outperforms unimodal learning and existing multimodal learning in terms of accuracy and F1-score.
INFORMATION FUSION
(2023)
Article
Computer Science, Information Systems
Michael Moses Thiruthuvanathan, Balachandran Krishnan
Summary: A model was developed to accurately acquire keyframes through hierarchical summarization for facial detection and emotional recognition, significantly improving accuracy. Emotional prediction achieved 90% accuracy on Indian faces, and computational requirements were reduced by 40%.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Bohui Xia, Xueting Wang, Toshihiko Yamasaki
Summary: This study introduces a method to generate richer and more easily interpretable predictive explanations by considering the interactions between features, with experiments demonstrating its superior performance compared to existing methods.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2021)
Article
Computer Science, Artificial Intelligence
Yong Guan, Shaoru Guo, Ru Li, Xiaoli Li, Hu Zhang
Summary: The paper introduces a new Frame Semantics guided network for Abstractive Sentence Summarization, which can learn better text semantic representation and significantly outperforms existing techniques in extensive experiments.
KNOWLEDGE-BASED SYSTEMS
(2021)
Article
Computer Science, Information Systems
Jiehang Xie, Xuanbai Chen, Tianyi Zhang, Yixuan Zhang, Shao-Ping Lu, Pablo Cesar, Yulu Yang
Summary: This paper presents a multimodal-based and aesthetic-guided method for narrative video summarization, which effectively preserves important narrative information and quickly produces high-quality summaries.
IEEE TRANSACTIONS ON MULTIMEDIA
(2023)
Article
Computer Science, Hardware & Architecture
Shu-Ching Chen
Summary: Multimedia data analysis is essential in multimedia research, mining patterns, information, and knowledge from the data for various applications. Multimodal multimedia data analysis techniques have shown better performance than unimodal methods, improving the reliability of data analysis systems and applications. Further research is needed to address challenges in analyzing multimodal multimedia data.
Article
Engineering, Electrical & Electronic
Ye Yuan, Jiawan Zhang
Summary: In this paper, an unsupervised video summarization approach via reinforcement learning with shot-level semantics is proposed. The approach uses an encoder-decoder model in which a convolutional neural network serves as the encoder and extracts a convolutional feature matrix from the video. A bidirectional LSTM is then used as the decoder to obtain probability weights for selecting keyframes to preserve spatio-temporal dependence. A shot-level semantic reward function is designed to reduce the influence of user subjectivity and generate more representative summarization results. The approach outperforms others and achieves satisfactory results according to evaluation on four classical datasets.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Computer Science, Information Systems
Kazuma Ohtomo, Ryosuke Harakawa, Takahiro Ogawa, Miki Haseyama, Masahiro Iwahashi
Summary: This paper introduces a user-centric method of multimodal feature extraction for personalized retrieval of Tumblr posts by incorporating a triplet loss into a multivariational autoencoder. By considering user preferences, the proposed method can effectively extract relationships between text- and image-related features, leading to improved performance in post retrieval algorithms.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Helena Liz-Lopez, Mamadou Keita, Abdelmalik Taleb-Ahmed, Abdenour Hadid, Javier Huertas-Tato, David Camacho
Summary: Generative deep learning techniques have been widely discussed in public, but the slow progress in applying these techniques to counter disinformation is concerning. With the ease and credibility of manipulating multimedia content, developing effective forensic techniques becomes invaluable. This survey comprehensively describes modern manipulation and forensic techniques, focusing on their applications in video, audio, and multimodal fusion. The classification of manipulation techniques and the generation of datasets using generative techniques are provided for forensic purposes. The review and comparative analysis of forensic techniques from 2018 to 2023, as well as the comparison of end-to-end forensic tools for end-users, are presented. Clear trends and challenges, such as multilinguality, multimodality, and improving data quality, are identified for future research in an ever-changing adversarial environment.
INFORMATION FUSION
(2024)
Article
Computer Science, Artificial Intelligence
Bin Zhao, Maoguo Gong, Xuelong Li
Summary: The paper introduces a hierarchical multimodal Transformer model for video summarization, which effectively captures dependencies among video frames and shots, utilizes audio and visual information for summarization, and outperforms traditional, RNN-based, and attention-based video summarization methods on the SumMe and TVSum datasets.
Article
Computer Science, Artificial Intelligence
Parul Saini, Krishan Kumar, Shamal Kashid, Ashray Saini, Alok Negi
Summary: Video summarization is an important multimedia analysis problem in the digital world today. Current deep learning-based methods for video summarization are inefficient in extracting information from long-duration videos in a timely manner. This study conducts a detailed analysis and investigation of various deep learning techniques to address the issues associated with identifying and summarizing key activities in videos. The limitations of each category are discussed, along with suggested strategies for evaluating and improving video summaries.
ARTIFICIAL INTELLIGENCE REVIEW
(2023)
Article
Chemistry, Multidisciplinary
Theodoros Psallidas, Panagiotis Koromilas, Theodoros Giannakopoulos, Evaggelos Spyrou
Summary: This approach utilizes both aural and visual features to create dynamic video summaries from user-generated videos, training a classifier to recognize important parts of the videos. Additionally, a novel dataset with videos from various categories has been introduced to evaluate the approach, showing its potential in video summarization.
APPLIED SCIENCES-BASEL
(2021)
Article
Neurosciences
Jiaqing Tong, Jeffrey R. Binder, Colin Humphries, Stephen Mazurchuk, Lisa L. Conant, Leonardo Fernandino
Summary: Neuroimaging studies have found that lexical concepts are represented across a network of high-level cortical regions, especially those in the default mode network, which encode multimodal experiential information.
JOURNAL OF NEUROSCIENCE
(2022)
Article
Computer Science, Information Systems
Farzad Tashtarian, Abdelhak Bentaleb, Alireza Erfanian, Hermann Hellwagner, Christian Timmerer, Roger Zimmermann
Summary: This paper proposes a novel architecture, HxL3, for low-latency live streaming. By implementing efficient caching and prefetching policies, HxL3 minimizes the number of live media segments, reducing rebuffering and startup delay, and achieving high-quality live streaming experiences.
IEEE TRANSACTIONS ON MULTIMEDIA
(2023)
Article
Computer Science, Artificial Intelligence
Ying Zhang, Yonit Zall, Ronen Nissim, Satyam, Roger Zimmermann
Summary: Automated visual evaluation (AVE) is a promising method for detecting and diagnosing cervical precancerous lesions through deep learning classifier analysis of images. The introduction of a new dataset (EVA dataset) collected using a mobile colposcope shows potential challenges for high-grade SIL diagnosis and AVE classifier development. The results suggest that a deep learning framework is effective for high-grade SIL diagnosis but improvements are needed, especially for the EVA dataset.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Automation & Control Systems
Xing Zhang, Wei Sun, Jin Zheng, Min Xue, Chenjun Tang, Roger Zimmermann
Summary: This study focuses on indoor WiFi fingerprint localization and proposes a floor identification module and a fingerprint graph attention mechanism. By comprehensively analyzing fingerprint attributes and using a two-panel fingerprint homogeneity graph, the experimental results show that the proposed method achieves better performance in floor identification and 2-D geometric positioning.
INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS
(2022)
Article
Computer Science, Information Systems
Jiwei Zhang, Yi Yu, Suhua Tang, Jianming Wu, Wei Li
Summary: Cross-modal retrieval is a popular topic in information retrieval, machine learning, and databases. The major challenge is to measure the similarity between different modality data effectively. Current methods struggle to extract features from multi-modal information. In this article, we propose a novel variational autoencoder architecture that improves the performance of audio-visual cross-modal retrieval.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Wei Duan, Yi Yu, Xulong Zhang, Suhua Tang, Wei Li, Keizo Oyama
Summary: This article proposes a model for melody generation from lyrics with local interpretability, which enhances the understanding of the relationship between input lyrics and generated melodies.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Editorial Material
Computer Science, Information Systems
Zhenguang Liu, Roger Zimmermann, Li Cheng
MULTIMEDIA SYSTEMS
(2023)
Article
Chemistry, Analytical
Hafiz Hasnain Imtiaz, Suhua Tang
Summary: The fifth-generation (5G) wireless network enables low latency services in Internet of Things (IoT) networks. However, IoT nodes lack computational capabilities for real-time complex tasks. To address this issue, multi-access edge computing (MEC) allows IoT nodes to offload their computational tasks to MEC servers. This paper proposes a method that combines relay selection and adaptive bandwidth allocation to improve the efficiency of multi-task partial offloading in IoT networks. Simulation results show that the proposed method outperforms other methods without these functions or with only one of them.
Article
Transportation
Yutong Xia, Huanfa Chen, Roger Zimmermann
Summary: This study proposes a framework of Random Effect-Bayesian Neural Network (RE-BNN) for predicting and explaining travel mode choice across multiple regions. The results show that this model outperforms the plain Deep Neural Network (DNN) in terms of prediction accuracy and is more robust across different datasets. Additionally, the capability of the RE-BNN model to learn travel behaviors across regions is demonstrated through offset utilities, choice probability functions, and local travel mode shares.
TRAVEL BEHAVIOUR AND SOCIETY
(2023)
Article
Chemistry, Analytical
Jingyang Zhou, Suhua Tang
Summary: In a wireless sensor network, conventional methods for data collection and computation have scalability issues and transmission collisions. Using over-the-air computation (AirComp) can efficiently perform data collection and computation, but it has problems with low channel gain and computation errors. To solve these problems, this paper investigates relay communication for AirComp and proposes a relay selection protocol. The proposed method helps to prolong network lifetime and reduce computation errors.
Article
Computer Science, Artificial Intelligence
Yuxuan Liang, Kun Ouyang, Yiwei Wang, Zheyi Pan, Yifang Yin, Hongyang Chen, Junbo Zhang, Yu Zheng, David S. Rosenblum, Roger Zimmermann
Summary: Spatio-temporal forecasting has various applications in smart cities, but the state-of-the-art method, GCRNN, fails to consider higher-order spatial relations and underlying physics in real-world systems. Therefore, we propose MixRNN+, a general model that captures complex spatial relations and addresses underlying physics, for spatio-temporal forecasting. Experimental results on three forecasting tasks demonstrate the superiority of MixRNN+ against existing methods, and a cloud-based system using MixRNN+ as the bedrock model showcases its practicality.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Engineering, Electrical & Electronic
Suhua Tang, Petar Popovski, Chao Zhang, Sadao Obana
Summary: This paper proposes a multi-slot over-the-air computation (MS-AirComp) framework for fading channels in IoT systems, which improves channel gains and reduces signal distortion by utilizing multiple slots. A closed-form expression for the computation error is derived through theoretical analysis, and optimal parameters are found. Simulations show that the proposed method effectively reduces computation error.
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS
(2023)
Article
Computer Science, Information Systems
Suhua Tang, Sadao Obana
Summary: Pedestrian-to-vehicle communication is crucial for preventing pedestrian accidents, especially when pedestrians are in blind spots. However, in urban canyons, buildings obstruct satellite signals, leading to interruptions in pedestrian positioning. This paper proposes using vehicles and roadside units as positioning anchors to address this issue.
Article
Computer Science, Artificial Intelligence
Junfeng Hu, Yuxuan Liang, Zhencheng Fan, Li Liu, Yifang Yin, Roger Zimmermann
Summary: Sensors are crucial for environmental monitoring in smart cities, but it is impractical to deploy sensors at massive scale due to high costs, resulting in sparse data collection. This article focuses on inferring values at nonsensor locations based on observations from available sensors (spatiotemporal inference) by capturing relationships among the data. The investigations reveal distinct patterns at both long- and short-term temporal scales, and the authors propose decoupling the modeling of short- and long-term patterns. Experimental results demonstrate the effectiveness of the proposed method in capturing both long- and short-term relations.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Ying Zhang, Lilei Zheng, Vrizlynn L. L. Thing, Roger Zimmermann, Bin Guo, Zhiwen Yu
Summary: Face verification is commonly used to verify someone's identity, but it can be vulnerable to face spoofing attacks. To enhance security and reduce computational and storage costs, a new system has been developed that learns a single and universal face descriptor for both face verification and liveness detection.
PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Tiancong Cheng, Ying Zhang, Yifang Yin, Roger Zimmermann, Zhiwen Yu, Bin Guo
Summary: This paper proposes a compressed multitask model that performs face recognition and face anti-spoofing tasks simultaneously in a lightweight manner, reducing the redundancy of the original dual-model setup. By using a multi-teacher-assisted knowledge distillation method and feature alignment, satisfactory performance is achieved with significant reductions in model size and inference time.
PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023
(2023)
Article
Computer Science, Artificial Intelligence
Hao Yang, Min Wang, Zhengfei Yu, Hang Zhang, Jinshen Jiang, Yun Zhou
Summary: In this paper, a novel method called CSTTA is proposed for test time adaptation (TTA), which uses confidence-based optimization and sample reweighting to make better use of sample information. Extensive experiments demonstrate the effectiveness of the proposed method.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Jin Liu, Ju-Sheng Mi, Dong-Yun Niu
Summary: This article focuses on a novel method for generating a canonical basis for decision implications based on object-induced operators (OE operators). The logic of decision implication based on OE operators is described, and a method for obtaining the canonical basis for decision implications is given. The completeness, nonredundancy, and optimality of the canonical basis are proven. Additionally, a method for generating true premises based on OE operators is proposed.
KNOWLEDGE-BASED SYSTEMS
(2024)
Review
Computer Science, Artificial Intelligence
Kun Bu, Yuanchao Liu, Xiaolong Ju
Summary: This paper discusses the importance of sentiment analysis and pre-trained models in natural language processing, and explores the application of prompt learning. The research shows that prompt learning is more suitable for sentiment analysis tasks and can achieve good performance.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Xiangjun Cai, Dagang Li
Summary: This paper presents a new decomposition mechanism based on learned decomposition mapping. By using a neural network to learn the relationship between original time series and decomposed results, the repetitive computation overhead during rolling decomposition is relieved. Additionally, extended mapping and partial decomposition methods are proposed to alleviate boundary effects on prediction performance. Comparative studies demonstrate that the proposed method outperforms existing RDEMs in terms of operation speed and prediction accuracy.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Xu Wu, Yang Liu, Jie Tian, Yuanpeng Li
Summary: This paper proposes a blockchain-based privacy-preserving trust management architecture, which adopts federated learning to train task-specific trust models and utilizes differential privacy to protect device privacy. In addition, a game theory-based incentive mechanism and a parallel consensus protocol are proposed to improve the accuracy of trust computing and the efficiency of consensus.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Zaiyang Yu, Prayag Tiwari, Luyang Hou, Lusi Li, Weijun Li, Limin Jiang, Xin Ning
Summary: This study introduces a 3D view-based approach that effectively handles occlusions and leverages the geometric information of 3D objects. The proposed method achieves state-of-the-art results on occluded ReID tasks and exhibits competitive performance on holistic ReID tasks.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Yongliang Shi, Runyi Yang, Zirui Wu, Pengfei Li, Caiyun Liu, Hao Zhao, Guyue Zhou
Summary: Neural implicit representations have gained attention due to their expressive, continuous, and compact properties. However, there is still a lack of research on city-scale continual implicit dense mapping based on sparse LiDAR input. In this study, a city-scale continual neural mapping system with a panoptic representation is developed, incorporating environment-level and instance-level modeling. A tailored three-layer sampling strategy and category-specific prior are proposed to address the challenges of representing geometric information in city-scale space and achieving high fidelity mapping of instances under incomplete observation.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Ruihan Hu, Zhi-Ri Tang, Rui Yang, Zhongjie Wang
Summary: Mesh data is crucial for 3D computer vision applications worldwide, but traditional deep learning frameworks have struggled with handling meshes. This paper proposes MDSSN, a simple mesh computation framework that models triangle meshes and represents their shape using face-based and edge-based Riemannian graphs. The framework incorporates end-to-end operators inspired by traditional deep learning frameworks, and includes dedicated modules for addressing challenges in mesh classification and segmentation tasks. Experimental results demonstrate that MDSSN outperforms other state-of-the-art approaches.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Buliao Huang, Yunhui Zhu, Muhammad Usman, Huanhuan Chen
Summary: This paper proposes a novel semi-supervised conditional normalizing flow (SSCFlow) algorithm that combines unsupervised imputation and supervised classification. By estimating the conditional distribution of incomplete instances, SSCFlow facilitates imputation and classification simultaneously, addressing the issue of separated tasks ignoring data distribution and label information in traditional methods.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Deeksha Varshney, Asif Ekbal, Erik Cambria
Summary: This paper focuses on the neural-based interactive dialogue system that aims to engage and retain humans in long-lasting conversations. It proposes a new neural generative model that combines step-wise co-attention, self-attention-based transformer network, and an emotion classifier to control emotion and knowledge transfer during response generation. The results from quantitative, qualitative, and human evaluation show that the proposed models can generate natural and coherent sentences, capturing essential facts with significant improvement over emotional content.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Junchen Ye, Weimiao Li, Zhixin Zhang, Tongyu Zhu, Leilei Sun, Bowen Du
Summary: Modeling multivariate time series has long been a topic of interest for scholars in various fields. This paper introduces MvTS, an open library based on PyTorch, which provides a unified framework for implementing and evaluating these models. Extensive experiments on public datasets demonstrate the effectiveness and universality of the models reproduced by MvTS.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Reham R. Mostafa, Ahmed M. Khedr, Zaher Al Aghbari, Imad Afyouni, Ibrahim Kamel, Naveed Ahmed
Summary: Feature selection is crucial in classification procedures, but it faces challenges in high-dimensional datasets. To overcome these challenges, this study proposes an Adaptive Hybrid-Mutated Differential Evolution method that incorporates the mechanics of the Spider Wasp Optimization algorithm and the concept of Enhanced Solution Quality. Experimental results demonstrate the effectiveness of the method in terms of accuracy and convergence speed, and it outperforms contemporary cutting-edge algorithms.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Ti Xiang, Pin Lv, Liguo Sun, Yipu Yang, Jiuwu Hao
Summary: This paper introduces a Track Classification Model (TCM) based on marine radar, which can effectively recognize and classify shipping tracks. By using a feature extraction network with multi-feature fusion and a dataset production method to address missing labels, the classification accuracy is improved, resulting in successful engineering application in real scenarios.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Zhihao Zhang, Yuan Zuo, Chenghua Lin, Junjie Wu
Summary: This paper proposes a novel unsupervised context-aware quality phrase mining framework called LMPhrase, which is built upon large pre-trained language models. The framework mines quality phrases as silver labels using a parameter-free probing technique on the pre-trained language model BERT, and formalizes the phrase tagging task as a sequence generation problem by fine-tuning on the Sequence-to-Sequence pre-trained language model BART. The results of extensive experiments show that LMPhrase consistently outperforms existing competitors in two different granularity phrase mining tasks.
KNOWLEDGE-BASED SYSTEMS
(2024)
Article
Computer Science, Artificial Intelligence
Kemal Buyukkaya, M. Ozan Karsavuran, Cevdet Aykanat
Summary: The study aims to investigate the hybrid parallelization of the Stochastic Gradient Descent (SGD) algorithm for solving the matrix completion problem on a high-performance computing platform. A hybrid parallel decentralized SGD framework with asynchronous inter-process communication and a novel flexible partitioning scheme is proposed to achieve scalability up to hundreds of processors. Experimental results on real-world benchmark datasets show that the proposed algorithm achieves 6x higher throughput on sparse datasets compared to the state-of-the-art, while achieving comparable throughput on relatively dense datasets.
KNOWLEDGE-BASED SYSTEMS
(2024)