Article
Computer Science, Artificial Intelligence
Hashem Parvin, Ahmad Reza Naghsh-Nilchi, Hossein Mahvash Mohammadi
Summary: This paper proposes a transformer-based image captioning structure to address the challenges and limitations in image captioning. By designing a generator network and a selector network, textual descriptions are generated collaboratively and the text-image relation is learned. Experimental results demonstrate that the proposed approach outperforms state-of-the-art models on COCO and Flickr datasets.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Business
Fang Da, Gang Kou, Yi Peng
Summary: This paper proposes a deep-learning-based dual encoder retrieval model for improving the performance of citation recommendation. It encodes the input query and paper titles into semantic vectors, matches them to compute similarity scores, and generates a sorted list of documents.
TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE
(2022)
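The retrieval flow summarized in the entry above (encode the query and the paper titles into vectors, score each pair by similarity, return a sorted list) can be illustrated with a minimal sketch. This is not the authors' model: the `encode` function below is a hypothetical hashed bag-of-words stand-in for a learned deep encoder, and the names `DIM`, `encode`, and `rank` are assumptions made only for this example.

```python
import math
import zlib
from collections import Counter

DIM = 64  # assumed embedding size for this toy example


def encode(text: str) -> list[float]:
    """Toy stand-in for a learned encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        # Map each token to a fixed bucket; a real model would learn this mapping.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def rank(query: str, titles: list[str]) -> list[tuple[str, float]]:
    """Score every title against the query and sort by descending similarity."""
    q = encode(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(t, sum(a * b for a, b in zip(q, encode(t)))) for t in titles]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        "graph neural networks for citation recommendation",
        "deep learning citation recommendation",
        "soil chemistry",
    ]
    for title, score in rank("deep learning citation recommendation", candidates):
        print(f"{score:.3f}  {title}")
```

Because the query and the titles are encoded independently, title vectors could be computed once and cached, which is the usual practical motivation for a dual-encoder design.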
Article
Biochemical Research Methods
Hiroki Konishi, Rui Yamaguchi, Kiyoshi Yamaguchi, Yoichi Furukawa, Seiya Imoto
Summary: Recent advancements in nanopore sequencing technology have enabled cost-effective long-read sequencing, making precise DNA sequence analysis increasingly important. Researchers have developed a new basecaller, Halcyon, incorporating neural network techniques and monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels. Evaluation with human whole-genome sequencing data showed that Halcyon outperformed existing basecallers and achieved competitive performance against the latest basecallers from Oxford Nanopore Technologies.
Article
Automation & Control Systems
Hashem Parvin, Ahmad Reza Naghsh-Nilchi, Hossein Mahvash Mohammadi
Summary: Image captioning, the process of generating a human-like description for a query image, has recently gained significant attention. The most commonly used model for image description is an encoder-decoder structure, where the encoder extracts visual information and the decoder generates textual descriptions. Transformers have greatly improved performance, but they struggle to consider complex relationships between key and query vectors. This paper proposes a new double-attention framework that uses a local generator module and a global generator module to collaboratively predict textual descriptions and improve the performance of image description.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2023)
Article
Construction & Building Technology
Yuan Gao, Yingjun Ruan
Summary: This paper proposes three interpretable encoder and decoder models based on LSTM and self-attention to improve the interpretability of deep learning models. In a case study, the addition of future real weather information only slightly improved the MAPE, and the model's attention to different time steps and features was discussed. The most important features were identified as daily max temperature, mean temperature, min temperature, and dew point temperature, with other features like pressure, wind speed, and holidays receiving lower weights.
ENERGY AND BUILDINGS
(2021)
Article
Computer Science, Information Systems
Himanshu Sharma, Swati Srivastava
Summary: This paper presents a method for image captioning using a Local Relation Network (LRN) to understand the semantic concepts of objects and their relationships in an image. The proposed model achieves superior performance on three benchmark datasets, demonstrating its effectiveness in generating natural language descriptions.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Xiaoyan Cai, Nanxin Wang, Libin Yang, Xin Mei
Summary: This study proposes a network representation model called GLNNR, which integrates global and local neighborhoods to obtain better node representation in the network. Empirical experiments prove the effectiveness of GLNNR.
APPLIED INTELLIGENCE
(2022)
Article
Computer Science, Information Systems
Xian Zhong, Guozhang Nie, Wenxin Huang, Wenxuan Liu, Bo Ma, Chia-Wen Lin
Summary: The proposed image captioning scheme, based on adaptive spatial information attention (ASIA), extracts the spatial information of salient objects and applies different techniques in the encoding and decoding stages. Extensive experiments on two datasets show that it improves captioning performance.
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION
(2021)
Article
Computer Science, Artificial Intelligence
Himanshu Sharma, Swati Srivastava
Summary: This paper presents an algorithm for image captioning that utilizes a local relation network and multilevel attention to understand semantic concepts in an image, improving the image representation and resulting in improved caption generation.
NEURAL PROCESSING LETTERS
(2023)
Article
Computer Science, Information Systems
Md. Shahir Zaoad, M. M. Rushadul Mannan, Angshu Bikash Mandol, Mostafizur Rahman, Md Adnanul Islam, Md. Mahbubur Rahman
Summary: Video captioning is an automated process that generates captions for videos by understanding their content. This research focuses on Bengali video captioning, which is an underexplored area compared to English video captioning. The study implements sequence-to-sequence models such as LSTM, BiLSTM, and GRU, combined with the CNN models VGG-19, Inceptionv3, and ResNet50v2, to extract video frame features and generate textual descriptions. An attention mechanism is also incorporated for the first time in Bengali video captioning. A novel Bengali video captioning dataset is created from the Microsoft Research Video Description Corpus (MSVD), and the model's performance is evaluated using popular metrics such as BLEU, METEOR, and ROUGE. The proposed attention-based hybrid model outperforms existing models and sets a new benchmark for Bengali video captioning.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2023)
Article
Computer Science, Artificial Intelligence
Mohit Singh, Vijay Laxmi, Parvez Faruki
Summary: Haze degrades the quality of captured images, causing loss of image data and severely affecting computer vision algorithms. This paper proposes a novel end-to-end encoder-decoder architecture to learn the residual haze layer between the hazy and haze-free images, and experimental results demonstrate significant improvement over other methods under different haze conditions.
APPLIED INTELLIGENCE
(2022)
Review
Computer Science, Artificial Intelligence
Peipei Wang, Lin Li, Ru Wang, Xinhao Zheng, Jiaxi He, Guandong Xu
Summary: This study proposes a review-based recommendation model based on personalized sentimental interactive representation learning. The model simultaneously learns fragment-level and sequence-level personalized sentimental representations to capture differences in users' sentimental expression styles and language usage habits.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Wanting Ji, Ruili Wang, Yan Tian, Xun Wang
Summary: Video captioning is an important task in multimedia processing, and traditional approaches only utilize visual information to generate captions. This paper proposes a novel attention-based dual learning approach (ADL) that improves the quality of video captions by minimizing the differences between generated and raw videos.
APPLIED SOFT COMPUTING
(2022)
Article
Computer Science, Information Systems
Ahmed Iqbal, Muhammad Sharif
Summary: Accurate breast lesion segmentation is essential for breast cancer treatment planning. This study proposes a multiscale dual attention-based network (MDA-Net) for concurrent segmentation of breast lesion images. The MDA-Net achieves high performance on various datasets and demonstrates broad applicability in different imaging modalities.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Computer Science, Hardware & Architecture
Haixia Xu, Yunjia Huang, Edwin R. Hancock, Shuailong Wang, Qijun Xuan, Wei Zhou
Summary: This paper proposes an encoder-decoder network for image semantic segmentation, called PAEDN, that uses a pooling SE-ResNet attention module to address the challenges of poor intra-category pixel consistency and inter-category pixel similarity. Experimental evaluations on the PASCAL and Cityscapes datasets show that the proposed method produces semantic labels with good pixel consistency and achieves a 15.1% improvement over FCN.
COMPUTERS & ELECTRICAL ENGINEERING
(2021)